Homework 3
Sai Pothula
Author

Sai Padma Pothula

Published

May 2, 2023

Code
library(alr4)
Loading required package: car
Loading required package: carData
Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.
Code
library(smss)
library(ggplot2)
library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
Code
data(UN11)
Code
str(UN11)
'data.frame':   199 obs. of  6 variables:
 $ region   : Factor w/ 8 levels "Africa","Asia",..: 2 4 1 1 3 5 2 3 8 4 ...
 $ group    : Factor w/ 3 levels "oecd","other",..: 2 2 3 3 2 2 2 2 1 1 ...
 $ fertility: num  5.97 1.52 2.14 5.13 2 ...
 $ ppgdp    : num  499 3677 4473 4322 13750 ...
 $ lifeExpF : num  49.5 80.4 75 53.2 81.1 ...
 $ pctUrban : num  23 53 67 59 100 93 64 47 89 68 ...
 - attr(*, "na.action")= 'omit' Named int [1:34] 4 5 8 28 41 67 68 72 79 83 ...
  ..- attr(*, "names")= chr [1:34] "Am Samoa" "Andorra" "Antigua and Barbuda" "Br Virigin Is" ...

A: Predictor: gross national product per person (ppgdp). Response: fertility.

Code
# Scatterplot of fertility against ppgdp on the original scale (UN11 data)
ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point()

Code
# Same scatterplot with both variables log-transformed (UN11 data)
ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point()

2 a: The conversion rate is given as 1 pound to about 1.33 dollars, so income in pounds equals income in dollars divided by 1.33. Because the explanatory variable is divided by 1.33 while the response stays the same, the slope is multiplied by 1.33: a one-pound increase in income corresponds to a 1.33-dollar increase.

2 b: When the units of the explanatory variable are converted from dollars to British pounds sterling, the correlation between the variables does not change. The correlation measures the strength and direction of the linear relationship between two variables and is not affected by the units or scale used to measure them.
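A quick numerical check of question 2 can make the unit-conversion argument concrete. The sketch below uses simulated data only (the variable names, sample size, and coefficients are assumptions made for illustration, not values from the homework) and fits the same regression with the explanatory variable in dollars and in pounds.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical incomes in dollars and a made-up response
income_usd <- runif(100, min = 20000, max = 100000)
y <- 5 + 0.0002 * income_usd + rnorm(100, sd = 2)

# 1 pound is about 1.33 dollars, so pounds = dollars / 1.33
income_gbp <- income_usd / 1.33

# Fit the same model with the explanatory variable in each currency
slope_usd <- unname(coef(lm(y ~ income_usd))[2])
slope_gbp <- unname(coef(lm(y ~ income_gbp))[2])

# Ratio of the two slopes: 1.33 up to floating-point rounding
slope_ratio <- slope_gbp / slope_usd

# Correlation with the response is identical in either unit
cor_usd <- cor(income_usd, y)
cor_gbp <- cor(income_gbp, y)
```

Comparing slope_ratio with 1.33, and cor_usd with cor_gbp, shows that rescaling the explanatory variable rescales the slope but leaves the correlation untouched.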

3:

Code
data(water)
pairs(water)

4:

Code
data(Rateprof)
ratings <- Rateprof[, c("quality", "helpfulness", "clarity", "easiness", "raterInterest")]
pairs(ratings)

Quality vs. Other Ratings: Look for relationships between the quality rating and the other ratings (helpfulness, clarity, easiness, and raterInterest). A positive relationship would indicate that higher ratings in one variable tend to be associated with higher ratings in the other variable.

Helpfulness vs. Clarity: Examine the relationship between the helpfulness and clarity ratings. If there is a positive relationship, it suggests that instructors who are rated as more helpful also tend to be rated as more clear in their teaching.

Easiness vs. Other Ratings: Consider the relationship between the easiness rating and the other ratings. A positive relationship would indicate that courses perceived as easier tend to receive higher ratings in terms of quality, helpfulness, clarity, and rater interest.

RaterInterest vs. Other Ratings: Look for any relationships between the raterInterest rating and the other ratings. A positive relationship would indicate that instructors who teach subjects that students are more interested in tend to receive higher ratings in terms of quality, helpfulness, clarity, and easiness.
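The pairwise relationships described here can also be summarized numerically. As a minimal sketch, assuming the alr4 package is installed and using the same five Rateprof columns as the scatterplot matrix, a correlation matrix gives one number per pair of ratings (cor_mat is a name introduced here for illustration):

```r
library(alr4)  # provides the Rateprof data

data(Rateprof)
ratings <- Rateprof[, c("quality", "helpfulness", "clarity", "easiness", "raterInterest")]

# Pearson correlation for every pair of rating variables
cor_mat <- cor(ratings)

# Rounded for easier reading alongside the pairs() plot
round(cor_mat, 2)
```

Values near 1 correspond to the tight, upward-sloping panels in the scatterplot matrix, while values closer to 0 correspond to the more diffuse ones.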

5(a):

Code
data(student.survey)
ggplot(data = student.survey , aes(x=re, y=pi)) + geom_point() +
  labs(title="Religion and Political Ideology",
        x ="Frequency attending religious services", y = "Political ideology")

Code
ggplot(data = student.survey, aes(x=tv, y=hi)) + geom_point() +
  labs(title="Scatter Plot Hours watching TV and High School GPA",
        x ="Average hours watching TV", y = "High School GPA")

5(b):

Code
model1 <- lm(as.numeric(pi) ~ as.numeric(re), data = student.survey)
summary(model1)

Call:
lm(formula = as.numeric(pi) ~ as.numeric(re), data = student.survey)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.81243 -0.87160  0.09882  1.12840  3.09882 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.9308     0.4252   2.189   0.0327 *  
as.numeric(re)   0.9704     0.1792   5.416 1.22e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.345 on 58 degrees of freedom
Multiple R-squared:  0.3359,    Adjusted R-squared:  0.3244 
F-statistic: 29.34 on 1 and 58 DF,  p-value: 1.221e-06
Code
model2 <- lm(hi ~ tv, data = student.survey)
summary(model2)

Call:
lm(formula = hi ~ tv, data = student.survey)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2583 -0.2456  0.0417  0.3368  0.7051 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.441353   0.085345  40.323   <2e-16 ***
tv          -0.018305   0.008658  -2.114   0.0388 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4467 on 58 degrees of freedom
Multiple R-squared:  0.07156,   Adjusted R-squared:  0.05555 
F-statistic: 4.471 on 1 and 58 DF,  p-value: 0.03879

The second model suggests a negative relationship between average TV hours per week and high school GPA. For every additional hour of TV watched per week, GPA is expected to decrease by approximately 0.018 points (slope = -0.0183). The relationship is statistically significant at the 5% level (p = 0.039) but weak, and eliminating outliers may not substantially change the results. Furthermore, the low R-squared value (0.072) indicates that TV hours explain only about 7% of the variation in GPA.
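To put the size of the TV effect in context, the fitted line can be evaluated by hand. The sketch below copies the intercept, slope, and R-squared from the summary(model2) output; the TV-hour values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(model2) output
intercept <- 3.441353
slope_tv  <- -0.018305

# Hypothetical weekly TV hours, chosen only for illustration
tv_hours <- c(0, 10, 20)
predicted_gpa <- intercept + slope_tv * tv_hours

# Change in predicted GPA for 10 additional hours of TV per week
change_per_10_hours <- slope_tv * 10

# Share of GPA variation explained, from the reported R-squared
r_squared <- 0.07156
percent_explained <- round(100 * r_squared, 1)
```

Even ten extra hours of TV per week shifts the predicted GPA by less than two tenths of a point, which is consistent with the small R-squared.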