Homework 3

hw3
linear regression
Author

Rosemary Pang

Published

April 12, 2023

Please check your answers against the solutions.

Question 1

Load the necessary packages.

# install.packages(c("alr4", "smss", "stargazer"))  # run once if these packages are not installed
library(alr4)
library(smss)
library(ggplot2)
library(stargazer)

Load data:

data(UN11)

(a)

The predictor is ppgdp, GDP per capita (in US dollars). The response is fertility, the number of children per woman.

(b)

ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point()

A straight line is not appropriate, because the relationship is L-shaped (or looks like the left half of a U): fertility falls steeply across the lowest values of ppgdp and then levels off.

(c)

ggplot(data = UN11, aes(x = log(ppgdp), y = log(fertility))) +
  geom_point()

Yes. On the log-log scale a simple linear regression model is much more plausible: the points scatter around a straight line with a negative slope.
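One way to see why logging both axes helps: suppose, only as a working approximation, that fertility follows a power law in ppgdp with multiplicative error. Taking logs of both sides turns that curve into a straight line. The slope β1 then acts as an elasticity (the approximate percentage change in fertility for a 1% increase in ppgdp), and the downward trend in the log-log plot corresponds to β1 < 0.

```latex
\text{fertility} = e^{\beta_0}\,\text{ppgdp}^{\beta_1}\,e^{\varepsilon}
\quad\Longrightarrow\quad
\log(\text{fertility}) = \beta_0 + \beta_1 \log(\text{ppgdp}) + \varepsilon
```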

Question 2

(a)

Converting the response from US dollars to British pounds (at 1.33 dollars per pound) divides every value of the response by 1.33. Because the slope is measured in response units per unit of the predictor, the slope is also divided by 1.33 (and so is the intercept).

(b)

Correlation will not change: it is a standardized, unit-free measure, so rescaling the response by a positive constant such as 1/1.33 leaves it unchanged.
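A short derivation makes both answers concrete. Write y for the response in dollars and y' = y/c for the same response in pounds, with c = 1.33. Plugging y' into the least-squares formulas for simple regression shows that both coefficients pick up the factor 1/c, while in the correlation the factor appears in both numerator and denominator and cancels (this uses c > 0, so the standard deviation of y' is the standard deviation of y divided by c).

```latex
\begin{align*}
\hat\beta_1' &= \frac{\operatorname{Cov}(x, y')}{\operatorname{Var}(x)}
              = \frac{\operatorname{Cov}(x, y)/c}{\operatorname{Var}(x)}
              = \frac{\hat\beta_1}{c}, \\
\hat\beta_0' &= \bar{y}' - \hat\beta_1'\,\bar{x}
              = \frac{\bar{y} - \hat\beta_1\,\bar{x}}{c}
              = \frac{\hat\beta_0}{c}, \\
r' &= \frac{\operatorname{Cov}(x, y')}{s_x\, s_{y'}}
    = \frac{\operatorname{Cov}(x, y)/c}{s_x\, s_y/c}
    = r .
\end{align*}
```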

Both outcomes can easily be shown via simulation.
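A minimal simulation sketch in base R follows. The predictor and the dollar-valued response are made up (arbitrary intercept, slope, and noise); only the 1.33 conversion factor comes from the answer above. Fitting the same model on both scales shows the coefficients shrinking by exactly 1.33 while the correlation stays the same.

```r
# Simulation sketch: all data-generating values are arbitrary, hypothetical choices
set.seed(1)
n <- 200
x <- rnorm(n, mean = 16, sd = 2)                  # hypothetical predictor
y_usd <- 5000 + 3000 * x + rnorm(n, sd = 8000)    # hypothetical response in dollars
y_gbp <- y_usd / 1.33                             # same response in pounds

fit_usd <- lm(y_usd ~ x)
fit_gbp <- lm(y_gbp ~ x)

# Ratio of coefficients on the two scales: both intercept and slope equal 1.33
# (up to floating-point error)
coef(fit_usd) / coef(fit_gbp)

# The correlation is identical on either scale
c(usd = cor(x, y_usd), gbp = cor(x, y_gbp))
```

Changing the seed or the made-up coefficients changes the fitted values but not the 1.33 ratio or the equality of the correlations.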

Question 3

data(water)
pairs(water)
  1. Year appears to be largely unrelated to each of the other variables.
  2. The three variables starting with “O” (OPBPC, OPRC, OPSLAKE) seem to be correlated with each other: every plot involving two of these variables shows a stronger dependence than the plots pairing an “O” variable with any other variable. The three variables starting with “A” (APMAM, APSAB, APSLAKE) form a second correlated group.
  3. BSAAM is more closely related to the “O” variables than to the “A” variables.

Question 4

data(Rateprof)
pairs(Rateprof[,c('quality', 'clarity', 'helpfulness',
                  'easiness', 'raterInterest')])

The strong pairwise correlations among quality, clarity, and helpfulness are striking. easiness is also fairly highly correlated with these three. raterInterest is moderately correlated with them as well, although raters almost always say they are at least moderately interested in the subject. Overall, the results suggest either that people don’t clearly distinguish these dimensions in their minds, or that professors who do well in one dimension tend to do well in the others too.

Question 5

(a)

One way of visually representing the relationship between religiosity and political ideology is shown below (there are others). Moving toward the bars on the right (more religiosity), lighter colors (more conservative ideology) take up a larger share of each bar.

data(student.survey)
ggplot(data = student.survey, aes(x = re, fill = pi)) +
    geom_bar(position = "fill")
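With position = "fill", each bar is rescaled to height 1, so it shows the distribution of ideology within one religiosity level. The same conditional proportions can be tabulated with prop.table(). The sketch below uses a tiny made-up data frame (the levels and values are hypothetical, not the student.survey data) so it runs without smss.

```r
# Hypothetical toy data: NOT the student.survey values
toy <- data.frame(
  re = factor(c("never", "never", "never", "often", "often", "often"),
              levels = c("never", "often")),
  pi = factor(c("liberal", "liberal", "conservative",
                "liberal", "conservative", "conservative"),
              levels = c("liberal", "conservative"))
)

# Row proportions (margin = 1): the distribution of pi within each level of re,
# which is what each bar of geom_bar(position = "fill") displays
props <- prop.table(table(toy$re, toy$pi), margin = 1)
props
```

On the real data, prop.table(table(student.survey$re, student.survey$pi), margin = 1) would give the numbers behind the plotted bars.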

The relationship between high school GPA and hours of watching TV can be shown with a good old scatter plot.

ggplot(data = student.survey, aes(x = tv, y = hi)) +
  geom_point() 

(b)

Dealing with ordinal variables in linear regression is a difficult problem. Here we simply assume that we can convert them to numeric scores and use them directly. We do this for political ideology and religiosity; high school GPA and hours of TV are already numeric.
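It helps to know what as.numeric() actually does to a factor: it returns the internal level codes (1, 2, ..., k) in level order, not the labels. For an ordered factor such as religiosity, those codes are equally spaced scores, which is the "treat as numeric" assumption made here. The vectors below are hypothetical illustrations; the second one shows a common pitfall when a factor's labels are themselves numbers.

```r
# Hypothetical ordered factor with levels in their natural order
religiosity <- factor(c("never", "occasionally", "most weeks", "every week", "never"),
                      levels = c("never", "occasionally", "most weeks", "every week"),
                      ordered = TRUE)

as.numeric(religiosity)  # level codes: 1 2 3 4 1

# Pitfall: for a factor with numeric-looking labels, as.numeric() still
# returns codes (levels are sorted, so "2.8" < "3.5" < "3.9");
# go through as.character() to recover the labelled values
gpa_factor <- factor(c("3.5", "2.8", "3.9"))
as.numeric(gpa_factor)                 # codes: 2 1 3
as.numeric(as.character(gpa_factor))   # values: 3.5 2.8 3.9
```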

m1 <- lm(as.numeric(pi) ~ as.numeric(re), 
         data = student.survey)
m2 <- lm(hi ~ tv, data = student.survey)
stargazer(m1, m2, type = 'text', 
          dep.var.labels = c('Pol. Ideology', 'HS GPA'),
          covariate.labels = c('Religiosity', 'Hours of TV')
          )

Religiosity is positively and statistically significantly (at the 0.01 significance level) associated with conservatism.

Hours of TV is negatively and statistically significantly (at the 0.05 significance level) associated with high school GPA: each additional hour of TV watched per week, on average, is associated with a high school GPA that is 0.018 points lower.
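Because the model is linear, the predicted GPA difference scales directly with the difference in weekly TV hours. A quick arithmetic sketch using the rounded slope of -0.018 quoted above (the hour differences are arbitrary illustrations):

```r
slope_tv <- -0.018                       # rounded slope on tv reported for model m2
extra_hours <- c(1, 5, 10, 20)           # illustrative differences in weekly TV hours
gpa_difference <- slope_tv * extra_hours
data.frame(extra_hours, gpa_difference)  # e.g. 10 more hours: about 0.18 lower GPA
```

With the fitted model available, coef(m2)["tv"] would replace the rounded value.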