Homework 3

hw3

regression

Author

Donny Snyder

Published

October 17, 2022

Code

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Code

library(alr4)

Loading required package: car
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.

Code

library(smss)

Warning: package 'smss' was built under R version 4.2.2

Code

library(dplyr)
library(ggplot2)
library(GGally)

Error in library(GGally): there is no package called 'GGally'

Question 1

Code

data <- UN11

ggplot(data, aes(x = ppgdp, y = fertility)) + geom_point()

Code

ggplot(data, aes(x = log(ppgdp), y = log(fertility))) + geom_point()

Question 1.1

The predictor is ppgdp and the response is fertility.

Question 1.2

A straight-line mean function does not seem plausible for this graph.

Question 1.3

A simple linear regression model does seem plausible as a summary of the log-log graph.
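To make the log-log summary concrete, one way to write the model is sketched here. The symbols \beta_0, \beta_1, and \varepsilon are notation introduced for illustration and do not appear in the original answers; natural logarithms are assumed, matching R's log().

```latex
\begin{aligned}
\log(\text{fertility}) &= \beta_0 + \beta_1 \log(\text{ppgdp}) + \varepsilon \\
\Longleftrightarrow\quad \text{fertility} &= e^{\beta_0}\,\text{ppgdp}^{\beta_1}\,e^{\varepsilon}
\end{aligned}
```

Under this form the slope \beta_1 acts as an elasticity: a 1% increase in ppgdp is associated with roughly a \beta_1 % change in fertility.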

Code

data$ppgdp2 <- data$ppgdp*0.75

model1 <- lm(fertility ~ ppgdp, data)
summary(model1)


Call:
lm(formula = fertility ~ ppgdp, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9006 -0.8801 -0.3547  0.6749  3.7585 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.178e+00  1.048e-01  30.331  < 2e-16 ***
ppgdp       -3.201e-05  4.655e-06  -6.877  7.9e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.206 on 197 degrees of freedom
Multiple R-squared:  0.1936,    Adjusted R-squared:  0.1895 
F-statistic: 47.29 on 1 and 197 DF,  p-value: 7.903e-11

Code

model2 <- lm(fertility ~ ppgdp2, data)
summary(model2)


Call:
lm(formula = fertility ~ ppgdp2, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.9006 -0.8801 -0.3547  0.6749  3.7585 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.178e+00  1.048e-01  30.331  < 2e-16 ***
ppgdp2      -4.268e-05  6.206e-06  -6.877  7.9e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.206 on 197 degrees of freedom
Multiple R-squared:  0.1936,    Adjusted R-squared:  0.1895 
F-statistic: 47.29 on 1 and 197 DF,  p-value: 7.903e-11

Code

cor(data$fertility, data$ppgdp)

[1] -0.4399891

Code

cor(data$fertility, data$ppgdp2)

[1] -0.4399891

Question 2(a)

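The slope of the prediction equation will increase in magnitude when the explanatory variable is rescaled to smaller values. In the example above, multiplying ppgdp by 0.75 divides the slope by 0.75, changing it from -3.201e-05 to -4.268e-05, while the intercept, t value, and R-squared are unchanged.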

Question 2(b)

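The correlation will stay the same (-0.44 for both ppgdp and ppgdp2), because correlation is unaffected by multiplying a variable by a positive constant.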
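The two answers above can be checked numerically. The sketch below uses R's built-in cars data rather than UN11 so that it runs without extra packages; the scale factor k and the column name speed_scaled are illustrative choices, not part of the original homework.

```r
# Illustrative check on the built-in `cars` data (not the UN11 data used above)
k <- 0.75
cars2 <- transform(cars, speed_scaled = speed * k)

# Fit the same regression with the original and the rescaled predictor
fit_raw    <- lm(dist ~ speed, data = cars2)
fit_scaled <- lm(dist ~ speed_scaled, data = cars2)

slope_raw    <- unname(coef(fit_raw)["speed"])
slope_scaled <- unname(coef(fit_scaled)["speed_scaled"])

# Rescaling the predictor by k divides the slope by k
slope_ratio <- slope_scaled / slope_raw

# The correlation with the response is the same for both versions
cor_raw    <- cor(cars2$dist, cars2$speed)
cor_scaled <- cor(cars2$dist, cars2$speed_scaled)
```

With k = 0.75 the slope ratio is 1/0.75, about 1.333, which is the same factor that separates -3.201e-05 from -4.268e-05 in the output above.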

Question 3

Code

watData <- water
pairs(watData)

Code

ggpairs(watData)

Error in ggpairs(watData): could not find function "ggpairs"

OPBPC, OPRC, and OPSLAKE all seem to be highly correlated with BSAAM.
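A visual reading of the scatterplot matrix can be backed up by ranking the predictors by their correlation with the response. The helper below, cor_with_response, is a hypothetical name introduced here; it is shown on the built-in mtcars data so that it runs without alr4, and is only a sketch of the idea.

```r
# Hypothetical helper: correlation of every numeric column with one response column
cor_with_response <- function(df, response) {
  num <- df[vapply(df, is.numeric, logical(1))]       # keep numeric columns only
  predictors <- num[setdiff(names(num), response)]    # drop the response itself
  r <- cor(predictors, num[[response]], use = "complete.obs")
  sort(r[, 1], decreasing = TRUE)                     # named vector, largest first
}

# Demonstration on built-in data (not the water data)
ranked <- cor_with_response(mtcars, "mpg")
```

For the homework data the analogous call would be cor_with_response(watData, "BSAAM"), assuming watData is loaded as above.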

Question 4

Code

profData <- Rateprof[, c("quality", "helpfulness", "clarity", "easiness", "raterInterest")]
pairs(profData)

Quality, helpfulness, and clarity appear to be highly intercorrelated. The variables easiness and raterInterest are less strongly correlated with the other ratings.

Question 5

Code

data("student.survey", package = "smss")  # load the dataset explicitly before use
stud <- as.data.frame(student.survey)

# Numeric scores for political ideology (pi) and religiosity (re)
piScores <- c("very liberal" = -3, "liberal" = -2, "slightly liberal" = -1,
              "moderate" = 0, "slightly conservative" = 1,
              "conservative" = 2, "very conservative" = 3)
reScores <- c("never" = 0, "occasionally" = 1, "most weeks" = 2, "every week" = 3)

# Look up each response by its label; this replaces the hard-coded 60-row loop
stud$piNum <- unname(piScores[as.character(stud$pi)])
stud$reNum <- unname(reScores[as.character(stud$re)])

Code

model3 <- lm(piNum ~ reNum, stud)
summary(model3)

Code

model4 <- lm(hi ~ tv, stud)
summary(model4)

Code

ggplot(stud, aes(x = reNum, y = piNum)) + geom_jitter()

Code

ggplot(stud, aes(x = tv, y = hi)) + geom_jitter()

The results suggest that political ideology tends to be more conservative as religiosity increases, and that high school GPA tends to be lower as hours of TV watched increase. Both relationships are statistically significant.
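Statistical significance of a slope can be read from the coefficient table returned by summary(). A minimal sketch on the built-in cars data (not the survey data), with 0.05 as an assumed conventional cutoff:

```r
# Fit a simple linear regression on built-in data
fit <- lm(dist ~ speed, data = cars)

# coef(summary(fit)) is a matrix with one row per coefficient
coef_table <- coef(summary(fit))

# p-value for the slope, taken from the "Pr(>|t|)" column
slope_p <- coef_table["speed", "Pr(>|t|)"]

# Compare against the assumed 0.05 threshold
is_significant <- slope_p < 0.05
```

The same lookup would apply to the models above, for example coef(summary(model3))["reNum", "Pr(>|t|)"], assuming model3 has been fitted.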