DACSS 603 HW 3

Author

Karen Kimble

Published

October 31, 2022

Code

# Setup
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Code

library(dplyr)
library(alr4)

Loading required package: car
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some

Loading required package: effects
lattice theme set by effectsTheme()
See ?effectsTheme for details.

Code

library(smss)

Warning: package 'smss' was built under R version 4.2.2

Question 1

Part 1

The predictor is ppgdp, the gross national product per person in U.S. dollars, and the response is fertility, the birth rate per 1000 females.

Part 2

Code

data(UN11)
options(scipen = 999)
plot(fertility ~ ppgdp, data = UN11,
     main = "Fertility by GDP per Capita",
     xlab = "GDP per Capita",
     ylab = "Fertility Rate per 1000 Females")

A straight-line function does not seem applicable to this graph because the trend of the data is clearly curved, closer to a quadratic or exponential function. Many countries have high fertility rates at the very low end of the x-axis, but fertility drops sharply as GDP per capita rises and then levels off between about 10,000 and 20,000.

Part 3

Code

plot(log(fertility) ~ log(ppgdp), data = UN11, main="Log of Fertility by Log of GDP per Capita", xlab = "Log of GDP per Capita", ylab = "Log of Fertility Rate per 1000 Females")

After taking the log of each variable, the points fall much closer to a straight line than in the previous graph. A simple linear regression model seems plausible here.
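As a rough check on the plausibility claim, a straight line can be fit on the log-log scale and drawn over the scatterplot. This is a minimal sketch that assumes the alr4 package is installed; the name fit_log is only illustrative and is not part of the original homework.

```r
# Sketch only: assumes the alr4 package (which provides UN11) is installed.
library(alr4)
data(UN11)

# Fit a straight line on the log-log scale (natural logs, as in the plot).
fit_log <- lm(log(fertility) ~ log(ppgdp), data = UN11)
coef(fit_log)

# Redraw the log-log scatterplot and overlay the fitted line.
plot(log(fertility) ~ log(ppgdp), data = UN11,
     main = "Log of Fertility by Log of GDP per Capita",
     xlab = "Log of GDP per Capita",
     ylab = "Log of Fertility Rate per 1000 Females")
abline(fit_log)
```

If the points scatter evenly around the line with no obvious curve, that supports using simple linear regression on the transformed variables.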

Question 2

Part A

The slope of the prediction equation would decrease. Because 1 British pound is worth 1.33 U.S. dollars, each income in dollars is 1.33 times the same income in pounds, so the same change in the response is spread over a span of the x-axis that is 1.33 times larger. The new slope is the old slope divided by 1.33.

Part B

The correlation would not change because correlation does not depend on units of measurement: the strength and direction of the relationship between annual income and the dependent/response variable are the same whether income is measured in pounds or dollars.
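Both answers to Question 2 can be checked numerically. The sketch below uses simulated data, not data from the textbook problem; the variable names, sample size, and coefficients are made up for illustration. Only the 1.33 exchange rate comes from the question.

```r
# Simulated data, purely illustrative (not from the textbook problem).
set.seed(1)
income_gbp <- runif(50, min = 10000, max = 60000)  # income in pounds
y <- 2 + 0.0001 * income_gbp + rnorm(50)           # hypothetical response

# Convert the predictor to dollars at 1 pound = 1.33 dollars.
income_usd <- 1.33 * income_gbp

slope_gbp <- coef(lm(y ~ income_gbp))[["income_gbp"]]
slope_usd <- coef(lm(y ~ income_usd))[["income_usd"]]

# The slope in dollars equals the slope in pounds divided by 1.33.
all.equal(slope_usd, slope_gbp / 1.33)

# The correlation is the same in either currency.
all.equal(cor(y, income_gbp), cor(y, income_usd))
```

Both comparisons should return TRUE, since rescaling the predictor by a positive constant rescales the slope by the inverse of that constant and leaves the correlation alone.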

Question 3

Code

pairs(water, upper.panel = NULL)

It looks like precipitation at OPRC, OPSLAKE, and APSLAKE, along with stream runoff (BSAAM), has been higher in recent years. Values in the earliest years seem to be more random and spread out.

Question 4

Code

pairs(Rateprof[,8:12], upper.panel = NULL)

It looks like as the ratings of professors’ quality of teaching increased, helpfulness and clarity ratings also increased. However, easiness and interest ratings don’t seem to be as correlated with these variables.
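The visual impression from the scatterplot matrix can be backed up with a correlation matrix for the same five columns. This is a minimal sketch assuming the alr4 package is installed; rating_cor is an illustrative name.

```r
# Sketch only: assumes the alr4 package (which provides Rateprof) is installed.
library(alr4)

# Pairwise Pearson correlations for the same columns shown in the plot.
rating_cor <- cor(Rateprof[, 8:12])
round(rating_cor, 2)
```

Values near 1 in the quality, helpfulness, and clarity cells, and smaller values in the easiness and interest cells, would match what the plots suggest.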

Question 5

Political Ideology vs Religiosity

Part A

Code

data(student.survey)
student.survey$pi <- unclass(student.survey$pi)
student.survey$re <- unclass(student.survey$re)

# In the political ideology variable, very conservative = 7 and very liberal = 1

# In the religiosity variable, attending religious services every week = 4, and never = 1

plot(pi ~ re, data = student.survey, main = "Political Ideology by Religiosity", xlab = "Religiosity", ylab = "Political Ideology")

Part B

Code

fit1 <- lm(pi ~ re, data = student.survey)
summary(fit1)


Call:
lm(formula = pi ~ re, data = student.survey)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.81243 -0.87160  0.09882  1.12840  3.09882 

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)    
(Intercept)   0.9308     0.4252   2.189     0.0327 *  
re            0.9704     0.1792   5.416 0.00000122 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.345 on 58 degrees of freedom
Multiple R-squared:  0.3359,    Adjusted R-squared:  0.3244 
F-statistic: 29.34 on 1 and 58 DF,  p-value: 0.000001221

Political ideology and religiosity appear to be positively correlated. The p-value for religiosity is much smaller than the alpha value of 0.05, indicating statistically significant evidence of an association between religiosity and students’ political ideology, although the regression alone does not show that one causes the other. Within the regression model, each one-level increase in religiosity (attending religious services more often) is associated with a 0.97 increase in the political ideology score (more conservative or less liberal). Religiosity explains about 34% of the variation in political ideology (R-squared = 0.3359).
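A confidence interval for the slope gives a sense of how precisely it is estimated, and because both variables are ordinal codes, a rank-based correlation is a reasonable cross-check. This is a sketch, not part of the original analysis: it assumes the smss package is installed, and the names survey and fit_pi are illustrative.

```r
# Sketch only: assumes the smss package (student.survey) is installed.
library(smss)
data(student.survey)

# Integer codes of the ordered factors, matching the coding used above
# (pi: 1 = very liberal ... 7 = very conservative; re: 1 = never ... 4 = every week).
survey <- data.frame(
  pi = as.integer(student.survey$pi),
  re = as.integer(student.survey$re)
)

fit_pi <- lm(pi ~ re, data = survey)

# 95% confidence interval for the religiosity slope.
confint(fit_pi, "re", level = 0.95)

# Rank-based alternative for ordinal data; exact = FALSE because of ties.
cor.test(survey$pi, survey$re, method = "spearman", exact = FALSE)
```

An interval that excludes zero agrees with the small p-value in the regression output.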

High School GPA versus TV Watching

Part A

Code

plot(hi ~ tv, data = student.survey, main = "High School GPA by Hours of TV Watched", xlab = "Avg Hours/Week Watching TV", ylab = "High School GPA")

Part B

Code

fit2 <- lm(hi ~ tv, data = student.survey)
summary(fit2)


Call:
lm(formula = hi ~ tv, data = student.survey)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2583 -0.2456  0.0417  0.3368  0.7051 

Coefficients:
             Estimate Std. Error t value            Pr(>|t|)    
(Intercept)  3.441353   0.085345  40.323 <0.0000000000000002 ***
tv          -0.018305   0.008658  -2.114              0.0388 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4467 on 58 degrees of freedom
Multiple R-squared:  0.07156,   Adjusted R-squared:  0.05555 
F-statistic: 4.471 on 1 and 58 DF,  p-value: 0.03879

High school GPA and hours of TV watched per week also appear to be correlated. The p-value for hours of TV (0.0388) is less than the alpha value of 0.05, indicating statistically significant evidence of an association between TV watching and students’ high school GPA. The variables are negatively related: based on the regression model, each additional hour of TV watched per week (on average) is associated with a 0.018 decrease in high school GPA. The relationship is weak, however, since TV watching explains only about 7% of the variation in GPA (R-squared = 0.07156).
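In simple linear regression, the t-test for the slope is equivalent to the test of the Pearson correlation, and R-squared is the square of that correlation. The sketch below checks both relationships for this model. It assumes the smss package is installed, and fit_tv, p_slope, p_cor, and r are illustrative names.

```r
# Sketch only: assumes the smss package (student.survey) is installed.
library(smss)
data(student.survey)

fit_tv <- lm(hi ~ tv, data = student.survey)

# p-value for the slope from the regression summary.
p_slope <- summary(fit_tv)$coefficients["tv", "Pr(>|t|)"]

# p-value from the Pearson correlation test on the same two variables.
p_cor <- cor.test(student.survey$hi, student.survey$tv)$p.value
all.equal(p_slope, p_cor)

# R-squared equals the squared correlation in simple linear regression.
r <- cor(student.survey$hi, student.survey$tv)
all.equal(summary(fit_tv)$r.squared, r^2)
```

Both comparisons should return TRUE, which is why the correlation and the regression slope lead to the same conclusion about significance here.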