hw2
regression
Author

Donny Snyder

Published

October 17, 2022

Code
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(dplyr)

Question 1

Code
# 90% confidence intervals: mean +/- t* x (sd / sqrt(n)),
# where t* is the 0.95 quantile of the t distribution with n - 1 degrees of freedom
byPassConfInt <- 19 + c(-1, 1) * qt(0.95, 539 - 1) * (10 / sqrt(539))
angConfInt <- 18 + c(-1, 1) * qt(0.95, 847 - 1) * (9 / sqrt(847))

print(round(byPassConfInt, 2))
[1] 18.29 19.71
Code
print(round(angConfInt, 2))
[1] 17.49 18.51

The confidence interval is narrower for angiography than for bypass surgery, because the angiography sample is larger (847 vs. 539) and has a smaller standard deviation (9 vs. 10).
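A 90% t-based interval can also be sketched from the summary statistics with a small helper. The function name `tConfInt` and its arguments are illustrative and not part of the assignment; the sketch assumes the interval has the form mean +/- t* x s / sqrt(n) with n - 1 degrees of freedom.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
tConfInt <- function(xbar, s, n, level = 0.90) {
  alpha <- 1 - level
  tCrit <- qt(1 - alpha / 2, df = n - 1)  # two-sided critical value
  margin <- tCrit * s / sqrt(n)           # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

bypass <- tConfInt(19, 10, 539)
angio <- tConfInt(18, 9, 847)

# Compare interval widths (upper bound minus lower bound)
widths <- c(bypass = unname(diff(bypass)), angiography = unname(diff(angio)))
print(round(widths, 2))
```

Comparing the widths directly makes the "narrower" claim checkable without reading the bounds by eye.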

Question 2

Code
# 567 of 1,031 respondents answered yes (coded 1); the remaining 464 are coded 0
pointEstData <- c(rep(1, 567), rep(0, 1031 - 567))
pointSD <- sd(pointEstData)
pointEst <- mean(pointEstData)
# 95% confidence interval: estimate +/- z* x standard error, with z* = qnorm(0.975)
pointConfInt <- pointEst + c(-1, 1) * qnorm(0.975) * (pointSD / sqrt(1031))
print(round(pointConfInt, 4))
[1] 0.5196 0.5803

We can be 95% confident that between 52.0% and 58.0% of adult Americans believe that college education is essential for success.
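For a proportion, the interval is often computed directly from the counts. The sketch below assumes the large-sample (Wald) form, p +/- z* x sqrt(p(1 - p)/n); the helper name `propConfInt` is hypothetical. It uses the divisor n where `sd()` uses n - 1, so its bounds can differ from an `sd()`-based calculation in the later decimal places.

```r
# Hypothetical helper: large-sample (Wald) confidence interval for a proportion.
propConfInt <- function(successes, n, level = 0.95) {
  pHat <- successes / n
  zCrit <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  se <- sqrt(pHat * (1 - pHat) / n)    # standard error of the sample proportion
  c(lower = pHat - zCrit * se, upper = pHat + zCrit * se)
}

collegeCI <- propConfInt(567, 1031)
print(round(collegeCI, 3))
```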

Question 3

Code
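popSD <- (200 - 30)/4    # population SD taken as a quarter of the range
criticalVal <- 1.96      # z critical value for 95% confidence
sampSize <- ((popSD * criticalVal)/5)^2    # n = (z * SD / margin of error)^2, with margin of error 5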
print(sampSize)
[1] 277.5556

Rounding up to the next whole number, the sample size should be at least 278.
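The sample-size calculation can be wrapped in a reusable function. This is a minimal sketch assuming a z-based interval and a known population standard deviation; the name `sampleSizeForMean` and its parameters are illustrative.

```r
# Hypothetical helper: smallest n for which a z-based interval
# has a margin of error no larger than `margin`.
sampleSizeForMean <- function(sigma, margin, level = 0.95) {
  zCrit <- qnorm(1 - (1 - level) / 2)
  ceiling((zCrit * sigma / margin)^2)  # round up to a whole observation
}

nNeeded <- sampleSizeForMean(sigma = (200 - 30) / 4, margin = 5)
print(nNeeded)
```

Using `ceiling()` rather than `round()` guarantees the margin of error is not exceeded.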

Question 4a

Null hypothesis: Women's mean income does not differ from the mean income of senior-level workers (μ = 500).

Alternative hypothesis: Women's mean income differs from the mean income of senior-level workers (μ ≠ 500).

Code
tStat <- (410 - 500)/(90/(sqrt(9)))
degreeFree <- 9-1
2*pt(-tStat, degreeFree, lower.tail = FALSE)
[1] 0.01707168
Code
pt(-tStat, degreeFree, lower.tail = FALSE)
[1] 0.008535841
Code
pt(tStat, degreeFree, lower.tail = FALSE)
[1] 0.9914642

The two-sided p-value for this test statistic (t = -3) with 8 degrees of freedom is 0.017.

Question 4b

The p-value for the one-tailed test with the alternative hypothesis μ < 500 is 0.0085. It is half of the two-sided p-value because it counts only the lower tail of the distribution.

Question 4c

The p-value for the one-tailed test with the alternative hypothesis μ > 500 is 0.9915. It is large because the sample mean (410) lies below 500, in the opposite direction from this alternative.
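The three p-values in Question 4 come from the same t statistic, which a single helper can make explicit. The function name `tFromSummary` is hypothetical; the sketch assumes a one-sample t-test computed from the sample mean, sample standard deviation, and sample size.

```r
# Hypothetical helper: one-sample t-test from summary statistics.
tFromSummary <- function(xbar, mu0, s, n) {
  tStat <- (xbar - mu0) / (s / sqrt(n))
  df <- n - 1
  list(
    tStat = tStat,
    df = df,
    twoSided = 2 * pt(abs(tStat), df, lower.tail = FALSE),  # Ha: mu != mu0
    less = pt(tStat, df),                                   # Ha: mu < mu0
    greater = pt(tStat, df, lower.tail = FALSE)             # Ha: mu > mu0
  )
}

res <- tFromSummary(xbar = 410, mu0 = 500, s = 90, n = 9)
print(unlist(res))
```

The two one-sided p-values always sum to 1, and the two-sided value is twice the smaller of them.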

Question 5

Code
tStatJones <- (519.5 - 500)/(10)
tStatSmith <- (519.7 - 500)/(10)
degreeFree <- 1000-1

2*pt(tStatJones,degreeFree, lower.tail = FALSE)
[1] 0.05145555
Code
2*pt(tStatSmith,degreeFree, lower.tail = FALSE)
[1] 0.04911426

Question 5b

As the printed values show, the Smith study is statistically significant at the 0.05 level (p = 0.049), while the Jones study is not (p = 0.051).

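Question 5c

If a study reports only whether the result is significant and not the p-value itself, readers cannot tell how close the result was to the threshold. Here the two studies have nearly identical results, yet they would be reported as opposite conclusions, which throws away most of what was learned by running them.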

Question 6

Code
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

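# t statistic: (sample mean - 45) / standard error, with standard error = sd / sqrt(n)
tStatGas <- (mean(gas_taxes) - 45)/(sd(gas_taxes)/sqrt(length(gas_taxes)))
degreeFree <- length(gas_taxes) - 1

# one-sided p-value for the alternative that the mean is less than 45
round(pt(tStatGas, degreeFree), 3)
[1] 0.038

Yes, there is enough evidence. The one-sided p-value is 0.038, which is below 0.05.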
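The hand calculation can be cross-checked with R's built-in `t.test()`. This sketch assumes the question asks for a one-sided test of whether the mean tax is less than 45, hence `alternative = "less"`; the data vector is repeated so the block runs on its own.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# One-sample t-test of H0: mu = 45 against Ha: mu < 45 (assumed direction)
result <- t.test(gas_taxes, mu = 45, alternative = "less")

print(result$statistic)  # t statistic
print(result$parameter)  # degrees of freedom
print(result$p.value)    # one-sided p-value
```

`t.test()` uses the standard error sd / sqrt(n) internally, so it is a useful guard against arithmetic slips in a manual formula.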