hw2
karen
Author

Karen Kimble

Published

October 17, 2022

Code
# Setup

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(stats)
knitr::opts_chunk$set(echo = TRUE)

Question 1

Angiography

Code
confidence_level <- 0.90

tail_area <- (1-confidence_level)/2

standard_error <- 9 / sqrt(847)

t_score <- qt(p = 1-tail_area, df = 846)

CI_1 <- c(18 - t_score * standard_error, 18 + t_score * standard_error)

print(CI_1)
[1] 17.49078 18.50922

Bypass Surgery

Code
confidence_level <- 0.90

tail_area <- (1-confidence_level)/2

standard_error <- 10 / sqrt(539)

t_score <- qt(p = 1-tail_area, df = 538)

CI_2 <- c(19 - t_score * standard_error, 19 + t_score * standard_error)

print(CI_2)
[1] 18.29029 19.70971

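The 90% confidence interval for angiography (width of about 1.0) is narrower than the 90% confidence interval for bypass surgery (width of about 1.4). The angiography sample is larger (847 vs. 539) and has a smaller standard deviation (9 vs. 10), so its mean is estimated more precisely.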
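Both intervals follow the same recipe, so the calculation can be wrapped in a small helper. The sketch below is only an illustration: `t_interval` and its argument names are hypothetical and not part of the original homework, and it uses nothing beyond base R's `qt()`.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_interval <- function(sample_mean, sample_sd, n, confidence_level = 0.90) {
  tail_area <- (1 - confidence_level) / 2
  standard_error <- sample_sd / sqrt(n)
  t_score <- qt(p = 1 - tail_area, df = n - 1)
  c(lower = sample_mean - t_score * standard_error,
    upper = sample_mean + t_score * standard_error)
}

ci_angiography <- t_interval(sample_mean = 18, sample_sd = 9, n = 847)
ci_bypass <- t_interval(sample_mean = 19, sample_sd = 10, n = 539)

# Interval widths, for comparing the precision of the two estimates
width_angiography <- unname(diff(ci_angiography))
width_bypass <- unname(diff(ci_bypass))
print(c(angiography = width_angiography, bypass = width_bypass))
```

Keeping the confidence level as an argument makes it easy to see how the widths change at, say, 95% instead of 90%.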

Question 2

Code
# Point estimate

p <- 567/1031

p
[1] 0.5499515

The point estimate p for the proportion of all adult Americans who believe that a college education is essential for success is about 0.55.

Code
# Confidence Interval

confidence_level <- 0.95

tail_area <- (1-confidence_level)/2

# For a proportion, sqrt(p * (1 - p) / n) is already the standard error,
# so it must not be divided by sqrt(n) a second time
standard_error <- sqrt((p * (1-p))/1031)

t_score <- qt(p = 1-tail_area, df = 1030)

CI_3 <- c(p - t_score * standard_error, p + t_score * standard_error)

print(round(CI_3, 3))
[1] 0.52 0.58

Based on this interval, I am 95% confident that the true proportion of all adult Americans who believe a college education is essential for success lies between 0.520 and 0.580.
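As a cross-check on the interval for the proportion, the sketch below computes the usual normal-approximation (Wald) interval by hand and compares it with base R's `prop.test()`. The variable names are illustrative, and `prop.test()` with `correct = FALSE` returns a Wilson score interval rather than the Wald interval, so the two are expected to agree only approximately.

```r
# Wald (normal-approximation) interval for a proportion.
successes <- 567
n <- 1031
p_hat <- successes / n

# Standard error of a sample proportion
standard_error <- sqrt(p_hat * (1 - p_hat) / n)

z_score <- qnorm(0.975)  # critical value for a 95% interval
wald_ci <- c(p_hat - z_score * standard_error,
             p_hat + z_score * standard_error)

# Wilson score interval from base R, without continuity correction
wilson_ci <- as.numeric(prop.test(successes, n, correct = FALSE)$conf.int)

print(round(rbind(wald = wald_ci, wilson = wilson_ci), 4))
```

With a sample this large, the choice between a z-score and a t-score with 1030 degrees of freedom makes almost no difference to the interval.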

Question 3

Code
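# Assumed standard deviation: a quarter of the range from $30 to $200
sd_estimate <- (200-30)/4

# Critical value for 95% confidence with a large sample
z_score <- 1.96

# Desired margin of error in dollars
margin <- 5

# Solve margin = z_score * sd_estimate / sqrt(n) for n
sample_size <- (z_score * sd_estimate / margin)^2

sample_size
[1] 277.5556

The sample should include at least 278 people. The goal is a 95% confidence interval with a margin of error of 5 dollars, and the standard deviation is assumed to be a quarter of the range from 30 dollars to 200 dollars, or 42.50 dollars. Setting the margin of error equal to the critical value times the standard deviation divided by the square root of the sample size, then solving for the sample size, gives n = (critical value * sd / margin)^2. At a 95% confidence level with a large sample, the critical value is about 1.96, which gives n of about 277.6, rounded up to 278.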
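The sample-size formula n = (z * sd / margin)^2 can be wrapped in a function so that other margins or confidence levels are easy to try. This is a minimal sketch: `required_sample_size` is a hypothetical name, and it assumes the normal critical value from `qnorm()` (about 1.96 at 95%) is appropriate because the resulting sample is large.

```r
# Hypothetical helper: smallest n whose margin of error is at most `margin`.
required_sample_size <- function(sd_estimate, margin, confidence_level = 0.95) {
  z_score <- qnorm(1 - (1 - confidence_level) / 2)
  ceiling((z_score * sd_estimate / margin)^2)
}

# Standard deviation assumed to be a quarter of the $30 to $200 range
n_needed <- required_sample_size(sd_estimate = (200 - 30) / 4, margin = 5)
print(n_needed)

# Margin of error actually achieved with that sample size
achieved_margin <- qnorm(0.975) * ((200 - 30) / 4) / sqrt(n_needed)
print(achieved_margin)
```

Rounding up with `ceiling()` guarantees the achieved margin does not exceed the target.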

Question 4

Part A

Ho: The true mean income of female employees is $500/week

Ha: The true mean income of female employees is not $500/week

Code
confidence_level <- 0.95

tail_area <- (1-confidence_level)/2

standard_error <- 90 / sqrt(9)

# Critical value, used only for the confidence interval
t_score <- qt(p = 1-tail_area, df = 8)

CI_4A <- c(410 - t_score * standard_error, 410 + t_score * standard_error)

# Test statistic: (sample mean - hypothesized mean) / standard error
t_stat <- (410 - 500) / standard_error

t_stat
[1] -3
Code
p_value = 2 * pt(q = abs(t_stat), df = 8, lower.tail = FALSE)

round(p_value, 4)
[1] 0.0171
Code
print(CI_4A)
[1] 340.8199 479.1801

The test statistic is t = -3 and the two-sided p-value is about 0.017, which is smaller than the alpha value of 0.05, so the result is statistically significant and we reject the null hypothesis. The confidence interval shows that we are 95% confident the true mean income of female employees lies between 340.82 dollars/week and 479.18 dollars/week. This interval does not contain 500 dollars/week, which agrees with the test.

Part B

Code
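# Test statistic: (sample mean - hypothesized mean) / standard error
t_stat <- (410 - 500) / (90 / sqrt(9))

# Ha: mean > 500, so the p-value is the upper-tail probability
p_value = pt(q = t_stat, df = 8, lower.tail = FALSE)

round(p_value, 4)
[1] 0.9915

The p-value for the alternative hypothesis that the true mean income of female employees is greater than 500 dollars/week is about 0.991. This value is far greater than the 0.05 alpha level, so the result is not statistically significant and we do not reject the null hypothesis in favor of this alternative.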

Part C

Code
# Test statistic: (sample mean - hypothesized mean) / standard error
t_stat <- (410 - 500) / (90 / sqrt(9))

# Ha: mean < 500, so the p-value is the lower-tail probability
p_value = pt(q = t_stat, df = 8, lower.tail = TRUE)

round(p_value, 4)
[1] 0.0085

The p-value for the alternative hypothesis that the true mean income of female employees is less than 500 dollars/week is about 0.009. This value is less than the alpha value of 0.05, meaning it is statistically significant and we can reject the null hypothesis. The true mean income of female employees is likely less than 500 dollars/week.
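The three parts of this question differ only in which tail of the t distribution is used. The sketch below collects them in one place, starting from the summary statistics (sample mean 410, hypothesized mean 500, standard deviation 90, n = 9). The helper `t_test_summary` is a hypothetical name introduced for illustration, not a base R function.

```r
# Hypothetical helper: one-sample t test from summary statistics.
t_test_summary <- function(sample_mean, mu0, s, n) {
  # Test statistic: distance from the hypothesized mean in standard errors
  t_stat <- (sample_mean - mu0) / (s / sqrt(n))
  df <- n - 1
  list(
    t_stat = t_stat,
    two_sided = 2 * pt(abs(t_stat), df = df, lower.tail = FALSE),  # Ha: mean != mu0
    greater = pt(t_stat, df = df, lower.tail = FALSE),             # Ha: mean > mu0
    less = pt(t_stat, df = df, lower.tail = TRUE)                  # Ha: mean < mu0
  )
}

result <- t_test_summary(sample_mean = 410, mu0 = 500, s = 90, n = 9)
print(round(unlist(result), 4))
```

The two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller of them, which is a quick sanity check on any hand computation.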

Question 5

Part A

Code
# Jones

t_score <- (519.5-500)/(10)

t_score
[1] 1.95
Code
p_value <- 2 * pt(q = t_score, df = 999, lower.tail = FALSE)

p_value
[1] 0.05145555
Code
# Smith

t_score <- (519.7-500)/(10)

t_score
[1] 1.97
Code
p_value <- 2 * pt(q = t_score, df = 999, lower.tail = FALSE)

p_value
[1] 0.04911426

Part B

For Jones’s study, the results were not statistically significant because the p-value of 0.051 is greater than the alpha value of 0.05. For Smith’s study, the results were statistically significant because the p-value of 0.049 is less than the alpha value of 0.05.

Part C

Reporting only whether P is greater or less than the alpha value can be misleading when the p-value itself is not reported. In both studies, the p-value was about 0.001 away from 0.05, yet the result was statistically significant in only one of them. The two studies provide nearly identical evidence against the null hypothesis, and the exact p-values make that clear. Someone who reads only that a hypothesis was not rejected may assume the p-value was very large even when it barely exceeded 0.05 (as in Jones’s study), so omitting the p-value leaves out an important part of the result.
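One way to make this point concrete is to report the test statistic, exact p-value, and 95% confidence interval for both studies side by side. The sketch below does that under the same inputs used in Part A (standard error 10, df = 999); the variable and column names are illustrative.

```r
se <- 10
df <- 999
sample_means <- c(Jones = 519.5, Smith = 519.7)

# Test statistics and two-sided p-values against a hypothesized mean of 500
t_stats <- (sample_means - 500) / se
p_values <- 2 * pt(abs(t_stats), df = df, lower.tail = FALSE)

# 95% confidence intervals for the mean in each study
t_crit <- qt(0.975, df = df)
summary_table <- data.frame(
  t_stat = t_stats,
  p_value = round(p_values, 4),
  ci_lower = round(sample_means - t_crit * se, 2),
  ci_upper = round(sample_means + t_crit * se, 2)
)
print(summary_table)
```

The two intervals overlap almost entirely, which conveys far more than the labels "significant" and "not significant" do on their own.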

Question 6

Code
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

t.test(gas_taxes, alternative = "less", mu = 45)

    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278 

The p-value of this test is 0.038, which is less than the alpha value of 0.05. The result is statistically significant, so we reject the null hypothesis that the true average gas tax per gallon in the United States is 45 cents. There is significant evidence that the true average gas tax per gallon in the United States is less than 45 cents.
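To show what `t.test()` computes here, the sketch below reproduces the test statistic, the one-sided p-value, and the upper confidence bound by hand from the same data. The variable names are illustrative; the functions used (`mean()`, `sd()`, `pt()`, `qt()`) are all base R.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n <- length(gas_taxes)
standard_error <- sd(gas_taxes) / sqrt(n)

# Test statistic for H0: mean = 45
t_stat <- (mean(gas_taxes) - 45) / standard_error

# Ha: mean < 45, so the p-value is the lower-tail probability
p_value <- pt(t_stat, df = n - 1, lower.tail = TRUE)

# One-sided 95% upper confidence bound for the mean
upper_bound <- mean(gas_taxes) + qt(0.95, df = n - 1) * standard_error

print(c(t = t_stat, p_value = p_value, upper_bound = upper_bound))
```

The upper bound falls just below 45, which is the confidence-interval view of the same conclusion as the p-value.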