HW 2

hw2

Author

Karen Detter

Published

October 17, 2022

knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Q.1

- Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures:

Bypass -

#assign values
Bs_sd <- 10
Bs_size <- 539
Bs_mean <- 19
#calculate standard error
Bstandard_error <- Bs_sd / sqrt(Bs_size)
Bstandard_error

[1] 0.4307305

#calculate area in each tail
confidence_level <- 0.90
Btail_area <- (1-confidence_level)/2
Btail_area

[1] 0.05

#calculate t-score
Bt_score <- qt(p = 1-Btail_area, df = Bs_size-1)
Bt_score

[1] 1.647691

#calculate confidence interval
BCI <- c(Bs_mean - Bt_score * Bstandard_error, Bs_mean + Bt_score * Bstandard_error)
print(BCI)

[1] 18.29029 19.70971

Angiography -

#assign values
As_sd <- 9
As_size <- 847
As_mean <- 18
#calculate standard error
Astandard_error <- As_sd / sqrt(As_size)
Astandard_error

[1] 0.3092437

#calculate area in each tail
confidence_level <- 0.90
Atail_area <- (1-confidence_level)/2
Atail_area

[1] 0.05

#calculate t-score
At_score <- qt(p = 1-Atail_area, df = As_size-1)
At_score

[1] 1.646657

#calculate confidence interval
ACI <- c(As_mean - At_score * Astandard_error, As_mean + At_score * Astandard_error)
print(ACI)

[1] 17.49078 18.50922

- Is the confidence interval narrower for angiography or bypass surgery?

#calculate differences in upper and lower bounds of both confidence intervals
(Bs_mean + Bt_score * Bstandard_error) - (Bs_mean - Bt_score * Bstandard_error)

[1] 1.419421

(As_mean + At_score * Astandard_error) - (As_mean - At_score * Astandard_error)

[1] 1.018436

Angiography has a narrower confidence interval.
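The two calculations above follow the same steps, so they can be wrapped in one function. The sketch below is only an illustration: `t_ci` is a hypothetical helper name that does not appear in the original work, and it assumes the summary statistics given in the question.

```r
# Hypothetical helper (not part of the original solution):
# t-based confidence interval for a mean, from summary statistics.
t_ci <- function(mean, sd, n, conf_level = 0.90) {
  se <- sd / sqrt(n)                                   # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value
  c(lower = mean - t_crit * se, upper = mean + t_crit * se)
}

bypass_ci <- t_ci(mean = 19, sd = 10, n = 539)
angio_ci  <- t_ci(mean = 18, sd = 9,  n = 847)

# Interval widths (upper minus lower); the smaller width is the narrower interval
widths <- c(bypass = unname(diff(bypass_ci)),
            angiography = unname(diff(angio_ci)))
widths
```

Angiography has both a smaller standard deviation and a larger sample, and each of those shrinks the standard error, which is why its interval is narrower.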

Q.2

- Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success.

#assign values
k <- 567
n <- 1031
#calculate sample proportion
p <- k/n
p

[1] 0.5499515

- Construct and interpret a 95% confidence interval for p

#calculate margin of error
margin <- qnorm(0.975) * sqrt(p*(1-p)/n)
#calculate lower and upper bounds of confidence interval
low <- p - margin
high <- p + margin
print(low)

[1] 0.5195839

print(high)

[1] 0.5803191

The 95% confidence interval for the population proportion is [.52, .58]. If samples of this size were drawn repeatedly and an interval built the same way each time, about 95% of those intervals would contain the true population proportion. We can therefore be reasonably confident that the true proportion of adult Americans who believe a college education is essential for success lies between 52% and 58%.
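As a cross-check, the normal-approximation (Wald) interval computed by hand can be compared with the interval that base R's `prop.test()` reports. This is a sketch rather than part of the original solution: with `correct = FALSE`, `prop.test()` returns a Wilson score interval, so the bounds are expected to be close to the hand calculation but not identical. The names `p_hat`, `wald`, and `wilson` are introduced here for illustration.

```r
# Counts from the question
k <- 567
n <- 1031
p_hat <- k / n

# Wald interval, matching the hand calculation
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# Wilson score interval from base R (no continuity correction)
wilson <- as.numeric(prop.test(k, n, conf.level = 0.95, correct = FALSE)$conf.int)

rbind(wald = wald, wilson = wilson)
```

With a sample this large and a proportion near 0.5, the two methods should agree to about two decimal places, so the interpretation does not depend on which one is used.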

Q.3

- Assuming the significance level to be 5%, what should be the size of the sample?

#assign values
z_score <- qnorm(.975) #assuming normal distribution and 95% confidence level
margin_error <- 5 #half of confidence interval
#estimate population standard deviation (one quarter of the range)
pop_sd <- (200-30) / 4

#calculate sampling size of population mean
samp_size <- z_score^2 * pop_sd^2 / margin_error^2
samp_size

[1] 277.5454

Rounding up to the next whole number, the sample size should be 278.
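The calculation uses the formula $n = (z \cdot \sigma / E)^2$, where $E$ is the margin of error. A minimal sketch of it as a reusable function follows; `sample_size_mean` is a hypothetical name, and the standard deviation is the same range-based estimate used above.

```r
# Hypothetical helper (not part of the original solution):
# smallest n that gives margin of error `margin` for a mean,
# assuming a known (or estimated) population standard deviation.
sample_size_mean <- function(sd, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value
  ceiling((z * sd / margin)^2)           # always round up, never down
}

n_needed <- sample_size_mean(sd = (200 - 30) / 4, margin = 5)
n_needed
# [1] 278
```

`ceiling()` is used instead of `round()` because rounding down would give a margin of error slightly larger than the target.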

Q.4

A. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.

assumptions: random sampling, normally distributed data, adequate sample size; hypotheses: $H_{0}$ : $\mu$ = 500 ; $H_{a}$ : $\mu$ $\neq$ 500 ; test statistic: t-statistic with n - 1 = 8 degrees of freedom

#calculate t-statistic
t_stat <- (410 - 500) / (90 / (sqrt(9)))
#calculate two-tailed p-value (-abs() keeps the formula valid for either sign of t)
p_val <- 2 * (pt(q = -abs(t_stat), df=8))
p_val

[1] 0.01707168

Assuming $\alpha$ = .05, the p-value (0.017) is below the significance level, so we reject $H_{0}$: there is evidence that the mean weekly income of female employees differs from 500 dollars.

B. Report the P-value for $H_{a}$ : $\mu$ < 500. Interpret.

#calculate lower-tail p-value
p_low <- pt(t_stat, df = 8, lower.tail = TRUE)
p_low

[1] 0.008535841

This p-value (0.0085) is well below the .05 significance level, so we reject $H_{0}$ in favor of $H_{a}$ : $\mu$ < 500.

C. Report and interpret the P-value for $H_{a}$ : $\mu$ > 500.

#calculate upper-tail p-value
p_high <- pt(t_stat, df = 8, lower.tail = FALSE)
p_high

[1] 0.9914642

#double-check p-values
check <- p_high + p_low
check

[1] 1

This p-value (0.99) is far above the .05 significance level, so we fail to reject $H_{0}$; the data provide no evidence for $H_{a}$ : $\mu$ > 500.
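The three parts of this question share one t-statistic and differ only in which tail of the t distribution is used. The sketch below collects them in a single function; `t_from_summary` is a hypothetical helper, not something from the original solution, and it assumes only the summary statistics given in the question (mean 410, standard deviation 90, n = 9).

```r
# Hypothetical helper (not part of the original solution):
# one-sample t-test from summary statistics, reporting all three p-values.
t_from_summary <- function(xbar, mu0, s, n) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
  df <- n - 1                              # degrees of freedom
  list(
    t           = t_stat,
    df          = df,
    p_two_sided = 2 * pt(-abs(t_stat), df),               # H_a: mu != mu0
    p_less      = pt(t_stat, df),                         # H_a: mu <  mu0
    p_greater   = pt(t_stat, df, lower.tail = FALSE)      # H_a: mu >  mu0
  )
}

res <- t_from_summary(xbar = 410, mu0 = 500, s = 90, n = 9)
str(res)
```

Because the sample mean is below 500, the lower-tail p-value is small and the upper-tail p-value is close to 1. The two one-sided values always sum to 1, and the two-sided value is twice the smaller of them.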

Q.5

A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.

#calculate t-statistics
Jones_t <- (519.5 - 500) / 10
Jones_t

[1] 1.95

Smith_t <- (519.7 - 500) / 10
Smith_t

[1] 1.97

#calculate p-values
Jones_p <- 2 * (pt(q = Jones_t, df=999, lower.tail = FALSE))
Jones_p

[1] 0.05145555

Smith_p <- 2 * (pt(q = Smith_t, df=999, lower.tail = FALSE))
Smith_p

[1] 0.04911426

B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”

At this significance level, Smith’s result (p = 0.049) is statistically significant and allows rejection of the null hypothesis. Jones’ result (p = 0.051) is not, so Jones fails to reject the null.

C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05”, or as “reject H0” versus “Do not reject H0”, without reporting the actual P-value.

This example shows the importance of being specific and thorough in reporting the “significance” of study findings. Smith and Jones obtained nearly identical results on either side of the cutoff for statistical significance, yet a bare “significant” versus “not significant” label makes them look contradictory. To assess the actual impact of the findings, a reader needs both the actual p-value and the exact standard, $\leq$ or <, used to interpret it. Reporting only “reject” or “do not reject” the null hypothesis likewise withholds the information needed to judge the findings, because it conveys nothing about the strength of the evidence.
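One way to make this point concrete is to put the two studies side by side with their 95% confidence intervals. The sketch below is an illustration under the values already used above (standard error 10, 999 degrees of freedom, null value 500); the object names are introduced here and are not from the original solution.

```r
# Values taken from the calculations above
se  <- 10
df  <- 999
mu0 <- 500
means <- c(Jones = 519.5, Smith = 519.7)

# Test statistics and two-sided p-values for both studies at once
t_stats <- (means - mu0) / se
p_vals  <- 2 * pt(abs(t_stats), df, lower.tail = FALSE)

# 95% confidence intervals for the mean in each study
t_crit <- qt(0.975, df)
ci <- cbind(lower = means - t_crit * se, upper = means + t_crit * se)

round(cbind(t = t_stats, p = p_vals, ci), 3)
```

The two intervals overlap almost entirely. Jones' lower bound falls just below 500 and Smith's just above it, which is the only thing separating "do not reject" from "reject".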

Q.6

Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.

#assign values
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
#run one sample t-test
t.test(gas_taxes, mu = 45, alternative = 'less')


    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278

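At the 95% confidence level ($\alpha$ = .05), the p-value for the test of $H_{0}$ : $\mu$ = 45 against $H_{a}$ : $\mu$ < 45 is 0.038, which is below .05, so we can reject $H_{0}$. Additionally, 45 lies above the upper bound of the one-sided 95% confidence interval (44.68), which also supports the alternative hypothesis that the average tax was less than 45 cents per gallon.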
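To see where the numbers in the `t.test()` output come from, the same quantities can be computed by hand. This is a sketch for verification only; `t_manual`, `p_manual`, and `upper_bound` are names introduced here, and the data vector is copied from above so the block runs on its own.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n  <- length(gas_taxes)
se <- sd(gas_taxes) / sqrt(n)              # standard error of the sample mean

# Test statistic and lower-tail p-value for H_a: mu < 45
t_manual <- (mean(gas_taxes) - 45) / se
p_manual <- pt(t_manual, df = n - 1)

# Upper bound of the one-sided 95% confidence interval
upper_bound <- mean(gas_taxes) + qt(0.95, df = n - 1) * se

c(t = t_manual, p = p_manual, upper = upper_bound)
```

These values should match the `t`, `p-value`, and upper confidence bound that `t.test(gas_taxes, mu = 45, alternative = 'less')` prints.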