Homework 2

Author

Shoshana Buck

Published

October 17, 2022

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Question 1

Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?

Tail area and standard error for Bypass

Code

tail_area<- (1-.90)/2
tail_area

[1] 0.05

Code

standard_error<- 10/sqrt(539)
standard_error

[1] 0.4307305

t-value for Bypass

Code

t_score<- qt(p= 1-tail_area, df= 538)
t_score

[1] 1.647691

Confidence interval and margin of error for Bypass

Code

CI<- c(19 - t_score * standard_error, 19 + t_score * standard_error)
CI

[1] 18.29029 19.70971

Code

MOE<- t_score *standard_error
round(MOE, 5)

[1] 0.70971

Standard error of angiography

Code

standard_error2<- 9/sqrt(847)
standard_error2

[1] 0.3092437

t-score for angiography

Code

t_score2<- qt(p= 1-.05, df= 846)
t_score2

[1] 1.646657

Confidence interval and margin of error for angiography

Code

CI<- c(18 - t_score2 * standard_error2, 18 + t_score2 * standard_error2)
CI

[1] 17.49078 18.50922

Code

MOE2<- t_score2 *standard_error2
round(MOE2, 5)

[1] 0.50922

The 90% confidence interval for bypass surgery is [18.29, 19.71] days, with a margin of error of +/- 0.71. The 90% confidence interval for angiography is [17.49, 18.51] days, with a margin of error of +/- 0.51. The angiography interval is narrower: it has a larger sample size and a smaller standard deviation, which gives a smaller standard error, so the distance between the low and high ends of the interval is smaller.
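The bypass and angiography calculations repeat the same arithmetic, so they can be wrapped in one function. The following is a minimal sketch rather than part of the original homework; the helper name `t_ci` and its argument names are hypothetical and introduced here only for illustration.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_ci <- function(x_bar, s, n, conf_level = 0.90) {
  tail_area <- (1 - conf_level) / 2          # area in each tail
  se <- s / sqrt(n)                          # standard error of the mean
  t_crit <- qt(1 - tail_area, df = n - 1)    # critical t-value
  moe <- t_crit * se                         # margin of error
  c(lower = x_bar - moe, upper = x_bar + moe, moe = moe)
}

# Summary statistics taken from the calculations above
bypass <- t_ci(x_bar = 19, s = 10, n = 539)
angio  <- t_ci(x_bar = 18, s = 9,  n = 847)

# Interval width is twice the margin of error
widths <- c(bypass = 2 * unname(bypass["moe"]),
            angiography = 2 * unname(angio["moe"]))
widths
```

Comparing the two entries of `widths` answers the question directly: the smaller width identifies the narrower interval.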

Question 2

Point estimate P

Code

s_size<- 1031
b<- 567

point_estimate<- b/s_size
point_estimate

[1] 0.5499515

95% confidence interval for P

Code

prop.test(b,s_size)


    1-sample proportions test with continuity correction

data:  b out of s_size, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515

Based on the point estimate, about 55% of the adult Americans surveyed by the National Center for Public Policy believe that a college education is essential for success. The 95% confidence interval is [0.519, 0.581], so we can be 95% confident that the true population proportion of adult Americans who hold this belief lies between 51.9% and 58.1%.
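As a cross-check, the interval can also be computed by hand with the normal-approximation (Wald) formula. This is a sketch of a different method from the one `prop.test` uses (a score interval with continuity correction), so the endpoints are expected to differ slightly from the output above. The variable names below are chosen for this example only.

```r
# Normal-approximation (Wald) interval for a proportion, computed by hand.
n <- 1031                                   # sample size
x <- 567                                    # number who agreed
p_hat <- x / n                              # point estimate
se_p <- sqrt(p_hat * (1 - p_hat) / n)       # standard error of p_hat
z <- qnorm(0.975)                           # critical value for 95% confidence
wald_ci <- c(p_hat - z * se_p, p_hat + z * se_p)
wald_ci
```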

Question 3

Assuming the significance level to be 5%, what should the sample size be?

Standard Deviation

Code

sd<- (200-30)/4
sd

[1] 42.5

Solving for n

Code

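#steps for the equation
#1. 5 = 1.96 * (42.5/sqrt(n))

#2. 5 = 83.3/sqrt(n)

#3. 5*sqrt(n)= 83.3

#4. sqrt(n) = 83.3/5

#5. n= (83.3/5)^2

#6. n= 277.56, rounded up to n = 278

The standard deviation is estimated as 42.5, the range (200 - 30) divided by 4. Plugging this into the margin-of-error equation for the confidence interval, with a margin of 5, and solving for n gives n = 277.56. Because the sample size must be a whole number that keeps the margin of error at or below 5, we round up: the sample should include at least 278 observations.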
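The algebra above can also be checked numerically. The sketch below assumes the same inputs used in the steps (a standard deviation estimate of 42.5 and a margin of error of 5); it uses the exact normal quantile from `qnorm` instead of the rounded 1.96, and the variable names are hypothetical.

```r
# Required sample size for a given margin of error, solved numerically.
sigma_guess <- (200 - 30) / 4        # standard deviation estimate: range / 4
margin <- 5                          # margin of error from the equation above
z <- qnorm(1 - 0.05 / 2)             # critical value for a 5% significance level
n_raw <- (z * sigma_guess / margin)^2
n_required <- ceiling(n_raw)         # round up to a whole observation
n_required
```

Rounding up with `ceiling` is the conservative choice: rounding down would leave the margin of error slightly above the target.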

Question 4

A

Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.

Assumptions

We assume that weekly income is normally distributed. The null hypothesis is H0: μ = 500 and the alternative hypothesis is Ha: μ ≠ 500.

Standard error

Code

s_sizef<- 9
sd<-90
s_meanf<- 410
null_hypo_mean<- 500

standard_errorf<- sd/sqrt(s_sizef)
standard_errorf

[1] 30

t-score

Code

t_stat<- (s_meanf-null_hypo_mean)/standard_errorf
t_stat

[1] -3

I subtracted the null hypothesis mean, μ = 500, from the sample mean of 410 and divided the difference by the standard error of 30, which gives t = -3.

p-value

Code

p_value<- (pt(t_stat, df=8)) *2
p_value

[1] 0.01707168

The p-value (0.017) is less than the 5% significance level, so we reject the null hypothesis in favor of the alternative hypothesis: the mean weekly income of female employees differs from $500.

B + C

Report the P-value for Ha: μ < 500. Interpret. Report and interpret the P-value for Ha: μ > 500.

Code

upper_p_value<- (pt(t_stat, df=8, lower.tail = FALSE))
upper_p_value

[1] 0.9914642

Code

lower_p_value<- (pt(t_stat, df=8, lower.tail = TRUE))
lower_p_value

[1] 0.008535841

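For Ha: μ < 500, the lower-tailed p-value is 0.0085, which is below the 5% significance level, so there is strong evidence that the mean income is less than $500. For Ha: μ > 500, the upper-tailed p-value is 0.99, so there is no evidence that the mean income is greater than $500. The two one-sided p-values add up to 1.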
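The three p-values for this test (lower-tailed, upper-tailed, and two-sided) all come from the same t-statistic, which a small function can make explicit. This is an illustrative sketch; the function name `t_p_values` is hypothetical and not part of the original homework.

```r
# Hypothetical helper: one-sided and two-sided p-values from a t-statistic.
t_p_values <- function(t_stat, df) {
  lower <- pt(t_stat, df = df, lower.tail = TRUE)    # Ha: mu < mu0
  upper <- pt(t_stat, df = df, lower.tail = FALSE)   # Ha: mu > mu0
  two_sided <- 2 * pt(-abs(t_stat), df = df)         # Ha: mu != mu0
  c(lower = lower, upper = upper, two_sided = two_sided)
}

# Values from the female income example: t = -3 with 8 degrees of freedom
p_vals <- t_p_values(t_stat = -3, df = 8)
p_vals
```

Using `-abs(t_stat)` for the two-sided value keeps the calculation correct whether the t-statistic is negative or positive.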

Question 5

Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.

Code

jones_sample_mean<- 519.5
smith_sample_mean<-519.7
null_hyp<- 500
jones_se<- 10
smith_se<- 10

A: Jones t-score and p-value

Code

jones_t_stat<- (jones_sample_mean-null_hyp)/jones_se
jones_t_stat

[1] 1.95

Code

jones_p_value<- pt(jones_t_stat, df=999, lower.tail = FALSE) *2
jones_p_value

[1] 0.05145555

A: Smith t-score and p-value

Code

smith_t_stat<-(smith_sample_mean-null_hyp)/smith_se
smith_t_stat

[1] 1.97

Code

smith_p_value<- pt(smith_t_stat, df=999, lower.tail = FALSE)*2
smith_p_value

[1] 0.04911426

B

Using α = 0.05, for each study indicate whether the result is “statistically significant.”

A result is “statistically significant” when the p-value is smaller than the 0.05 significance level. Jones's p-value is 0.051, which is greater than 0.05, so the result is not statistically significant and we cannot reject the null hypothesis. Smith's p-value is 0.049, which is smaller than 0.05, so the result is statistically significant and we can reject the null hypothesis in favor of the alternative hypothesis.

C

Reporting a result only as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “do not reject H0,” without giving the actual p-value is misleading. It makes it seem that there is a drastic difference between Jones and Smith, because one result is statistically significant and the other is not. The actual p-values (0.051 and 0.049) show that the difference between the two studies is very small.

Question 6

Code

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

Code

t.test(gas_taxes, mu=45, alternative = 'less')


    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278

Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents?

The p-value is 0.038, which is less than the 5% significance level. We therefore reject the null hypothesis: there is enough evidence to conclude that the average tax per gallon of gas in the US in 2005 was less than 45 cents. This is consistent with the one-sided 95% confidence interval, whose upper bound (44.68) is below 45.
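To see where the `t.test` output comes from, the test statistic and the lower-tailed p-value can be reproduced by hand from the sample mean and standard deviation. This is a sketch for checking the result; the names `t_manual` and `p_manual` are introduced here for illustration.

```r
# Reproduce the one-sample, lower-tailed t-test by hand.
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n <- length(gas_taxes)
se <- sd(gas_taxes) / sqrt(n)               # standard error of the mean
t_manual <- (mean(gas_taxes) - 45) / se     # test statistic under H0: mu = 45
p_manual <- pt(t_manual, df = n - 1)        # lower-tailed p-value (Ha: mu < 45)

c(t = t_manual, p = p_manual)
```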