Homework 2
Please check your answers against the solutions.
Question 1
bypass_n = 539
angio_n = 847
bypass_sample_mean = 19
angio_sample_mean = 18
bypass_sample_sd = 10
angio_sample_sd = 9
# standard errors
bypass_se = bypass_sample_sd/sqrt(bypass_n)
angio_se = angio_sample_sd/sqrt(angio_n)
# margins of error for 90% intervals (5% in each tail)
bypass_me = qt(0.95, df = bypass_n - 1)*bypass_se
angio_me = qt(0.95, df = angio_n - 1)*angio_se
The confidence intervals:
print(bypass_sample_mean + c(-bypass_me, bypass_me))
[1] 18.29029 19.70971
print(angio_sample_mean + c(-angio_me, angio_me))
[1] 17.49078 18.50922
The width of each confidence interval, which is twice its margin of error:
2 * bypass_me
[1] 1.419421
2 * angio_me
[1] 1.018436
The confidence interval for angiography is narrower.
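Both intervals follow the same recipe, so the calculation can be wrapped in a small helper. The sketch below uses a hypothetical function name, `mean_ci`, which is not part of the original solution; it takes summary statistics and a confidence level and returns a t-based interval.

```r
# Hypothetical helper (not part of the original solution): t-based confidence
# interval for a mean, computed from summary statistics.
mean_ci <- function(sample_mean, sample_sd, n, conf_level = 0.90) {
  se <- sample_sd / sqrt(n)                           # standard error
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # two-sided critical value
  sample_mean + c(-1, 1) * t_crit * se
}

bypass_ci <- mean_ci(19, 10, 539)  # bypass summary statistics from this question
angio_ci  <- mean_ci(18, 9, 847)   # angiography summary statistics from this question
print(bypass_ci)
print(angio_ci)
print(diff(bypass_ci) > diff(angio_ci))  # TRUE if the angiography interval is narrower
```

The angiography interval is narrower because its sample is larger and its standard deviation is smaller, and both shrink the standard error.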
Question 2
One-step solution:
n = 1031
k = 567
prop.test(k, n)
1-sample proportions test with continuity correction
data: k out of n, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5189682 0.5805580
sample estimates:
p
0.5499515
Alternatively:
p_hat <- k/n # point estimate
se = sqrt((p_hat*(1-p_hat))/n) # standard error
e = qnorm(0.975)*se # margin of error
p_hat + c(-e, e) # confidence interval
[1] 0.5195839 0.5803191
Alternatively, we can use the exact binomial test. In large samples like the one we have, the results should essentially be the same as prop.test().
binom.test(k, n)
Exact binomial test
data: k and n
number of successes = 567, number of trials = 1031, p-value = 0.001478
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.5189927 0.5806243
sample estimates:
probability of success
0.5499515
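To see how closely the three approaches agree, the intervals can be collected into one table. This is a minimal sketch; the object names `wald` and `intervals` are illustrative and not part of the original solution.

```r
n <- 1031
k <- 567
p_hat <- k / n

# Wald (normal approximation) interval, as in the manual calculation
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# One row per method, one column per bound
intervals <- rbind(
  prop_test  = as.numeric(prop.test(k, n)$conf.int),
  wald       = wald,
  binom_test = as.numeric(binom.test(k, n)$conf.int)
)
colnames(intervals) <- c("lower", "upper")
print(round(intervals, 4))

# Largest disagreement between any two methods, for each bound
print(apply(intervals, 2, function(x) diff(range(x))))
```

With a sample this large, the bounds differ only in the third or fourth decimal place.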
Question 3
range = 200-30
population_sd = range/4
Remember:
\[CI_{95} = \bar x \pm z \frac{s}{\sqrt n}\]
(We can use \(z\) because we assume the population standard deviation is known.)
We want the sample size \(n\) that gives a margin of error of 5:
\[ z \frac{s}{\sqrt n} = 5 \]
\[ zs = 5 \sqrt n\]
\[ \frac{zs}{5} = \sqrt n\]
\[ \left(\frac{zs}{5}\right)^2 = n\]
In our case:
z = qnorm(.975)
s = population_sd
n = ((z *s) / 5)^2
print(n)
[1] 277.5454
Rounding up, we need a sample of 278.
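As a check on the rounding, the formula can be wrapped in a function and the margin of error evaluated at the chosen sample size and at one observation fewer. The names `required_n` and `me_at` are hypothetical helpers introduced here for illustration.

```r
# Hypothetical helper: smallest n whose margin of error is at most `target_me`,
# assuming a known population standard deviation `sigma`.
required_n <- function(sigma, target_me, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z * sigma / target_me)^2)
}

sigma <- (200 - 30) / 4  # range / 4 approximation used in this question
n_needed <- required_n(sigma, target_me = 5)
print(n_needed)

# Margin of error at the chosen n and at one observation fewer
me_at <- function(n) qnorm(0.975) * sigma / sqrt(n)
print(me_at(n_needed))      # should be at most 5
print(me_at(n_needed - 1))  # should exceed 5
```

Rounding up rather than to the nearest integer matters here: rounding down would leave the margin of error slightly above the target.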
Question 4
We can write a function to find the t-statistic, and then do all the tests in a, b, and c using that.
\[t = \frac{\bar x - \mu}{s / \sqrt n}\]
where \(\bar x\) is the sample mean, \(\mu\) is the hypothesized population mean, \(s\) is the sample standard deviation, and \(n\) is the sample size.
Writing this in R:
get_t_stat <- function(x_bar, mu, sd, n){
  return((x_bar - mu) / (sd / sqrt(n)))
}
Find the t-statistic:
t_stat <- get_t_stat(x_bar = 410, mu = 500, sd = 90, n = 9)
A
Two-tailed test
n = 9
pval_two_tail = 2*pt(-abs(t_stat), df = n-1) # -abs() keeps this valid for either sign of t
pval_two_tail
[1] 0.01707168
Since the p-value is below 0.05, we can reject the hypothesis that the population mean is 500.
B
pval_lower_tail = pt(t_stat, df = n-1)
pval_lower_tail
[1] 0.008535841
We can reject the hypothesis that the population mean is greater than 500.
C
pval_upper_tail = pt(t_stat, df = n-1, lower.tail=FALSE)
pval_upper_tail
[1] 0.9914642
We fail to reject the hypothesis that the population mean is less than 500.
Alternatively for C, we could just subtract the answer in B from 1:
1 - pval_lower_tail
[1] 0.9914642
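The three p-values in A, B, and C are tied together: the lower-tail and upper-tail values sum to 1, and the two-tailed value is twice the smaller of the two. A minimal sketch, using a hypothetical helper `t_pvalues` that is not part of the original solution:

```r
# Hypothetical helper: the three p-values for a one-sample t-test,
# computed from summary statistics.
t_pvalues <- function(x_bar, mu, s, n) {
  t_stat <- (x_bar - mu) / (s / sqrt(n))
  lower <- pt(t_stat, df = n - 1)
  upper <- pt(t_stat, df = n - 1, lower.tail = FALSE)
  c(t = t_stat,
    two_tail = 2 * min(lower, upper),
    lower_tail = lower,
    upper_tail = upper)
}

p <- t_pvalues(x_bar = 410, mu = 500, s = 90, n = 9)
print(p)
```

Here the t-statistic is (410 - 500) / (90 / 3) = -3, so the lower tail is the small one.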
Question 5
t_jones = ((519.5 - 500)/ 10)
t_smith = ((519.7 - 500)/ 10)
cat("t value for Jones:", t_jones, '\n')
t value for Jones: 1.95
cat("t value for Smith:", t_smith, '\n')
t value for Smith: 1.97
cat('p value for Jones:', round(2*pt(t_jones, df = 999, lower.tail=FALSE), 4), '\n')
p value for Jones: 0.0515
cat('p value for Smith:', round(2*pt(t_smith, df = 999, lower.tail=FALSE), 4), '\n')
p value for Smith: 0.0491
At the 0.05 level, Smith’s result is statistically significant but Jones’s is not. The results show the arbitrariness of the 0.05 demarcation line and the importance of reporting actual p-values to better make sense of results.
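Another way to see the same point is to compare both t values with the two-sided critical value at the 0.05 level. This sketch recomputes the two statistics so that it runs on its own; `t_crit` is a name introduced here for illustration.

```r
t_jones <- (519.5 - 500) / 10
t_smith <- (519.7 - 500) / 10

# Two-sided critical value at the 0.05 level with 999 degrees of freedom
t_crit <- qt(0.975, df = 999)
print(t_crit)

print(t_jones > t_crit)  # is Jones's result significant?
print(t_smith > t_crit)  # is Smith's result significant?
```

The critical value falls between the two statistics, so a difference of 0.02 in the t value is enough to move a result across the line.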
Question 6
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
t.test(gas_taxes, mu = 45, alternative = 'less')
One Sample t-test
data: gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
-Inf 44.67946
sample estimates:
mean of x
40.86278
In the one sided test, we are able to reject the null in favor of the alternative that the gas taxes are less than 45 cents.
Note that a two-sided test at the same level would not have resulted in the rejection of the null.
However, a two-sided 90% confidence interval gives the same upper bound, since there is now a 5% rejection area in each tail:
t.test(gas_taxes, mu = 45, alternative = 'two.sided', conf.level = 0.9)
One Sample t-test
data: gas_taxes
t = -1.8857, df = 17, p-value = 0.07654
alternative hypothesis: true mean is not equal to 45
90 percent confidence interval:
37.04610 44.67946
sample estimates:
mean of x
40.86278
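The shared upper bound can also be computed by hand, which makes it clear why the two intervals agree: both use the 95th percentile of the t distribution with 17 degrees of freedom. A minimal sketch; the names `upper`, `one_sided`, and `two_sided_90` are introduced here for illustration.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n <- length(gas_taxes)
se <- sd(gas_taxes) / sqrt(n)  # standard error of the mean

# Upper bound: sample mean plus the 95th-percentile t value times the standard error
upper <- mean(gas_taxes) + qt(0.95, df = n - 1) * se

# Upper bounds reported by t.test() for the two intervals
one_sided    <- t.test(gas_taxes, mu = 45, alternative = "less")$conf.int[2]
two_sided_90 <- t.test(gas_taxes, mu = 45, conf.level = 0.90)$conf.int[2]

print(c(manual = upper, one_sided = one_sided, two_sided_90 = two_sided_90))
```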