Homework 2

hw2
Author

Saaradhaa M

Published

October 10, 2022

Qn 1

# calculate standard errors.
SE_B <- 10/sqrt(539)
SE_A <- 9/sqrt(847)

# calculate t-values.
CL <- 0.90  
TA <- (1-CL)/2
tvalue_b <- qt(p = 1-TA, df = 539-1)
tvalue_a <- qt(p = 1-TA, df = 847-1)

# calculate CI for bypass.
CIB <- c(19 - tvalue_b * SE_B,
        19 + tvalue_b * SE_B)
CIB
[1] 18.29029 19.70971
# calculate CI width for bypass.
(19 + tvalue_b * SE_B) - (19 - tvalue_b * SE_B)
[1] 1.419421
# calculate CI for angiography.
CIA <- c(18 - tvalue_a * SE_A,
        18 + tvalue_a * SE_A)
CIA
[1] 17.49078 18.50922
# calculate CI width for angiography.
(18 + tvalue_a * SE_A) - (18 - tvalue_a * SE_A)
[1] 1.018436

The 90% CI is narrower for angiography (width 1.018436) than for bypass (width 1.419421).
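The two intervals above repeat the same arithmetic, so it can be wrapped in a small function. This is a minimal sketch, not part of the original solution: `t_ci` is a hypothetical helper name, and the means, SDs and sample sizes are the ones used in the code above.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_ci <- function(mean, sd, n, conf_level = 0.90) {
  se <- sd / sqrt(n)                            # standard error of the mean
  alpha <- 1 - conf_level
  t_crit <- qt(p = 1 - alpha / 2, df = n - 1)   # two-sided critical value
  c(lower = mean - t_crit * se, upper = mean + t_crit * se)
}

ci_bypass <- t_ci(mean = 19, sd = 10, n = 539)
ci_angio  <- t_ci(mean = 18, sd = 9,  n = 847)

# Interval widths; the narrower interval has the smaller width.
width_bypass <- unname(diff(ci_bypass))
width_angio  <- unname(diff(ci_angio))
```

Comparing `width_angio` with `width_bypass` gives the same conclusion as the subtraction done by hand.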

Qn 2

set.seed(0)
prop <- prop.test(x=567, n=1031)
prop

    1-sample proportions test with continuity correction

data:  567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515 

The 95% CI is [0.5189682, 0.580558], which includes the point estimate 0.5499515 and excludes 0.5. Hence, we can reject the null hypothesis that the true probability is 0.5 at the 5% significance level, p = 0.0014898.

Qn 3

# estimate population SD as range/4 (range rule of thumb).
PSD <- (200-30)/4

# calculate sample size, rounding up so the margin of error is not exceeded.
n <- ceiling(((1.96*PSD)/5)^2)

The minimum required sample size is 278.
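The same calculation can be written as a function of the SD, the margin of error and the confidence level. This is a sketch under stated assumptions: `min_sample_size` is a hypothetical name, the critical value comes from `qnorm` rather than the rounded 1.96, and the result is always rounded up so the margin is not exceeded.

```r
# Hypothetical helper: minimum n for estimating a mean within a given margin.
min_sample_size <- function(sd, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # about 1.96 for a 95% level
  ceiling((z * sd / margin)^2)           # round up, never down
}

# SD estimated as range/4, margin of 5, as in the code above.
n_min <- min_sample_size(sd = (200 - 30) / 4, margin = 5)
```

Rounding up matters whenever the unrounded value sits just above a whole number, because rounding to the nearest integer would then give a sample that is slightly too small.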

Qn 4a

Assumptions: H0 is true, observations are independent of one another, y is continuous, and the population is approximately normally distributed.

H0: μ = 500
Ha: μ ≠ 500

# calculate t-statistic.
t <- (410-500)/(90/sqrt(9))

# calculate p-value.
p <- 2*pt(q=abs(t), df=8, lower.tail=FALSE)
p
[1] 0.01707168

We can reject the null hypothesis at the 5% significance level, t(8) = -3, p = 0.0170717. Female employees’ mean income significantly differs from $500 per week.

[I have a question - I am confused on whether I was right to use the absolute value here, and when we should use absolute values.]
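On the absolute-value question, a short check may help. It is a sketch that assumes only that the t-distribution is symmetric about zero: a two-sided p-value adds the area in both tails beyond |t|, so taking `abs(t)` and doubling the upper tail is one way to compute it, while one-sided tests keep the sign of t. The variable names below are illustrative.

```r
# Same statistic as above: (sample mean - mu0) / (s / sqrt(n)).
t_stat <- (410 - 500) / (90 / sqrt(9))   # equals -3
df <- 8

# Two-sided: double the upper tail beyond |t| ...
p_two_abs <- 2 * pt(q = abs(t_stat), df = df, lower.tail = FALSE)
# ... which matches adding the lower and upper tails explicitly.
p_two_sum <- pt(q = -abs(t_stat), df = df) +
  pt(q = abs(t_stat), df = df, lower.tail = FALSE)

# One-sided: use the signed statistic and pick the tail named by Ha.
p_lower <- pt(q = t_stat, df = df, lower.tail = TRUE)    # Ha: mu < 500
p_upper <- pt(q = t_stat, df = df, lower.tail = FALSE)   # Ha: mu > 500
```

Under this reading, the absolute value belongs only in the two-sided calculation. The reported statistic itself keeps its sign.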

Qn 4b

# calculate p-value.
p2 <- pt(q=t, df=8, lower.tail=TRUE)
p2
[1] 0.008535841

We can reject the null hypothesis at the 5% significance level, t(8) = -3, p = 0.0085358. Female employees’ mean income is significantly less than $500 per week.

Qn 4c

# calculate p-value.
p3 <- pt(q=t, df=8, lower.tail=FALSE)
p3
[1] 0.9914642

We fail to reject the null hypothesis at the 5% significance level, t(8) = -3, p = 0.9914642. Female employees’ mean income is not significantly more than $500 per week.

Qn 5a

# calculate SD for Jones and Smith from SE = 10 and n = 1000 (SD = SE * sqrt(n)).
SD <- 10*sqrt(1000)

# calculate t for Jones.
t_j <- ((519.5-500)/SD) * sqrt(1000)
t_j
[1] 1.95
# calculate p-value for Jones.
p_j <- 2*(pt(q=t_j, df=999, lower.tail=FALSE))
p_j
[1] 0.05145555
# calculate t for Smith.
t_s <- ((519.7-500)/SD) * sqrt(1000)
t_s
[1] 1.97
# calculate p-value for Smith.
p_s <- 2*(pt(q=t_s, df=999, lower.tail=FALSE))
p_s
[1] 0.04911426

Qn 5b

At the 5% significance level, the result is statistically significant for Smith (p = 0.0491), but not for Jones (p = 0.0515).

Qn 5c

It is useful to report the exact p-value in cases like this, when the p-value is very close to alpha. It helps the reader to understand (1) why it was/was not rejected, and (2) how much evidence there is against the null hypothesis.

Qn 6

# create variable.
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# do one-sided t-test (Ha: mean < 45).
tax <- t.test(gas_taxes, alternative = "less", mu = 45)
tax

    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278 

The one-sided 95% CI is (-∞, 44.6794598], which includes the estimated mean 40.8627778 and excludes 45. Hence, we can reject the null hypothesis at the 5% significance level, t(17) = -1.8857058, p = 0.0382708. The average tax per gallon in the US in 2005 was significantly less than 45 cents.
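As a cross-check, the `t.test` output can be reproduced by hand from the sample mean and SD. This is a minimal sketch using the same data and the same one-sided alternative. The names `t_manual`, `p_manual` and `upper_manual` are illustrative.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n_obs <- length(gas_taxes)
se <- sd(gas_taxes) / sqrt(n_obs)            # standard error of the mean

# t-statistic against mu0 = 45 and lower-tail p-value (Ha: mean < 45).
t_manual <- (mean(gas_taxes) - 45) / se
p_manual <- pt(q = t_manual, df = n_obs - 1, lower.tail = TRUE)

# Upper bound of the one-sided 95% CI; the lower bound is -Inf.
upper_manual <- mean(gas_taxes) + qt(p = 0.95, df = n_obs - 1) * se
```

These three values should agree with the statistic, p-value and upper confidence bound that `t.test` reports.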