DACSS603_HW2

HW2

Statistical Inference

Author

Rahul Gundeti

Published

October 17, 2022

library(tidyverse)
library(ggplot2)
library(stats)

knitr::opts_chunk$set(echo = TRUE)

Question 1

Creating the table

procedure <- c("Bypass", "Angiography")
sample_size <- c(539, 847)
mwt <- c(19, 18)
s_stddev <- c(10, 9)

surgery <- data.frame(procedure, sample_size, mwt, s_stddev)
surgery

std_error <- s_stddev / sqrt(sample_size)
std_error

[1] 0.4307305 0.3092437

confidence_level <- 0.90
tail_area <- (1-confidence_level)/2
tail_area

[1] 0.05

t_score <- qt(p = 1-tail_area, df = sample_size-1)
t_score

[1] 1.647691 1.646657

Confidence_Interval <- c(mwt - t_score * std_error,
        mwt + t_score * std_error)
Confidence_Interval

[1] 18.29029 17.49078 19.70971 18.50922

The results above are the 90% confidence intervals for the mean wait time for bypass surgery and angiography.

Bypass surgery mean wait time: 18.29029 to 19.70971 days

Angiography mean wait time: 17.49078 to 18.50922 days

The comparison shows that the point estimate is lower for angiography (18 days versus 19 days) and that its interval is narrower (about 1.02 days wide versus about 1.42 days), although the two intervals overlap slightly.
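The same intervals can be laid out one row per procedure, which makes the bounds and widths easier to compare. This is a minimal sketch: the helper name `t_interval` and the column names are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_interval <- function(mean, sd, n, conf_level = 0.90) {
  se <- sd / sqrt(n)                                   # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value
  data.frame(lower = mean - t_crit * se,
             upper = mean + t_crit * se,
             width = 2 * t_crit * se)
}

# One row per procedure, using the summary statistics given above.
ci_table <- data.frame(
  procedure = c("Bypass", "Angiography"),
  t_interval(mean = c(19, 18), sd = c(10, 9), n = c(539, 847))
)
ci_table
```

The `width` column gives a direct answer to which interval is narrower without reading the bounds off a flat vector.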

Question 2

prop.test(567, 1031, conf.level = .95)


    1-sample proportions test with continuity correction

data:  567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515

The 95% confidence interval for the proportion is 0.5189682 to 0.5805580.

The point estimate of p, the proportion who believe that college is necessary for success, is 0.5499515.
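As a rough cross-check, the large-sample (Wald) interval can be computed by hand. This is a sketch under the assumption that the normal approximation is adequate here; `prop.test` uses a score-based interval with a continuity correction, so its bounds differ slightly from the ones below.

```r
x <- 567    # respondents who believe college is necessary for success
n <- 1031   # total respondents

p_hat <- x / n                         # point estimate of the proportion
se <- sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
z <- qnorm(0.975)                      # critical value for 95% confidence

wald_ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
wald_ci
```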

Question 3

ME <- 5
z <- 1.96
s_sd <- (200-30)/4

s_size <- ((z*s_sd)/ME)^2
s_size

[1] 277.5556

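The standard deviation is approximated as one quarter of the range. Rounding up to the next whole number, the necessary sample size is 278.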
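The calculation can be wrapped in a small function that rounds up automatically, since rounding down would give a margin of error slightly larger than the target. This is a sketch: the function name `sample_size_for_mean` is hypothetical, and it uses `qnorm()` for the critical value instead of the rounded 1.96.

```r
# Hypothetical helper: sample size needed to estimate a mean within a margin of error.
sample_size_for_mean <- function(sd, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # two-sided normal critical value
  ceiling((z * sd / margin)^2)           # round up to a whole observation
}

# Standard deviation approximated as range / 4, as in the calculation above.
range_based_sd <- (200 - 30) / 4
sample_size_for_mean(sd = range_based_sd, margin = 5)
```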

Question 4

A

We assume that the sample is random and that the population has a normal distribution.

Null hypothesis: H0: μ = 500

Alternative hypothesis: Ha: μ ≠ 500

We will reject the null hypothesis at a p-value less than or equal to 0.05

s_mean <- 410
μ <- 500
s_sd <- 90
s_size <- 9

Calculating test-statistic

t_score <- (s_mean-μ)/(s_sd/sqrt(s_size))
t_score

[1] -3

Calculating p-value

p <- 2*pt(t_score, s_size-1)
p

[1] 0.01707168

The test statistic is -3 and the p-value is 0.01707168. Because the p-value is less than 0.05, we reject the null hypothesis and conclude that the mean income of female employees differs from $500.

B

We assume that the sample is random and that the population has a normal distribution.

Null hypothesis: H0: μ = 500

Alternative hypothesis: Ha: μ < 500

We will reject the null hypothesis at a p-value less than 0.05

p <- pt(t_score, s_size-1, lower.tail = TRUE)
p

[1] 0.008535841

The p-value is 0.008535841. Because the p-value is less than 0.05, we reject the null hypothesis in favor of the alternative that the mean income of female employees is less than $500.

C

We assume that the sample is random and that the population has a normal distribution.

Null hypothesis: H0: μ = 500

Alternative hypothesis: Ha: μ > 500

We will reject the null hypothesis at a p-value less than 0.05

p <- pt(t_score, s_size-1, lower.tail = FALSE)
p

[1] 0.9914642

The p-value is 0.9914642. Because the p-value is greater than 0.05, we fail to reject the null hypothesis. There is no evidence that the mean income of female employees is greater than $500.
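The three p-values in parts A to C come from the same test statistic and differ only in which tail is used. A minimal sketch that computes them together follows; the helper name `t_test_summary` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: one-sample t-test from summary statistics.
t_test_summary <- function(s_mean, mu0, s_sd, n) {
  t_stat <- (s_mean - mu0) / (s_sd / sqrt(n))   # test statistic
  df <- n - 1                                   # degrees of freedom
  c(t         = t_stat,
    two_sided = 2 * pt(abs(t_stat), df, lower.tail = FALSE),  # Ha: mu != mu0
    less      = pt(t_stat, df, lower.tail = TRUE),            # Ha: mu <  mu0
    greater   = pt(t_stat, df, lower.tail = FALSE))           # Ha: mu >  mu0
}

res <- t_test_summary(s_mean = 410, mu0 = 500, s_sd = 90, n = 9)
res
```

The two one-sided p-values sum to 1, which is why a small p-value for "less than" implies a large one for "greater than".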

Question 5

A

We assume that the sample is random and that the population has a normal distribution.

Null hypothesis: H0: μ = 500

Alternative hypothesis: Ha: μ ≠ 500

We will reject the null hypothesis at a p-value less than 0.05

Calculating t-statistic and p-value for Jones

s_mean <- 519.5
μ <- 500
se <- 10
s_size <- 1000

jt <- (s_mean-μ)/se
jt

[1] 1.95

p <- 2*pt(jt, s_size-1, lower.tail = FALSE)
p

[1] 0.05145555

Calculating t-statistic and p-value for Smith

s_mean <- 519.7
μ <- 500
se <- 10
s_size <- 1000

jt <- (s_mean-μ)/se
jt

[1] 1.97

p <- 2*pt(jt, s_size-1, lower.tail = FALSE)
p

[1] 0.04911426

The test statistic is 1.95 and the p-value is 0.05145555 for Jones; the test statistic is 1.97 and the p-value is 0.04911426 for Smith.

B

The p-value is 0.05145555 for Jones. Because this p-value is greater than 0.05, we fail to reject the null hypothesis. The p-value is 0.04911426 for Smith. Because this p-value is less than 0.05, we reject the null hypothesis. Therefore, the result is statistically significant for Smith, but not for Jones.

C

If we fail to report the P-value and simply state whether the P-value is less than/equal to or greater than the defined significance level of the test, one cannot determine the strength of the conclusion. In the Jones/Smith example, reporting the results only as P ≤ 0.05 versus P > 0.05 will lead to different conclusions about very similar results.
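Reporting the p-values alongside confidence intervals makes the similarity of the two studies visible. The sketch below recomputes both results side by side from the summary statistics above; the 95% intervals and the object names are additions for illustration and assume the same t distribution with 999 degrees of freedom.

```r
s_means <- c(Jones = 519.5, Smith = 519.7)   # sample means from the two studies
mu0 <- 500                                   # null value
se <- 10                                     # standard error (same in both studies)
n <- 1000                                    # sample size (same in both studies)

t_stats <- (s_means - mu0) / se
p_values <- 2 * pt(abs(t_stats), df = n - 1, lower.tail = FALSE)
t_crit <- qt(0.975, df = n - 1)              # critical value for a 95% interval

comparison <- data.frame(
  t        = t_stats,
  p_value  = p_values,
  ci_lower = s_means - t_crit * se,
  ci_upper = s_means + t_crit * se
)
comparison
```

The two intervals are nearly identical and differ only in whether the lower bound falls just below or just above 500, which is the information a bare "significant / not significant" label hides.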

Question 6

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

t.test(gas_taxes, mu = 18.4, conf.level = .95)


    One Sample t-test

data:  gas_taxes
t = 10.238, df = 17, p-value = 1.095e-08
alternative hypothesis: true mean is not equal to 18.4
95 percent confidence interval:
 36.23386 45.49169
sample estimates:
mean of x 
 40.86278

The two-sided 95% confidence interval for the mean tax per gallon is 36.23386 to 45.49169 cents. Based on this interval, we cannot conclude with 95% confidence that the mean tax is less than 45 cents, since the upper bound of the interval is above 45 cents.
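Because the question is directional, it can also be framed as a one-sided test against 45 cents. The sketch below assumes H0: μ = 45 versus Ha: μ < 45; this framing is an assumption added for illustration and is not part of the original analysis. A one-sided test places the full 5% in the lower tail, so its conclusion need not match the two-sided interval.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# One-sided test of whether the mean tax is below 45 cents (assumed framing).
one_sided <- t.test(gas_taxes, mu = 45, alternative = "less", conf.level = 0.95)

one_sided$statistic   # t statistic relative to 45
one_sided$p.value     # one-sided p-value
one_sided$conf.int    # one-sided interval: (-Inf, upper bound)
```

With these data the one-sided p-value falls below 0.05 and the one-sided upper bound falls just below 45, so the conclusion is sensitive to whether a one-sided or two-sided procedure is used.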