Homework 2 - Prahitha Movva

hw2

p-value

confidence level

The second homework

Author

Prahitha Movva

Published

October 17, 2022

Code

library(readxl)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Code

library(dplyr)
library(stats)
knitr::opts_chunk$set(echo=TRUE, warning=FALSE)

Question 1

Angiography

Code

sample.mean <- 18
sample.n <- 847
sample.sd <- 9
sample.se <- sample.sd/sqrt(sample.n)

alpha <- 0.10
degrees.freedom <- sample.n - 1
t.score <- qt(p=alpha/2, df=degrees.freedom,lower.tail=F)

margin.error <- t.score * sample.se
lower.bound <- sample.mean - margin.error
upper.bound <- sample.mean + margin.error
print(c(lower.bound,upper.bound))

[1] 17.49078 18.50922

Code

print(upper.bound - lower.bound)

[1] 1.018436

Bypass

Code

sample.mean <- 19
sample.n <- 539
sample.sd <- 10
sample.se <- sample.sd/sqrt(sample.n)

alpha <- 0.10
degrees.freedom <- sample.n - 1
t.score <- qt(p=alpha/2, df=degrees.freedom,lower.tail=F)

margin.error <- t.score * sample.se
lower.bound <- sample.mean - margin.error
upper.bound <- sample.mean + margin.error
print(c(lower.bound,upper.bound))

[1] 18.29029 19.70971

Code

print(upper.bound - lower.bound)

[1] 1.419421

The 90% confidence interval for the mean wait time is [17.49, 18.51] days for angiography (width 1.02 days) and [18.29, 19.71] days for bypass (width 1.42 days). The angiography interval is narrower by about 0.4 days, which is consistent with its larger sample size and smaller standard deviation.
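The two calculations above repeat the same steps, so they could be wrapped in one small function. The following is a minimal sketch; the helper name `t.interval` and its argument names are hypothetical and not part of the original homework.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t.interval <- function(xbar, s, n, conf.level = 0.90) {
  alpha <- 1 - conf.level
  se <- s / sqrt(n)                         # standard error of the mean
  t.score <- qt(1 - alpha / 2, df = n - 1)  # two-sided critical value
  c(lower = xbar - t.score * se, upper = xbar + t.score * se)
}

angiography <- t.interval(xbar = 18, s = 9, n = 847)
bypass <- t.interval(xbar = 19, s = 10, n = 539)

print(angiography)
print(bypass)

# Interval widths, for comparison with the values reported in the text
print(unname(diff(angiography)))
print(unname(diff(bypass)))
```

Keeping the confidence level as an argument makes it easy to see how the widths change at, say, 95% instead of 90%.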

Question 2

Code

sample.trials <- 1031
sample.successes <- 567
p <- sample.successes/sample.trials
print(p)

[1] 0.5499515

Code

# Confidence interval for the population proportion (a Wilson score interval).
# The result is stored under its own name so it does not mask prop.test().
prop.result <- prop.test(sample.successes, sample.trials, p=p, conf.level=0.95)
print(prop.result$conf.int)

[1] 0.5194543 0.5800778
attr(,"conf.level")
[1] 0.95

The point estimate of p is 0.55 and the 95% confidence interval for p is [0.52, 0.58]. We are 95% confident that the true proportion of all adult Americans who believe that a college education is essential for success lies between 0.52 and 0.58. In other words, if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true proportion.
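As a rough cross-check, the normal-approximation (Wald) interval can be computed by hand. This is a sketch of a different method from the one `prop.test` uses (a Wilson score interval), so the bounds should agree only approximately; the variable names are chosen here for illustration.

```r
n <- 1031
x <- 567

p.hat <- x / n                        # sample proportion
se <- sqrt(p.hat * (1 - p.hat) / n)   # estimated standard error of p.hat
z <- qnorm(0.975)                     # critical value for 95% confidence

wald <- c(lower = p.hat - z * se, upper = p.hat + z * se)
print(round(wald, 3))
```

With a sample this large and a proportion near 0.5, the two methods differ only in the third or fourth decimal place, so the interpretation does not change.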

Question 3

Code

# Range rule of thumb: estimate the standard deviation as range/4
value.range <- 200-30
margin.error <- 5
population.sd <- value.range/4
alpha <- 0.05
z_score <- qnorm(p=alpha/2, lower.tail=F)
sample.n <- ((z_score*population.sd)/margin.error)^2
print(sample.n)

[1] 277.5454

Rounding up to the next whole number, the sample size should be at least 278.
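The rounding step can be made explicit with `ceiling()`, since a fractional sample size has to be rounded up to keep the margin of error at or below the target. A minimal sketch, using the same range/4 estimate of the standard deviation (variable names here are illustrative):

```r
value.range <- 200 - 30
sd.estimate <- value.range / 4   # range rule of thumb for the standard deviation
target.margin <- 5
z <- qnorm(0.975)                # critical value for 95% confidence

# Smallest whole-number sample size that meets the target margin
n.required <- ceiling((z * sd.estimate / target.margin)^2)
print(n.required)

# Margin of error actually achieved with that sample size
achieved.margin <- z * sd.estimate / sqrt(n.required)
print(achieved.margin)
```

Rounding down to 277 instead would leave the margin of error slightly above 5.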

Question 4

Code

population.mean <- 500
sample.mean <- 410
sample.s <- 90
sample.n <- 9

a

Ho: The true mean income of female employees is $500/week

Ha: The true mean income of female employees is not $500/week

Assumptions:

The data are normally distributed
The test statistic is evaluated assuming Ho is true
Significance level of 0.05 (95% confidence)

Code

t.numerator <- sample.mean - population.mean
t.denominator <- sample.s/sqrt(sample.n)
t.statistic <- t.numerator/t.denominator

p.value <- pt(q=abs(t.statistic), df=sample.n-1, lower.tail=F)*2
print(t.statistic)

[1] -3

Code

print(p.value)

[1] 0.01707168

The t statistic is -3 and the two-sided p-value is 0.017. Since the p-value is less than 0.05, we reject Ho at the 5% significance level, i.e., the mean income of female employees differs significantly from $500 per week.

b

Code

p.value_less <- pt(q=t.statistic, df=sample.n-1, lower.tail=T)
print(p.value_less)

[1] 0.008535841

Here too, the p-value (0.0085) is less than 0.05. So we reject Ho at the 5% significance level and can say that the mean income of female employees is significantly less than $500 per week.

c

Code

p.value_greater <- pt(q=t.statistic, df=sample.n-1, lower.tail=F)
print(p.value_greater)

[1] 0.9914642

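Here, the p-value (0.99) is higher than 0.05 and we fail to reject Ho at the 5% significance level. This means we do not have evidence that the mean income of female employees is more than $500 per week. The two one-sided p-values are complementary, so they sum to 1, as the check below confirms.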

Code

p.value_greater + p.value_less

[1] 1
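The three p-values in this question come from the same t statistic, so they can be computed together. The sketch below assumes a hypothetical helper named `t.from.summary`; it is one possible way to organize the calculation, not the method required by the assignment.

```r
# Hypothetical helper: one-sample t test from summary statistics.
t.from.summary <- function(xbar, s, n, mu0) {
  t.stat <- (xbar - mu0) / (s / sqrt(n))
  deg.free <- n - 1
  list(
    t.stat = t.stat,
    two.sided = 2 * pt(abs(t.stat), df = deg.free, lower.tail = FALSE),
    less = pt(t.stat, df = deg.free, lower.tail = TRUE),
    greater = pt(t.stat, df = deg.free, lower.tail = FALSE)
  )
}

result <- t.from.summary(xbar = 410, s = 90, n = 9, mu0 = 500)
print(result$t.stat)
print(unlist(result[c("two.sided", "less", "greater")]))
```

Because the t statistic is negative, the two-sided p-value is exactly twice the "less" p-value.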

Question 5

Code

sample.n <- 1000
jones.mean <- 519.5
jones.se <- 10
smith.mean <- 519.7
smith.se <- 10
population.mean <- 500

a

Code

jones.t <- ((jones.mean-population.mean)/jones.se)
jones.t

[1] 1.95

Code

jones.p <- pt(q=abs(jones.t), df=sample.n-1, lower.tail=F)*2
jones.p

[1] 0.05145555

Code

smith.t <- ((smith.mean-population.mean)/smith.se)
smith.t

[1] 1.97

Code

smith.p <- pt(q=abs(smith.t), df=sample.n-1, lower.tail=F)*2
smith.p

[1] 0.04911426

b

At the 5% significance level, the result for Smith is statistically significant but that of Jones is not.

c

This example shows that reporting only "P > 0.05" or "P <= 0.05" to say whether the null hypothesis is rejected is misleading when the actual p-value is not given. The two p-values (0.051 and 0.049) each differ from 0.05 by less than 0.002, and the two studies provide nearly identical evidence, yet only one is labeled significant. Without the actual p-values, the results cannot be interpreted meaningfully.
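One way to see how close the two studies are is to compare their 95% confidence intervals. This is an illustrative sketch, assuming the same t distribution with n - 1 degrees of freedom used for the p-values; the names `jones.ci` and `smith.ci` are introduced here.

```r
n <- 1000
mu0 <- 500
se <- 10
t.crit <- qt(0.975, df = n - 1)   # two-sided critical value at the 5% level

jones.ci <- 519.5 + c(-1, 1) * t.crit * se
smith.ci <- 519.7 + c(-1, 1) * t.crit * se

print(jones.ci)
print(smith.ci)

# Does each interval contain the null value of 500?
print(jones.ci[1] <= mu0 & mu0 <= jones.ci[2])
print(smith.ci[1] <= mu0 & mu0 <= smith.ci[2])
```

The two intervals are shifted by only 0.2 and overlap almost entirely; the null value of 500 falls just inside one and just outside the other.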

Question 6

Code

gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
t.test(gas_taxes, alternative = "less", mu = 45)


    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278

The p-value is 0.038. Since this value is less than 0.05, we reject the null hypothesis at the 5% significance level and conclude that the average tax per gallon of gas in the US in 2005 was significantly less than 45 cents.
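The `t.test` result can be reproduced by hand from the sample mean and standard deviation, which makes the link to the formula used in Question 4 explicit. A minimal sketch (the names `t.stat` and `p.less` are illustrative):

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n <- length(gas_taxes)
mu0 <- 45

# One-sample t statistic: (sample mean - null mean) / standard error
t.stat <- (mean(gas_taxes) - mu0) / (sd(gas_taxes) / sqrt(n))

# Lower-tail p-value for the alternative "true mean is less than 45"
p.less <- pt(t.stat, df = n - 1, lower.tail = TRUE)

print(c(t = t.stat, p = p.less))
```

These values should match the `t` and `p-value` fields in the `t.test` output above up to rounding.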