Homework 2 - Prahitha Movva

hw2
p-value
confidence level
The second homework
Author

Prahitha Movva

Published

October 17, 2022

Code
library(readxl)
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(dplyr)
library(stats)
knitr::opts_chunk$set(echo=TRUE, warning=FALSE)

Question 1

Angiography

Code
sample.mean <- 18
sample.n <- 847
sample.sd <- 9
sample.se <- sample.sd/sqrt(sample.n)

alpha <- 0.10
degrees.freedom <- sample.n - 1
t.score <- qt(p=alpha/2, df=degrees.freedom,lower.tail=F)

margin.error <- t.score * sample.se
lower.bound <- sample.mean - margin.error
upper.bound <- sample.mean + margin.error
print(c(lower.bound,upper.bound))
[1] 17.49078 18.50922
Code
print(upper.bound - lower.bound)
[1] 1.018436

Bypass

Code
sample.mean <- 19
sample.n <- 539
sample.sd <- 10
sample.se <- sample.sd/sqrt(sample.n)

alpha <- 0.10
degrees.freedom <- sample.n - 1
t.score <- qt(p=alpha/2, df=degrees.freedom,lower.tail=F)

margin.error <- t.score * sample.se
lower.bound <- sample.mean - margin.error
upper.bound <- sample.mean + margin.error
print(c(lower.bound,upper.bound))
[1] 18.29029 19.70971
Code
print(upper.bound - lower.bound)
[1] 1.419421

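The 90% confidence interval for the mean wait time is [17.49, 18.51] days for angiography (width 1.02 days) and [18.29, 19.71] days for bypass (width 1.42 days). The angiography interval is narrower by about 0.40 days, because that sample is larger (847 vs. 539) and has a smaller standard deviation (9 vs. 10).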
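Both intervals follow the same steps, so the calculation can be wrapped in one function. The following is a minimal sketch; the name `t.interval` and its arguments are illustrative and not part of the original homework.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t.interval <- function(xbar, s, n, conf.level = 0.90) {
  alpha <- 1 - conf.level
  se <- s / sqrt(n)                                   # standard error of the mean
  t.score <- qt(p = alpha / 2, df = n - 1, lower.tail = FALSE)
  margin <- t.score * se                              # margin of error
  c(lower = xbar - margin, upper = xbar + margin, width = 2 * margin)
}

angiography <- t.interval(xbar = 18, s = 9, n = 847)
bypass <- t.interval(xbar = 19, s = 10, n = 539)
print(round(angiography, 2))
print(round(bypass, 2))
```

Returning the width alongside the bounds makes the comparison between the two procedures a single subtraction.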

Question 2

Code
sample.trials <- 1031
sample.successes <- 567
p <- sample.successes/sample.trials
print(p)
[1] 0.5499515
Code
# p = p sets the null value to the sample proportion; only the confidence interval is used below
prop.result <- prop.test(sample.successes, sample.trials, p=p, conf.level=0.95)
print(prop.result$conf.int)
[1] 0.5194543 0.5800778
attr(,"conf.level")
[1] 0.95

The point estimate, p, is 0.55 and the 95% confidence interval for p is [0.52, 0.58]. We are 95% confident that the true proportion of all adult Americans who believe that a college education is essential for success lies between 0.52 and 0.58. In other words, if we repeated the sampling many times and built an interval the same way each time, about 95% of those intervals would contain the true proportion.
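To see where these bounds come from, the interval can be reproduced by hand. The sketch below assumes the printed interval is the Wilson score interval without continuity correction; that is inferred from the output above and is not stated in the homework. The variable names are illustrative.

```r
# Sketch (assumption): Wilson score interval without continuity correction.
x <- 567                      # number of successes
n <- 1031                     # number of trials
p.hat <- x / n                # sample proportion
z <- qnorm(0.975)             # critical value for a 95% interval

denominator <- 1 + z^2 / n
center <- (p.hat + z^2 / (2 * n)) / denominator
half.width <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / denominator

wilson <- c(lower = center - half.width, upper = center + half.width)
print(wilson)
```

Unlike the simpler interval `p.hat ± z * sqrt(p.hat * (1 - p.hat) / n)`, this one is not centered exactly on the sample proportion, although with n = 1031 the two are very close.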

Question 3

Code
range <- 200-30
margin.error <- 5
population.sd <- range/4
alpha <- 0.05
z_score <- qnorm(p=alpha/2, lower.tail=F)
sample.n <- ((z_score*population.sd)/margin.error)^2
print(sample.n)
[1] 277.5454

Rounding up to the next whole number, the sample size should be 278.
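The same calculation can be written as a small function that rounds up automatically. This is a sketch only; `required.n` is a hypothetical name, and the standard deviation is still the range/4 rule-of-thumb estimate used above.

```r
# Hypothetical helper: smallest n whose z-based margin of error is at most margin.error.
required.n <- function(margin.error, sd, conf.level = 0.95) {
  z <- qnorm(p = (1 - conf.level) / 2, lower.tail = FALSE)
  ceiling((z * sd / margin.error)^2)   # always round up, never to the nearest integer
}

n.needed <- required.n(margin.error = 5, sd = (200 - 30) / 4)
print(n.needed)
```

Using `ceiling()` rather than `round()` matters because rounding down would give a margin of error slightly larger than the target.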

Question 4

Code
population.mean <- 500
sample.mean <- 410
sample.s <- 90
sample.n <- 9

a

Ho: The true mean income of female employees is $500/week

Ha: The true mean income of female employees is not $500/week

Assumptions:

  1. The data are normally distributed

  2. The test statistic is computed assuming Ho is true

  3. Significance level of 5% (95% confidence)

Code
t.numerator <- sample.mean - population.mean
t.denominator <- sample.s/sqrt(sample.n)
t.statistic <- t.numerator/t.denominator

p.value <- pt(q=abs(t.statistic), df=sample.n-1, lower.tail=F)*2
print(t.statistic)
[1] -3
Code
print(p.value)
[1] 0.01707168

The t statistic is -3 and the two-sided p-value is 0.017. Since the p-value is less than the 5% significance level, we reject Ho: the mean income of female employees differs significantly from $500 per week.

b

Code
p.value_less <- pt(q=t.statistic, df=sample.n-1, lower.tail=T)
print(p.value_less)
[1] 0.008535841

Here too, the p-value (0.0085) is less than 0.05. So we reject Ho and conclude that the mean income of female employees is significantly less than $500 per week.

c

Code
p.value_greater <- pt(q=t.statistic, df=sample.n-1, lower.tail=F)
print(p.value_greater)
[1] 0.9914642

Here, the p-value (0.99) is greater than 0.05, so we fail to reject Ho. That is, we do not have evidence that the mean income of female employees is more than $500 per week.

Code
p.value_greater + p.value_less
[1] 1
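The two one-sided p-values sum to 1 because they are the two tails of the same t distribution, split at the observed statistic. The sketch below collects all three p-values in one place; `t.from.summary` is a hypothetical helper, not part of the original homework.

```r
# Hypothetical helper: one-sample t test from summary statistics.
t.from.summary <- function(xbar, mu0, s, n) {
  t.stat <- (xbar - mu0) / (s / sqrt(n))
  df <- n - 1
  c(t = t.stat,
    two.sided = 2 * pt(abs(t.stat), df = df, lower.tail = FALSE),
    less = pt(t.stat, df = df, lower.tail = TRUE),
    greater = pt(t.stat, df = df, lower.tail = FALSE))
}

result <- t.from.summary(xbar = 410, mu0 = 500, s = 90, n = 9)
print(round(result, 4))
```

The two-sided p-value is twice the smaller of the two one-sided values, which is why the answer in part a is exactly double the answer in part b.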

Question 5

Code
sample.n <- 1000
jones.mean <- 519.5
jones.se <- 10
smith.mean <- 519.7
smith.se <- 10
population.mean <- 500

a

Code
jones.t <- ((jones.mean-population.mean)/jones.se)
jones.t
[1] 1.95
Code
jones.p <- pt(q=abs(jones.t), df=sample.n-1, lower.tail=F)*2
jones.p
[1] 0.05145555
Code
smith.t <- ((smith.mean-population.mean)/smith.se)
smith.t
[1] 1.97
Code
smith.p <- pt(q=abs(smith.t), df=sample.n-1, lower.tail=F)*2
smith.p
[1] 0.04911426

b

At the 5% significance level, the result for Smith is statistically significant (p = 0.049) but the result for Jones is not (p = 0.051).

c

This example shows that reporting only "P > 0.05" or "P <= 0.05" to decide whether to reject the null hypothesis can be misleading if the actual p-value is not reported. The two p-values (0.0515 and 0.0491) are each within about 0.0015 of 0.05 and the two studies found nearly identical means, yet only one result is labeled significant. Without the actual p-values, a reader could not tell that the two studies essentially agree.
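One way to make the similarity visible is to report a confidence interval for each study instead of a significant/not-significant label. The following is a sketch; `ci.from.se` is a hypothetical helper, and it reuses the same t distribution with 999 degrees of freedom as the p-value calculations above.

```r
# Hypothetical helper: t-based confidence interval from an estimate and its standard error.
ci.from.se <- function(estimate, se, n, conf.level = 0.95) {
  t.score <- qt(p = (1 - conf.level) / 2, df = n - 1, lower.tail = FALSE)
  c(lower = estimate - t.score * se, upper = estimate + t.score * se)
}

jones.ci <- ci.from.se(estimate = 519.5, se = 10, n = 1000)
smith.ci <- ci.from.se(estimate = 519.7, se = 10, n = 1000)
print(round(rbind(jones = jones.ci, smith = smith.ci), 2))
```

The two intervals are shifted by only 0.2 and overlap almost entirely; Jones's interval barely includes 500 while Smith's barely excludes it, which matches the p-values falling just on either side of 0.05.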

Question 6

Code
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
t.test(gas_taxes, alternative = c("less"), mu = 45)

    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278 

The one-sided test gives a p-value of 0.038. Since this value is less than the 5% significance level, we reject the null hypothesis and conclude that the average tax per gallon of gas in the US in 2005 was significantly less than 45 cents.
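As a cross-check, the statistic and p-value reported by `t.test` can be recomputed directly from the data. This is a sketch of the same one-sample, lower-tailed test; the names `mu0`, `t.manual`, and `p.manual` are illustrative.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
mu0 <- 45                                   # hypothesized mean under the null
n <- length(gas_taxes)

# t statistic: distance of the sample mean from mu0 in standard-error units
t.manual <- (mean(gas_taxes) - mu0) / (sd(gas_taxes) / sqrt(n))

# Lower-tail p-value, matching alternative = "less"
p.manual <- pt(t.manual, df = n - 1, lower.tail = TRUE)
print(c(t = t.manual, p.value = p.manual))
```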