hw2
t.test
Homework 2
Author

Guanhua Tan

Published

March 23, 2023

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Question 1

# Bypass: sample mean 19, standard deviation 10, sample size 539
s_mean <- 19
s_size <- 539
standard_error <- 10/sqrt(s_size)   # SE = s / sqrt(n)
standard_error
[1] 0.4307305

# t-value
confidence_level <- 0.9
tail_area <- (1-confidence_level)/2
t_score <- qt(p = 1-tail_area, df = s_size-1)
t_score
[1] 1.647691

# plug everything back in
CI <- c(s_mean - t_score * standard_error,
        s_mean + t_score * standard_error)
print(CI)
[1] 18.29029 19.70971

90% CI for bypass: 18.29 <= μ_bypass <= 19.71

# Angiography: sample mean 18, standard deviation 9, sample size 847
s_mean_a <- 18
s_size_a <- 847
standard_error_a <- 9/sqrt(s_size_a)   # SE = s / sqrt(n)
standard_error_a
[1] 0.3092437

# t-value (degrees of freedom come from the angiography sample)
confidence_level <- 0.9
tail_area <- (1-confidence_level)/2
t_score_a <- qt(p = 1-tail_area, df = s_size_a-1)
t_score_a
[1] 1.646657

# plug everything back in
CI_a <- c(s_mean_a - t_score_a * standard_error_a,
          s_mean_a + t_score_a * standard_error_a)
print(CI_a)
[1] 17.49078 18.50922

90% CI for angiography: 17.49 <= μ_angiography <= 18.51

The confidence interval is narrower for angiography because its standard error is smaller: it has both a smaller standard deviation (9 vs. 10) and a larger sample (847 vs. 539).
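The same steps can be wrapped in a small helper so that each group automatically gets its own standard error (s/√n) and its own degrees of freedom (n - 1). This is a sketch under that assumption; the function name `t_ci` is hypothetical and not part of any package.

```r
# Hypothetical helper: t-based confidence interval for a mean,
# computed from summary statistics instead of raw data.
t_ci <- function(mean, sd, n, conf_level = 0.90) {
  se <- sd / sqrt(n)                                   # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value
  c(lower = mean - t_crit * se, upper = mean + t_crit * se)
}

ci_bypass <- t_ci(mean = 19, sd = 10, n = 539)
ci_angio  <- t_ci(mean = 18, sd = 9,  n = 847)
ci_bypass
ci_angio

# Interval widths: the angiography interval is the narrower one
diff(ci_bypass)
diff(ci_angio)
```

Passing summary statistics rather than a data vector is a design choice here, because the question only supplies means, standard deviations and sample sizes, so `t.test()` cannot be called directly.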

Question 2

p2 <- 567/1031
p2
[1] 0.5499515

# Standard error of the sample proportion and the 95% critical value
SE2 <- sqrt(p2*(1-p2)/1031)
tail_area2 <- (1-0.95)/2
t_score2 <- qt(1-tail_area2, df=1030)

# Lower and upper bounds of the 95% confidence interval
CI2_A <- p2-t_score2*SE2
CI2_B <- p2+t_score2*SE2
CI2_A
[1] 0.5195482
CI2_B
[1] 0.5803548

0.520 <= p <= 0.580, i.e., we are 95% confident that the population proportion lies between 0.520 and 0.580.
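For a proportion, the critical value is more commonly taken from the standard normal distribution than from t. The sketch below computes that normal-approximation (Wald) interval; the helper name `wald_ci` is hypothetical. With n = 1031 the normal and t critical values differ only in the third decimal place. For comparison it also calls base R's `prop.test()`, which reports a Wilson score interval (with a continuity correction by default), so its bounds will differ slightly from the Wald bounds.

```r
# Hypothetical helper: normal-approximation (Wald) CI for a proportion
wald_ci <- function(successes, n, conf_level = 0.95) {
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)         # standard error of p_hat
  z_crit <- qnorm(1 - (1 - conf_level) / 2)   # about 1.96 for 95%
  c(lower = p_hat - z_crit * se, upper = p_hat + z_crit * se)
}

ci_q2 <- wald_ci(567, 1031)
round(ci_q2, 3)

# Built-in alternative: Wilson score interval, continuity-corrected
prop.test(567, 1031, conf.level = 0.95)$conf.int
```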

Question 3

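# Assume sigma is about a quarter of the range (200 - 30)
sd_question3 <- (200-30)/4
Margin3 <- 5
n <- (1.96*sd_question3/Margin3)^2
n
[1] 277.5556

Rounding up, the required sample size is 278 students (rounding down to 277 would give a margin of error slightly larger than 5).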
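The calculation n = (z·σ/E)² and the round-up step can be bundled into one function. This is a minimal sketch assuming σ is known (here taken as a quarter of the range, as above); the name `sample_size_mean` is hypothetical. It uses `qnorm(0.975)` (about 1.959964) instead of 1.96, which does not change the rounded answer.

```r
# Hypothetical helper: sample size needed to estimate a mean within a
# given margin of error, assuming the population sigma is known.
sample_size_mean <- function(sigma, margin, conf_level = 0.95) {
  z_crit <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z_crit * sigma / margin)^2)   # always round up, never down
}

sigma_q3 <- (200 - 30) / 4   # sigma approximated as a quarter of the range
sample_size_mean(sigma_q3, margin = 5)
```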

Question 4

A. Null hypothesis H0: μ = $500 (the mean income of female employees is $500 per week).
Alternative hypothesis Ha: μ ≠ $500 (the mean income of female employees differs from $500 per week).
The two-sided t-test below gives p = 0.017, which is smaller than 0.05, so we reject the null hypothesis: the mean income of female employees is different from $500 per week.

female_group_mean <- 410
sd_4 <- 90
n_4 <- 9
t_stat4 <- (female_group_mean-500)/(sd_4/sqrt(n_4))
# Two-sided p-value: twice the tail area beyond |t|
P_value_4 <- 2*pt(-abs(t_stat4), df = n_4-1)

t_stat4
[1] -3

P_value_4
[1] 0.01707168

The t-statistic is -3 and the two-sided p-value is 0.017.
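Because only summary statistics are given (mean 410, s = 90, n = 9), `t.test()` cannot be run on raw data here. A minimal sketch of a one-sample t-test from summary statistics follows; the helper name `t_test_summary` is hypothetical and not from any package.

```r
# Hypothetical helper: one-sample t-test from summary statistics
t_test_summary <- function(xbar, s, n, mu0) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))          # standardized distance from mu0
  dof <- n - 1
  p_two_sided <- 2 * pt(-abs(t_stat), df = dof)   # area in both tails
  list(t = t_stat, df = dof, p_value = p_two_sided)
}

res4 <- t_test_summary(xbar = 410, s = 90, n = 9, mu0 = 500)
res4$t
res4$p_value
```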

B. Report the P-value for Ha: μ < 500. Interpret.
C. Report and interpret the P-value for Ha: μ > 500.

# One-sided p-values for the same t-statistic
P_value_lower4 <- pt(t_stat4, df=n_4-1, lower.tail=TRUE)   # Ha: mu < 500
P_value_high4 <- pt(t_stat4, df=n_4-1, lower.tail=FALSE)   # Ha: mu > 500

P_value_lower4
[1] 0.008535841

P_value_high4
[1] 0.9914642

For Ha: μ < 500, the p-value from pt() is 0.0085. Since it is below 0.05, we reject the null hypothesis and conclude that the mean income of female employees is less than $500 per week.

For Ha: μ > 500, the p-value is 0.99, so we fail to reject the null hypothesis: the data give no evidence that the mean income of female employees is greater than $500 per week.
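The three p-values are tied together: the two one-sided p-values are complements, and the two-sided p-value is twice the smaller one-sided value. The short check below makes that relationship explicit using only base R's `pt()`, with t = -3 and df = 8 taken from part A.

```r
t_stat <- -3
dof <- 8
p_less    <- pt(t_stat, df = dof)                       # Ha: mu < 500
p_greater <- pt(t_stat, df = dof, lower.tail = FALSE)   # Ha: mu > 500
p_two     <- 2 * min(p_less, p_greater)                 # Ha: mu != 500

c(less = p_less, greater = p_greater, two_sided = p_two)
p_less + p_greater   # the one-sided p-values always sum to 1
```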

Question 5

# Question 5
# Jones: sample mean 519.5, standard error 10, n = 1000 (df = 999)
t_score_5_Jones <- (519.5-500)/10
p_value_5_Jones <- 2*(1-pt(t_score_5_Jones, df=999))
t_score_5_Jones
[1] 1.95

p_value_5_Jones
[1] 0.05145555

# Smith: sample mean 519.7, standard error 10, n = 1000 (df = 999)
t_score_5_Smith <- (519.7-500)/10
p_value_5_Smith <- 2*(1-pt(t_score_5_Smith, df=999))
t_score_5_Smith
[1] 1.97

p_value_5_Smith
[1] 0.04911426

B. At α = 0.05, Smith's result is statistically significant because his p-value (0.049) is smaller than α, while Jones's result (p = 0.051) is not.
C. If we reported only "P ≤ 0.05" or "P > 0.05" instead of the actual p-values, we would conclude that Smith's result is significant and Jones's is not, hiding the fact that the two studies produced almost identical results. We would also miss that Smith's p-value is only barely below α, so the evidence against H0 is far from overwhelming.
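Putting the two studies side by side makes the point of part C concrete: the test statistics differ by only 0.02, yet a fixed cutoff at 0.05 sorts them into opposite conclusions. This sketch reuses the values from the code above (standard error 10, H0: μ = 500, df = 999).

```r
# Same standard error and null value for both researchers
means  <- c(Jones = 519.5, Smith = 519.7)
t_vals <- (means - 500) / 10
p_vals <- 2 * pt(-abs(t_vals), df = 999)   # two-sided p-values

round(p_vals, 4)
p_vals <= 0.05   # decision at alpha = 0.05 flips between the two studies
```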

Question 6

df_6 <- data.frame("Grade Level"=c("Healthy snack", "Unhealthy snack"),
                   "6th grade"=c(31,69), "7th grade"=c(43,57), "8th grade"=c(51,49))

# Drop the label column and test the 2x3 table of counts
chisq.test(df_6[,-1], correct=F)

    Pearson's Chi-squared test

data:  df_6[, -1]
X-squared = 8.3383, df = 2, p-value = 0.01547

Null hypothesis: snack choice is independent of grade level, i.e., the proportion of students choosing a healthy snack is the same in 6th, 7th, and 8th grade. Because the data are counts in a two-way table, we use a chi-square test of independence. The p-value is 0.01547, which is smaller than 0.05, so we reject the null hypothesis. In other words, students in different grades make different snack choices.
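To see what `chisq.test()` computes, the statistic can be rebuilt by hand: expected counts under independence are row total × column total / grand total, and X² sums (observed - expected)² / expected over all cells. This is a sketch for illustration; it should reproduce the X-squared, df and p-value reported above.

```r
# Observed counts: rows = snack type, columns = grade (filled column by column)
observed <- matrix(c(31, 69,
                     43, 57,
                     51, 49),
                   nrow = 2,
                   dimnames = list(c("Healthy", "Unhealthy"),
                                   c("6th", "7th", "8th")))

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

x2 <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)
c(X_squared = x2, df = dof, p_value = p_value)
```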

Question 7

# Question 7
df_7<- data.frame("Area1"=c(6.2,9.3,6.8,6.1,6.7,7.5),
                  "Area2"=c(7.5,8.2,8.5,8.2,7.0,9.3),
                  "Area3"=c(5.8,6.5,5.6,7.1,3.0,3.5))
df_7_long <- df_7 %>%
  pivot_longer(cols=c(Area1, Area2, Area3),names_to="Area", values_to = "Fee")

my.anova_7<-aov(Fee ~ Area, df_7_long)
summary(my.anova_7)
            Df Sum Sq Mean Sq F value  Pr(>F)   
Area         2  25.35  12.674   7.993 0.00433 **
Residuals   15  23.78   1.586                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null hypothesis: the mean fee is the same in all three areas. Because we are comparing the means of three groups, we use a one-way ANOVA. The p-value is 0.0043, which is much smaller than 0.05, so we reject the null hypothesis. In other words, mean tuition differs across areas (at least one area's mean differs from the others).
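The F value in the `aov()` table is the ratio of the between-group mean square to the within-group mean square. A minimal by-hand sketch using the same fee data is shown below; it should match the Sum Sq, F value and Pr(>F) columns above up to rounding.

```r
fees <- list(Area1 = c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5),
             Area2 = c(7.5, 8.2, 8.5, 8.2, 7.0, 9.3),
             Area3 = c(5.8, 6.5, 5.6, 7.1, 3.0, 3.5))

all_fees   <- unlist(fees)
grand_mean <- mean(all_fees)
k <- length(fees)       # number of groups
N <- length(all_fees)   # total number of observations

# Between-group and within-group sums of squares
ss_between <- sum(sapply(fees, function(g) length(g) * (mean(g) - grand_mean)^2))
ss_within  <- sum(sapply(fees, function(g) sum((g - mean(g))^2)))

f_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
c(SS_between = ss_between, SS_within = ss_within, F = f_stat, p = p_value)
```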