hw2
shelton
Author

Dane Shelton

Published

October 16, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(warning= FALSE, message=FALSE)

Homework 2

90% Confidence Intervals

Code
# Bypass
n_bp <- 539
mean_bp <- 19
sd_bp <- 10
t_90 <- qt(.05, (n_bp-1), lower.tail=F)

#CI
upper_bp <- mean_bp + ((sd_bp/sqrt(n_bp))*t_90)
lower_bp <- mean_bp - ((sd_bp/sqrt(n_bp))*t_90)

ci90_bp <- c(lower_bp,upper_bp)
print(c("90% CI For Mean Bypass Wait", ci90_bp))
[1] "90% CI For Mean Bypass Wait" "18.2902893200424"           
[3] "19.7097106799576"           
Code
# Angiography
n_ag <- 847
mean_ag <- 18
sd_ag <- 9
t_90 <- qt(.05, (n_ag-1), lower.tail=F)

#CI (the standard error uses the angiography sample size, n_ag)
upper_ag <- mean_ag + (sd_ag/sqrt(n_ag)*t_90)
lower_ag <- mean_ag - (sd_ag/sqrt(n_ag)*t_90)

ci90_ag <- c(lower_ag,upper_ag)

print(c("90% CI For Angiography Wait", round(ci90_ag, 4)))
[1] "90% CI For Angiography Wait" "17.4908"
[3] "18.5092"
Code
print(c("Width Bypass", upper_bp-lower_bp))
[1] "Width Bypass"     "1.41942135991513"
Code
print(c("Width Angiography", round(upper_ag-lower_ag, 4)))
[1] "Width Angiography" "1.0184"

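The 90% confidence interval for mean angiography wait time (days) is narrower than the interval for mean bypass wait time. The angiography sample is larger (847 vs. 539) and its standard deviation is smaller (9 vs. 10 days), and both of these shrink the standard error s/sqrt(n).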
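Both intervals above apply the same formula, xbar ± t(0.95, n-1) * s/sqrt(n), with the sample size typed in by hand. The helper below is a minimal sketch that takes the summary statistics as arguments, so the standard error always uses the matching n. The name `t_ci` is hypothetical and does not come from any package.

```r
# Hypothetical helper: two-sided t confidence interval from summary statistics
t_ci <- function(xbar, s, n, conf = 0.90) {
  alpha <- 1 - conf
  t_crit <- qt(1 - alpha / 2, df = n - 1)  # upper alpha/2 critical value
  margin <- t_crit * s / sqrt(n)           # critical value times standard error
  c(lower = xbar - margin, upper = xbar + margin)
}

ci_bp <- t_ci(xbar = 19, s = 10, n = 539)  # bypass
ci_ag <- t_ci(xbar = 18, s = 9, n = 847)   # angiography
diff(ci_bp)                                # interval widths
diff(ci_ag)
```

Passing `n` in once, rather than repeating it inside the formula, prevents the standard error from being computed with the other group's sample size.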

One-Proportion Confidence Interval

Code
# College 95% CI
prop.test(567,1031,conf.level = .95)

    1-sample proportions test with continuity correction

data:  567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515 

The 95% confidence interval for the true proportion of Americans who believe a college education is essential for success is (0.519, 0.581). The 95% describes the method, not this particular interval. Suppose we drew many samples and built an interval from each one in the same way. About 95% of those intervals would contain the true population proportion p.
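A simulation can check this repeated-sampling interpretation. The sketch assumes a true proportion of 0.55, which is an illustrative value and not an estimate from the survey. It also swaps in the simple Wald interval, p_hat ± z * sqrt(p_hat(1 - p_hat)/n), in place of the continuity-corrected score interval that `prop.test` reports. The Wald interval is easy to vectorize, and both intervals target 95% coverage.

```r
set.seed(2022)                     # reproducible draws
p_true <- 0.55                     # assumed true proportion (illustrative only)
n_survey <- 1031                   # same sample size as the survey
reps <- 10000                      # number of simulated surveys
z <- qnorm(0.975)                  # two-sided 95% critical value

x_sim <- rbinom(reps, size = n_survey, prob = p_true)  # simulated "yes" counts
p_hat_sim <- x_sim / n_survey
se_sim <- sqrt(p_hat_sim * (1 - p_hat_sim) / n_survey)
covered <- (p_hat_sim - z * se_sim <= p_true) & (p_true <= p_hat_sim + z * se_sim)
mean(covered)                      # share of intervals that contain p_true
```

The share printed should be close to 0.95. Any single interval either contains p_true or it does not.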

Because the interval lies entirely above 0.5, we can conclude at the 0.05 significance level that a majority (p > 0.5) of Americans believe a college education is essential for success.
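The majority claim is one-directional, so it can also be tested directly against the one-sided alternative H1: p > 0.5 through the `alternative` argument of `prop.test`. For comparison, the sketch also computes a 95% Wald interval by hand. The Wald interval is an approximation and will differ slightly from the interval `prop.test` prints.

```r
# One-sided test of H0: p = 0.5 vs H1: p > 0.5
maj <- prop.test(567, 1031, p = 0.5, alternative = "greater")
maj$p.value      # one-sided p-value
maj$conf.int     # one-sided interval: a lower bound, with 1 as the upper end

# Hand-computed 95% Wald interval for comparison
p_hat <- 567 / 1031
se_hat <- sqrt(p_hat * (1 - p_hat) / 1031)
wald <- p_hat + c(-1, 1) * qnorm(0.975) * se_hat
wald
```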

Margin of Error Calculation

Code
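# Margin of Error Calculation
# E = z * sigma / sqrt(n), solved for n:  n = (z * sigma / E)^2

z_95 <- qnorm(.025, lower.tail=F)   # two-sided 95% critical value (about 1.96)
sigma_est <- 170*.25                # estimated standard deviation of textbook cost

(n_req <- (z_95 * sigma_est / 5)^2)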
[1] 277.5454

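To estimate mean textbook cost per semester within $5 of the true value with 95% confidence, the financial aid office would need a sample of at least 278 students. This is 277.55 rounded up, because rounding down would leave the margin of error slightly above $5.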
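The sample-size rule comes from solving E = z * sigma / sqrt(n) for n, which gives n = (z * sigma / E)^2, and then always rounding up. Here is a minimal reusable sketch; the name `n_for_margin` is hypothetical.

```r
# Hypothetical helper: smallest n with z * sigma / sqrt(n) <= E
n_for_margin <- function(sigma, E, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value
  ceiling((z * sigma / E)^2)        # round up, never down
}

n_for_margin(sigma = 170 * 0.25, E = 5)   # textbook-cost example
```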

One Sample T-Test

a

H0: mean income of female employees = $500/wk
H1: mean income of female employees ≠ $500/wk

Code
# womens data
w_xbar <- 410
w_n <- 9
w_sd <- 90
w_se <- w_sd/sqrt(w_n)
test_stat <- (w_xbar-500)/w_se
crit_2sided <- abs(qt(.025,w_n-1))
crit_less <- qt(.05, w_n-1,lower.tail=T)
crit_greater <- qt(.95,w_n-1,lower.tail=T)
p_value <- pt(test_stat, df=w_n-1, lower.tail=T)        # left-tail p-value
pval_greater <- pt(test_stat, df=w_n-1, lower.tail=F)   # right-tail p-value

# Two-Sided T-Test
print('Two-Sided T-Test')
[1] "Two-Sided T-Test"
Code
print(c('test-statistic (use absolute value):', test_stat))
[1] "test-statistic (use absolute value):"
[2] "-3"                                  
Code
print(c('rejection-region:', crit_2sided))
[1] "rejection-region:" "2.30600413520417" 
Code
print(c('p-value', 2*p_value))
[1] "p-value"            "0.0170716812337826"

p-value = 0.017. Reject the null: at alpha = 0.05 we have sufficient evidence to conclude that the mean weekly income of female employees differs from $500/week. If the true mean were $500, we would expect only 1.7% of samples to produce a sample mean at least as far from $500 as $410.
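The two-sided p-value above is computed as 2*p_value. That only works because the test statistic is negative, so p_value is already the smaller tail. The form 2 * pt(-abs(t), df) works for either sign. The helper below is a minimal sketch that returns all three p-values from summary statistics; the name `t_test_summary` is hypothetical.

```r
# Hypothetical helper: one-sample t-test from summary statistics
t_test_summary <- function(xbar, s, n, mu0) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  df <- n - 1
  c(t         = t_stat,
    two_sided = 2 * pt(-abs(t_stat), df),              # H1: mu != mu0
    less      = pt(t_stat, df),                        # H1: mu < mu0
    greater   = pt(t_stat, df, lower.tail = FALSE))    # H1: mu > mu0
}

t_test_summary(xbar = 410, s = 90, n = 9, mu0 = 500)
```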

b

H0: mean income of female employees = $500/wk
H1: mean income of female employees < $500/wk

Code
print('Left-Sided T-Test')
[1] "Left-Sided T-Test"
Code
print(c('test-statistic:', test_stat))
[1] "test-statistic:" "-3"             
Code
print(c('rejection-region:',crit_less))
[1] "rejection-region:" "-1.8595480375309" 
Code
print(c('p-value',p_value))
[1] "p-value"             "0.00853584061689132"

p-value = 0.009. Reject the null: at alpha = 0.05 we have sufficient evidence to conclude that the mean weekly income of female employees is less than $500/week. If the true mean were $500, we would expect fewer than 1% of samples to produce a sample mean of $410 or less.
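A one-sided test at alpha = 0.05 agrees with a one-sided 95% confidence bound. We reject H0 in favor of mu < 500 exactly when 500 lies above the upper bound xbar + t(0.95, n-1) * s/sqrt(n). A short check with the same summary statistics:

```r
# One-sided 95% upper confidence bound for mean weekly income
w_upper <- 410 + qt(0.95, df = 9 - 1) * 90 / sqrt(9)
w_upper
w_upper < 500   # TRUE means the left-sided test rejects H0 at alpha = 0.05
```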

c

H0: mean income of female employees = $500/wk
H1: mean income of female employees > $500/wk

Code
print('Right-Sided T-Test')
[1] "Right-Sided T-Test"
Code
print(c('test-statistic:', test_stat))
[1] "test-statistic:" "-3"             
Code
print(c('rejection-region:',crit_greater))
[1] "rejection-region:" "1.8595480375309"  
Code
print(c('p-value',pval_greater))
[1] "p-value"           "0.991464159383109"

p-value = 0.991. Fail to reject the null: at alpha = 0.05 we do not have sufficient evidence to conclude that the mean weekly income of female employees is greater than $500/week. If the true mean were $500, we would expect about 99% of samples to produce a sample mean of $410 or more.
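Parts a to c share one test statistic, so their p-values are linked. The left- and right-sided p-values sum to 1, and the two-sided p-value is twice the smaller of the two. As a result, a sample mean below $500 can never yield a small right-sided p-value. A short numeric check with t = -3 and 8 degrees of freedom:

```r
t_stat <- -3                                     # test statistic from part a
df_w <- 9 - 1                                    # degrees of freedom
p_left <- pt(t_stat, df_w)                       # H1: mu < 500
p_right <- pt(t_stat, df_w, lower.tail = FALSE)  # H1: mu > 500
p_left + p_right                                 # the two tails sum to 1
2 * min(p_left, p_right)                         # two-sided p-value from part a
```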

a & b
Code
# jones data
j_xbar <- 519.5
j_n <- 1000
j_se <- 10
j_test_stat <- (j_xbar-500)/j_se
crit_2sided <- abs(qt(.025,j_n-1))
j_p_value <- pt(j_test_stat, df=j_n-1, lower.tail=F)


# Jones Two-Sided T-Test
print('Jones Two-Sided T-Test')
[1] "Jones Two-Sided T-Test"
Code
print(c('test-statistic (use absolute value):', j_test_stat))
[1] "test-statistic (use absolute value):"
[2] "1.95"                                
Code
print(c('rejection-region:', crit_2sided))
[1] "rejection-region:" "1.96234146113345" 
Code
print(c('p-value', 2*j_p_value))
[1] "p-value"            "0.0514555476459477"
Code
print('not significant at alpha = 0.05')
[1] "not significant at alpha = 0.05"
Code
# smith data
s_xbar <- 519.7
s_n <- 1000
s_se <- 10
s_test_stat <- (s_xbar-500)/s_se
crit_2sided <- abs(qt(.025,s_n-1))
s_p_value <- pt(s_test_stat, df=s_n-1, lower.tail=F)


# Smith Two-Sided T-Test
print('Smith Two-Sided T-Test')
[1] "Smith Two-Sided T-Test"
Code
print(c('test-statistic (use absolute value):', s_test_stat))
[1] "test-statistic (use absolute value):"
[2] "1.97"                                
Code
print(c('rejection-region:', crit_2sided))
[1] "rejection-region:" "1.96234146113345" 
Code
print(c('p-value', 2*s_p_value))
[1] "p-value"            "0.0491142565416521"
Code
print(c('significant at alpha = 0.05'))
[1] "significant at alpha = 0.05"
c

Without the p-value, readers cannot judge the strength of the evidence, that is, how extreme the findings are. In this example, nearly identical results (p = 0.051 vs. p = 0.049) lead to opposite conclusions about significance. Labels like "statistically significant" are especially misleading here for anyone unfamiliar with basic statistical theory.
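Reporting the estimate together with a confidence interval makes this concrete: the two 95% intervals are almost the same. The sketch below uses the summary values from the two studies (sample means, standard error of 10, n = 1000).

```r
# 95% t intervals and two-sided p-values for the Jones and Smith studies
xbar <- c(jones = 519.5, smith = 519.7)
se <- 10
t_crit <- qt(0.975, df = 1000 - 1)
ci <- cbind(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
ci
p_two <- 2 * pt(-abs((xbar - 500) / se), df = 1000 - 1)
p_two
```

The Jones interval just includes 500 and the Smith interval just excludes it. The intervals themselves differ by only 0.2.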

Code
gas_taxes <- c(51.27, 47.43, 38.89, 
               41.95, 28.61, 41.29, 
               52.19, 49.48, 35.02, 
               48.13, 39.28, 54.41, 
               41.66, 30.28, 18.49, 
               38.72, 33.41, 45.02)
t.test(gas_taxes, mu=45, alternative = 'less')

    One Sample t-test

data:  gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278 

Yes. At the 0.05 significance level we have sufficient evidence (p = 0.038) to reject the null hypothesis mu = 45. The value 45 is not contained in the one-sided 95% confidence interval (-Inf, 44.68). This supports the alternative hypothesis that the average tax on gas in the United States in 2005 was less than 45 cents per gallon.
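The one-sided upper bound printed by `t.test` is mean + t(0.95, n-1) * s/sqrt(n). Computing it by hand from the data shows where 44.67946 comes from, and the left-tail p-value can be checked the same way.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29,
               52.19, 49.48, 35.02, 48.13, 39.28, 54.41,
               41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
n_gas <- length(gas_taxes)
se_gas <- sd(gas_taxes) / sqrt(n_gas)

# One-sided 95% upper confidence bound
upper_95 <- mean(gas_taxes) + qt(0.95, df = n_gas - 1) * se_gas
upper_95

# Test statistic and left-tail p-value for H1: mu < 45
t_gas <- (mean(gas_taxes) - 45) / se_gas
pt(t_gas, df = n_gas - 1)
```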