library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)
library(stats)
#margin of error and confidence interval (bypass)
margin_error <- tscore_bypass*se_bypass
lower_CI <- x_bypass - margin_error
upper_CI <- x_bypass + margin_error
print(c(lower_CI,upper_CI))
[1] 18.29029 19.70971
#margin of error and confidence interval (angiography)
margin_error_angio <- tscore_angio*se_angio
angio_lowerCI <- x_angio - margin_error_angio
angio_upperCI <- x_angio + margin_error_angio
print(c(angio_lowerCI,angio_upperCI))
[1] 17.49078 18.50922
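The bypass confidence interval for the true mean wait time is (18.290, 19.710): the sample mean of 19 plus or minus a margin of error of 0.710.
The angiography confidence interval for the true mean wait time is (17.491, 18.509): the sample mean of 18 plus or minus a margin of error of 0.509.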
The confidence interval is narrower for angiography.
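The interval arithmetic above can be sketched as a small helper that adds and subtracts the margin of error from the point estimate. The function name `ci_from_margin` is hypothetical, and the inputs are assumptions read off the printed bypass output: a sample mean of 19 and the margin of error of 0.70971 implied by the lower bound.

```r
# Hypothetical helper: symmetric confidence interval around a point estimate.
ci_from_margin <- function(estimate, margin) {
  c(lower = estimate - margin, upper = estimate + margin)
}

# Assumed inputs taken from the bypass output: mean 19, margin 0.70971.
bypass_ci <- ci_from_margin(19, 0.70971)
print(bypass_ci)
```

Computing both bounds from the point estimate, rather than one bound from the other, keeps the interval centered on the sample mean.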
[1] 0.03036761
#95% Confidence Interval
CI_lower <- sample_prop - margin_error2
CI_upper <- sample_prop + margin_error2
print(c(CI_lower,CI_upper))
[1] 0.5195839 0.5803191
The point estimate p of the proportion of all adult Americans who believe that college is essential for success is 0.550, or ~55%. The margin of error is 0.030, so the 95% confidence interval is (0.520, 0.580).
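As a rough sketch of the full calculation, the block below builds a 95% interval for a proportion from raw counts. The counts are an assumption: they do not appear in this section, and 567 out of 1031 was chosen only because it reproduces the printed proportion and margin of error. The variable names are illustrative.

```r
# Assumed counts (not shown in this section): 567 "yes" answers out of 1031.
x_yes <- 567
n_resp <- 1031

p_hat <- x_yes / n_resp                        # sample proportion
z_crit <- qnorm(0.975)                         # critical value for 95% confidence
se_prop <- sqrt(p_hat * (1 - p_hat) / n_resp)  # standard error of the proportion
margin_prop <- z_crit * se_prop                # margin of error

prop_ci <- c(lower = p_hat - margin_prop, upper = p_hat + margin_prop)
print(prop_ci)
```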
#The margin of error formula E = z*s/sqrt(n) requires the sample size n, so it has to be rearranged to solve for n: n = (z*s/E)^2. With a target margin of error of 5, this reads n = (z*s/5)^2.
f <-function(n, z = 1.96, s = 42.5) {
res <- z*s/sqrt(n)
return(res)
}
vec <- vapply(1:300, FUN = f, FUN.VALUE = 5.0)
which(vec < 5)[1]
[1] 278
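The search over 1:300 can be cross-checked with the closed form n = (z*s/E)^2, rounded up to the next whole observation. This is a minimal sketch using the same z = 1.96 and s = 42.5 as the function above; the names `E` and `n_required` are illustrative.

```r
z <- 1.96   # critical value used above
s <- 42.5   # standard deviation used above
E <- 5      # target margin of error

# Solve E = z * s / sqrt(n) for n and round up to a whole observation.
n_required <- ceiling((z * s / E)^2)
n_required
```

Rounding up rather than to the nearest integer guarantees the margin of error stays at or below the target.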
####(Hint: The P-values for the two possible one-sided tests must sum to 1.)
#To test whether the mean income for female employees differs from $500/week, we must first conduct a one-sample, two-sided significance test.
#We can also assume the following:
#1. The sample is random and the population has a normal distribution
#2. The mean income for all senior-level workers = $500/week
#3. From the random sample of 9 female employees, the mean income = $410/week
#4. Standard deviation = 90
#5. Null Hypothesis: H0: μ = 500
#6. Alternative Hypothesis: Ha: μ ≠ 500
[1] -3
#P-value
n <- 9
df_n <- (n - 1)
t_test <- (410 - 500)/(90/sqrt(n))
p_value <- 2*pt(-abs(t_test), df_n)
p_value
[1] 0.01707168
#The P-value is 0.017. At a significance level of α = 0.05, 0.017 < 0.05, so the null hypothesis can be rejected. Therefore, there is enough statistical evidence to support the claim that the mean income for female employees differs from the overall mean of $500/week.
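The test above can be wrapped into a reusable function that works from summary statistics alone. This is a sketch, not the only way to do it: the function name `one_sample_t` and its argument names are hypothetical, and it assumes a two-sided alternative.

```r
# Hypothetical helper: one-sample, two-sided t-test from summary statistics.
one_sample_t <- function(xbar, mu0, s, n) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
  df <- n - 1                              # degrees of freedom
  p_two_sided <- 2 * pt(-abs(t_stat), df)  # two tails, valid for either sign of t
  list(t = t_stat, df = df, p_value = p_two_sided)
}

# Values from the assumptions listed above.
income_test <- one_sample_t(xbar = 410, mu0 = 500, s = 90, n = 9)
income_test$t
income_test$p_value
```

Using `-abs(t_stat)` means the same code gives the right two-sided p-value whether the sample mean falls below or above the hypothesized mean.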
#P-value for Ha: mu < 500
q <- -3
left_p_value <- pt(q,df_n,lower.tail=TRUE,log.p=FALSE)
left_p_value
[1] 0.008535841
[1] 0.9914642
#The left-tailed p-value above (0.0085) is the p-value for Ha: mu < 500. The p-value for Ha: mu > 500 is 0.99, so the sample provides no evidence for the claim that mu > 500, and H0 cannot be rejected in that direction. To make sure my findings are correct, I must confirm that the two one-sided p-values sum to 1. I could code this, but it's not hard to tell that 0.0085 + 0.9915 = 1.
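The check on the hint can be coded in a few lines. The sketch below recomputes both one-sided p-values from t = -3 with 8 degrees of freedom and adds them; the variable names are illustrative.

```r
t_stat <- -3   # test statistic from above
df_t <- 8      # n - 1 with n = 9

p_less <- pt(t_stat, df_t)                         # P-value for Ha: mu < 500
p_greater <- pt(t_stat, df_t, lower.tail = FALSE)  # P-value for Ha: mu > 500

c(less = p_less, greater = p_greater, total = p_less + p_greater)
```

The two tails split the whole t distribution at the observed statistic, so the total is 1 for any value of t.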
#Let's start with Jones and confirm that t = 1.95 and the p-value = 0.051
#t-test
t_testj <- (519.5-500)/10.0
t_testj
[1] 1.95
#P-value
n5 <- 1000
df_5 <- (n5-1)
pvaluej <- pt(t_testj, df_5,lower.tail = FALSE,log.p = FALSE)*2
pvaluej
[1] 0.05145555
#Now let's move on to Smith with t = 1.97 and p-value = 0.049
#t-test
t_testSmith <- (519.7 - 500)/10.0
t_testSmith
[1] 1.97
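To compare the two studies side by side, the sketch below computes both two-sided p-values with one helper and checks each against α = 0.05. The helper name `two_sided_p` is hypothetical, and it assumes Smith's sample size is also 1000, the value used for Jones above.

```r
# Hypothetical helper: two-sided p-value for a t statistic.
two_sided_p <- function(t_stat, df) {
  2 * pt(abs(t_stat), df, lower.tail = FALSE)
}

df_both <- 1000 - 1   # assumption: both studies have n = 1000

p_jones <- two_sided_p((519.5 - 500) / 10.0, df_both)
p_smith <- two_sided_p((519.7 - 500) / 10.0, df_both)

c(jones = p_jones, smith = p_smith)
c(jones = p_jones < 0.05, smith = p_smith < 0.05)   # significant at alpha = 0.05?
```

The two results land on opposite sides of 0.05 even though the sample means differ by only 0.2, which is why reporting the actual p-value is more informative than reporting only "significant" or "not significant".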
#I'm going to start by calculating the t-score to find the upper and lower values in the gas_taxes interval
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
gas_tax_sample <- 18
df_gt <- gas_tax_sample - 1
mean_gt <- mean(gas_taxes)
tscore_gt <- qt(p=0.05,df=df_gt,lower.tail=FALSE)
gas_sd <- sd(gas_taxes)
me_gas_taxes <- tscore_gt*gas_sd/sqrt(gas_tax_sample)
lower_int_gt<-(mean_gt-me_gas_taxes)
lower_int_gt
[1] 37.0461
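As a cross-check, the same bounds can be obtained from `t.test()` in the stats package. A t-score taken at p = 0.05 in one tail corresponds to a two-sided 90% interval, so the sketch assumes `conf.level = 0.90`; the name `gas_ci` is illustrative.

```r
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# Assumption: 5% in each tail, i.e. a two-sided 90% confidence interval.
gas_ci <- t.test(gas_taxes, conf.level = 0.90)$conf.int
gas_ci   # first element is the lower bound, second is the upper bound
```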