knitr::opts_chunk$set(echo = TRUE)
Karen Detter
October 17, 2022
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.5
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.4.1
✔ readr 2.1.3 ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
- Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures:
Bypass -
[1] 0.4307305
[1] 0.05
[1] 18.29029 19.70971
Angiography-
[1] 0.3092437
[1] 0.05
[1] 17.49078 18.50922
- Is the confidence interval narrower for angiography or bypass surgery?
[1] 1.419421
[1] 1.018436
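Angiography has the narrower confidence interval (width of about 1.02, versus about 1.42 for bypass), because its sample is larger and its standard deviation is smaller.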
- Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success.
- Construct and interpret a 95% confidence interval for p
[1] 0.5195839
[1] 0.5803191
The 95% confidence interval for the population proportion is [.52, .58]. Since about 95% of confidence intervals constructed this way from repeated random samples would contain the true population proportion, we can be 95% confident that the true proportion of adult Americans who believe a college education is essential for success lies between 52% and 58%.
- Assuming the significance level to be 5%, what should be the size of the sample?
[1] 277.5454
Rounding up to the next whole number, the sample size should be at least 278.
A. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.
assumptions: random sampling, normally distributed data, adequate sample size; hypotheses: \(H_{0}\) : \(\mu\) = 500 ; \(H_{\alpha}\) : \(\mu\) \(\neq\) 500 ; test statistic: t-statistic
[1] 0.01707168
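Assuming \(\alpha\) = .05, we can reject \(H_{0}\): the test statistic is t = -3 with 8 degrees of freedom, and the two-sided p-value of about .017 is below .05, so there is evidence that the mean weekly income differs from $500.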
B. Report the P-value for \(H_{\alpha}\) : \(\mu\) < 500. Interpret.
[1] 0.008535841
This p-value (about .009) is well below the .05 significance level, which means that we can reject \(H_{0}\) because there is evidence to support \(H_{\alpha}\) : \(\mu\) < 500.
C. Report and interpret the P-value for \(H_{\alpha}\) : \(\mu\) > 500.
[1] 0.9914642
This p-value (about .99) is far above the .05 significance level, so in this case we fail to reject \(H_{0}\); the data provide no evidence for \(H_{\alpha}\) : \(\mu\) > 500.
A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.
[1] 1.95
[1] 1.97
[1] 0.05145555
[1] 0.04911426
B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”
At this significance level, Smith’s study would be considered significant and allow for rejection of the null hypothesis. Jones’ study, however, would fail to reject the null.
C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05”, or as “reject H0” versus “Do not reject H0”, without reporting the actual P-value.
This example shows why the actual p-value should be reported. Smith and Jones obtained nearly identical results (t = 1.97 versus 1.95, p = 0.049 versus 0.051), yet a fixed .05 cutoff labels one “significant” and the other not. Reporting only “P ≤ 0.05” or “P > 0.05”, or only “reject” or “do not reject” the null hypothesis, hides how close both results are to the cutoff and suggests that the studies disagree when they provide essentially the same strength of evidence. A reader needs both the actual p-value AND the exact standard, \(\leq\) or <, being used in order to judge the meaning of the findings.
Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.
One Sample t-test
data: gas_taxes
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
-Inf 44.67946
sample estimates:
mean of x
40.86278
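The one-sided test gives t = -1.89 with 17 degrees of freedom and a p-value of about .04 for \(H_{\alpha}\) : \(\mu\) < 45. Because this is below the .05 significance level (95% confidence), we can reject \(H_{0}\) and conclude that the average tax was less than 45 cents. Additionally, 45 is above the upper bound of the one-sided 95% confidence interval (44.68), which also supports the alternative hypothesis.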
---
title: "HW 2"
author: "Karen Detter"
description: "Second Homework On Hypothesis Testing"
date: "10/17/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw2
---
```{r}
#| label: setup
#| warning: false
#|
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library(tidyverse)
```
## Q.1
**- Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures:**
*Bypass -*
```{r}
#assign values
Bs_sd <- 10
Bs_size <- 539
Bs_mean <- 19
#calculate standard error
Bstandard_error <- Bs_sd / sqrt(Bs_size)
Bstandard_error
```
```{r}
#calculate area in each of the two tails
confidence_level <- 0.90
Btail_area <- (1-confidence_level)/2
Btail_area
```
```{r}
#calculate t-score
Bt_score <- qt(p = 1-Btail_area, df = Bs_size-1)
Bt_score
```
```{r}
#calculate confidence interval
BCI <- c(Bs_mean - Bt_score * Bstandard_error, Bs_mean + Bt_score * Bstandard_error)
print(BCI)
```
*Angiography-*
```{r}
#assign values
As_sd <- 9
As_size <- 847
As_mean <- 18
#calculate standard error
Astandard_error <- As_sd / sqrt(As_size)
Astandard_error
```
```{r}
#calculate area in each of the two tails
confidence_level <- 0.90
Atail_area <- (1-confidence_level)/2
Atail_area
```
```{r}
#calculate t-score
At_score <- qt(p = 1-Atail_area, df = As_size-1)
At_score
```
```{r}
#calculate confidence interval
ACI <- c(As_mean - At_score * Astandard_error, As_mean + At_score * Astandard_error)
print(ACI)
```
**- Is the confidence interval narrower for angiography or bypass surgery?**
```{r}
#calculate differences in upper and lower bounds of both confidence intervals
(Bs_mean + Bt_score * Bstandard_error) - (Bs_mean - Bt_score * Bstandard_error)
(As_mean + At_score * Astandard_error) - (As_mean - At_score * Astandard_error)
```
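Angiography has the narrower confidence interval (width of about 1.02, versus about 1.42 for bypass), because its sample is larger and its standard deviation is smaller.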
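The bypass and angiography calculations repeat the same arithmetic, so they can be wrapped in one function. This is a minimal sketch rather than part of the assignment: the helper name `t_ci_from_summary` and its arguments are illustrative, and the inputs are the summary statistics assigned above.

```{r}
# Hypothetical helper: t-based confidence interval from summary statistics
t_ci_from_summary <- function(mean, sd, n, conf_level = 0.90) {
  standard_error <- sd / sqrt(n)
  tail_area <- (1 - conf_level) / 2            # area in each tail
  t_score <- qt(p = 1 - tail_area, df = n - 1)
  c(lower = mean - t_score * standard_error,
    upper = mean + t_score * standard_error)
}

bypass_ci <- t_ci_from_summary(mean = 19, sd = 10, n = 539)
angio_ci  <- t_ci_from_summary(mean = 18, sd = 9,  n = 847)

# Interval widths (upper bound minus lower bound)
bypass_width <- unname(diff(bypass_ci))
angio_width  <- unname(diff(angio_ci))
c(bypass = bypass_width, angiography = angio_width)
```

Comparing the two widths directly makes the "narrower" question a one-line check.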
## Q.2
**- Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success.**
```{r}
#assign values
k <- 567
n <- 1031
#calculate sample proportion
p <- k/n
p
```
**- Construct and interpret a 95% confidence interval for p**
```{r}
#calculate margin of error
margin <- qnorm(0.975) * sqrt(p*(1-p)/n)
#calculate lower and upper bounds of confidence interval
low <- p - margin
high <- p + margin
print(low)
print(high)
```
The 95% confidence interval for the population proportion is [.52, .58]. Since about 95% of confidence intervals constructed this way from repeated random samples would contain the true population proportion, we can be 95% confident that the true proportion of adult Americans who believe a college education is essential for success lies between 52% and 58%.
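The interval above is the Wald (normal-approximation) interval. As a rough cross-check, it can be set beside the Wilson score interval that base R's `prop.test()` returns. This is a sketch only: the variable names `p_hat`, `wald`, and `wilson` are illustrative, and the two methods are expected to differ slightly because they use different formulas.

```{r}
k <- 567
n <- 1031
p_hat <- k / n

# Wald interval: point estimate plus or minus z * standard error
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# Wilson score interval from prop.test(), without continuity correction
wilson <- as.numeric(prop.test(k, n, conf.level = 0.95, correct = FALSE)$conf.int)

round(rbind(wald = wald, wilson = wilson), 4)
```

With a sample this large and a proportion near one half, the two intervals should round to the same bounds, so the choice of method does not change the interpretation here.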
## Q.3
**- Assuming the significance level to be 5%, what should be the size of the sample?**
```{r}
#assign values
z_score <- qnorm(.975) #assuming normal distribution and 95% confidence level
margin_error <- 5 #half of confidence interval
#calculate population standard deviation (one quarter of the range)
pop_sd <- (200-30) / 4
```
```{r}
#calculate sample size needed to estimate the population mean
samp_size <- z_score^2 * pop_sd^2 / margin_error^2
samp_size
```
Rounding up to the next whole number, the sample size should be at least 278.
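The calculation above follows the formula $n = (z \cdot \sigma / E)^2$, where $E$ is the margin of error. A minimal sketch that also does the rounding in code is shown here; the helper name `sample_size_for_mean` is hypothetical, and the standard deviation is still the range-based estimate used above.

```{r}
# Hypothetical helper: sample size needed to estimate a mean to within +/- margin
sample_size_for_mean <- function(sd, margin, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling((z * sd / margin)^2)   # always round up, never down
}

n_needed <- sample_size_for_mean(sd = (200 - 30) / 4, margin = 5)
n_needed
```

Using `ceiling()` rather than `round()` guarantees that the margin of error is not exceeded, even when the fractional part is below one half.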
## Q.4
**A. Test whether the mean income of female employees differs from $500 per week.
Include assumptions, hypotheses, test statistic, and P-value.
Interpret the result.**
*assumptions:* random sampling, normally distributed data, adequate sample size;
*hypotheses:* *$H_{0}$* : $\mu$ = 500 ; *$H_{\alpha}$* : $\mu$ $\neq$ 500 ;
*test statistic:* t-statistic
```{r}
#calculate t-statistic
t_stat <- (410 - 500) / (90 / (sqrt(9)))
#calculate two-tailed p-value
p_val <- 2 * (pt(q = t_stat, df=8))
p_val
```
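Assuming $\alpha$ = .05, we can reject $H_{0}$: the test statistic is t = -3 with 8 degrees of freedom, and the two-sided p-value of about .017 is below .05, so there is evidence that the mean weekly income differs from 500 dollars.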
**B. Report the P-value for $H_{\alpha}$ : $\mu$ < 500. Interpret.**
```{r}
#calculate lower-tail p-value
p_low <- pt(t_stat, df = 8, lower.tail = TRUE)
p_low
```
This p-value (about .009) is well below the .05 significance level, which means that we can reject $H_{0}$ because there is evidence to support $H_{\alpha}$ : $\mu$ < 500.
**C. Report and interpret the P-value for $H_{\alpha}$ : $\mu$ > 500.**
```{r}
#calculate upper-tail p-value
p_high <- pt(t_stat, df = 8, lower.tail = FALSE)
p_high
```
```{r}
#double-check p-values
check <- p_high + p_low
check
```
This p-value (about .99) is far above the .05 significance level, so in this case we fail to reject $H_{0}$; the data provide no evidence for $H_{\alpha}$ : $\mu$ > 500.
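Parts A through C use the same test statistic and differ only in which tail is counted. The sketch below collects all three p-values in one place; the helper name `t_test_from_summary` is hypothetical, and the inputs are the summary values used in the chunks above (mean 410, standard deviation 90, n = 9, null value 500).

```{r}
# Hypothetical helper: one-sample t-test from summary statistics
t_test_from_summary <- function(mean, sd, n, mu0) {
  t_stat <- (mean - mu0) / (sd / sqrt(n))
  df <- n - 1
  c(t           = t_stat,
    p_two_sided = 2 * pt(-abs(t_stat), df = df),            # H_a: mu != mu0
    p_less      = pt(t_stat, df = df, lower.tail = TRUE),   # H_a: mu <  mu0
    p_greater   = pt(t_stat, df = df, lower.tail = FALSE))  # H_a: mu >  mu0
}

income_test <- t_test_from_summary(mean = 410, sd = 90, n = 9, mu0 = 500)
round(income_test, 4)
```

Using `-abs(t_stat)` for the two-sided p-value keeps the formula correct whether the statistic is negative, as it is here, or positive.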
## Q.5
**A. Show that t = 1.95 and P-value = 0.051 for Jones.
Show that t = 1.97 and P-value = 0.049 for Smith.**
```{r}
#calculate t-statistics
Jones_t <- (519.5 - 500) / 10
Jones_t
Smith_t <- (519.7 - 500) / 10
Smith_t
```
```{r}
#calculate p-values
Jones_p <- 2 * (pt(q = Jones_t, df=999, lower.tail = FALSE))
Jones_p
Smith_p <- 2 * (pt(q = Smith_t, df=999, lower.tail = FALSE))
Smith_p
```
**B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”**
At this significance level, Smith's study would be considered significant and allow for rejection of the null hypothesis. Jones' study, however, would fail to reject the null.
**C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05”, or as “reject H0” versus “Do not reject H0”, without reporting the actual P-value.**
This example shows why the actual p-value should be reported. Smith and Jones obtained nearly identical results (t = 1.97 versus 1.95, p = 0.049 versus 0.051), yet a fixed .05 cutoff labels one "significant" and the other not. Reporting only "P ≤ 0.05" or "P > 0.05", or only "reject" or "do not reject" the null hypothesis, hides how close both results are to the cutoff and suggests that the studies disagree when they provide essentially the same strength of evidence. A reader needs both the actual p-value AND the exact standard, $\leq$ or <, being used in order to judge the meaning of the findings.
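To make the contrast concrete, the two studies can be placed side by side with both the exact p-value and the binary label. This is an illustrative sketch; the data frame `studies` and its column names are not part of the assignment, and the standard error of 10 and 999 degrees of freedom are taken from the chunks above.

```{r}
se <- 10
studies <- data.frame(study = c("Jones", "Smith"),
                      sample_mean = c(519.5, 519.7))

# Same calculation as above, applied to both studies at once
studies$t <- (studies$sample_mean - 500) / se
studies$p_value <- 2 * pt(studies$t, df = 999, lower.tail = FALSE)

# The binary label is all that survives if only "P <= 0.05" is reported
studies$significant <- studies$p_value <= 0.05
studies
```

The `significant` column flips between the two rows even though the p-values differ by only about 0.002.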
## Q.6
**Is there enough evidence to conclude at a 95% confidence level that the average tax per gallon of gas in the US in 2005 was less than 45 cents? Explain.**
```{r}
#assign values
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
#run one sample t-test
t.test(gas_taxes, mu = 45, alternative = 'less')
```
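The one-sided test gives t = -1.89 with 17 degrees of freedom and a p-value of about .04 for $H_{\alpha}$ : $\mu$ < 45. Because this is below the .05 significance level (95% confidence), we can reject $H_{0}$ and conclude that the average tax was less than 45 cents. Additionally, 45 is above the upper bound of the one-sided 95% confidence interval (44.68), which also supports the alternative hypothesis.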
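The numbers reported by `t.test()` can be reproduced by hand, which may help show where the p-value comes from. This is a minimal sketch using the same data; the names `res`, `t_manual`, and `p_manual` are illustrative.

```{r}
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
               48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

# Built-in one-sample, lower-tailed t-test
res <- t.test(gas_taxes, mu = 45, alternative = "less")

# Manual calculation of the same test statistic and lower-tail p-value
n_obs    <- length(gas_taxes)
t_manual <- (mean(gas_taxes) - 45) / (sd(gas_taxes) / sqrt(n_obs))
p_manual <- pt(t_manual, df = n_obs - 1, lower.tail = TRUE)

c(t_from_test = unname(res$statistic), t_manual = t_manual,
  p_from_test = res$p.value,           p_manual = p_manual)
```

The upper bound of the one-sided interval is also available as `res$conf.int[2]`, so the comparison with 45 can be made in code rather than read from the printed output.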