Rahul Gundeti
October 17, 2022
---
title: "DACSS603_HW2"
author: "Rahul Gundeti"
description: "Statistical Inference"
date: "10/17/2022"
format:
  html:
    df-print: paged
    css: styles.css
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - HW2
  - Statistical Inference
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(ggplot2)
library(stats)
knitr::opts_chunk$set(echo = TRUE)
```
## Question 1
## Creating the table
```{r}
procedure <- c("Bypass", "Angiography")
sample_size <- c(539, 847)
mwt <- c(19, 18)
s_stddev <- c(10, 9)
surgery <- data.frame(procedure, sample_size, mwt, s_stddev)
surgery
```
```{r}
std_error <- s_stddev / sqrt(sample_size)
std_error
```
```{r}
confidence_level <- 0.90
tail_area <- (1-confidence_level)/2
tail_area
```
```{r}
t_score <- qt(p = 1-tail_area, df = sample_size-1)
t_score
```
```{r}
Confidence_Interval <- c(mwt - t_score * std_error,
mwt + t_score * std_error)
Confidence_Interval
```
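The values above are the 90% confidence intervals for the mean wait time of each procedure, computed as the sample mean ± t × standard error. The output lists both lower bounds first, then both upper bounds.
Bypass surgery mean wait time: 18.29029 to 19.70971 days
Angiography mean wait time: 17.49078 to 18.50922 days
The comparison shows that the sample mean wait time for angiography (18 days) is shorter than that for bypass surgery (19 days). The angiography interval is also narrower (about 1.02 days wide versus about 1.42 days), because its larger sample and smaller standard deviation give a smaller standard error. Note that the two intervals overlap slightly.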
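As a cross-check, the same intervals can be assembled into one table so that each procedure's lower bound, upper bound, and width sit in the same row. This is a minimal sketch; the helper name `mean_ci` and the object `ci_table` are illustrative and not part of the original analysis.

```{r}
# Hypothetical helper: t-based confidence interval for a mean from summary statistics
mean_ci <- function(xbar, s, n, conf_level = 0.90) {
  se <- s / sqrt(n)                                   # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # two-sided critical value
  data.frame(lower = xbar - t_crit * se,
             upper = xbar + t_crit * se,
             width = 2 * t_crit * se)
}

# One row per procedure, using the summary statistics from the table above
ci_table <- data.frame(
  procedure = c("Bypass", "Angiography"),
  mean_ci(xbar = c(19, 18), s = c(10, 9), n = c(539, 847))
)
ci_table
```

Keeping the width in its own column makes the comparison of interval precision explicit rather than something the reader has to compute by hand.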
## Question 2
```{r}
prop.test(567, 1031, conf.level = .95)
```
The 95% confidence interval for the population proportion is 0.5189682 to 0.5805580.
The point estimate of p, the proportion of respondents who believe that college is necessary for success, is 0.5499515.
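Rather than copying the numbers from the printed output, the estimate and interval can be read directly from the object that `prop.test()` returns. The sketch below assumes the same counts as above; the names `college_test`, `college_p_hat`, and `college_ci` are illustrative.

```{r}
# Store the test object so individual components can be pulled out by name
college_test <- prop.test(567, 1031, conf.level = 0.95)

college_p_hat <- unname(college_test$estimate)   # sample proportion, 567 / 1031
college_ci <- as.numeric(college_test$conf.int)  # lower and upper confidence limits

round(c(estimate = college_p_hat, lower = college_ci[1], upper = college_ci[2]), 4)
```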
## Question 3
```{r}
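ME <- 5                  # desired margin of error
z <- 1.96                # critical value for 95% confidence
s_sd <- (200 - 30) / 4   # standard deviation estimated as range / 4
s_size <- ((z * s_sd) / ME)^2
s_size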
```
The computed value is about 277.56, which rounds up to a necessary sample size of 278.
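The calculation follows the formula n = (z × σ / ME)², rounded up to the next whole number. A small sketch of that formula as a reusable function is shown here; `required_n` is a hypothetical helper, and it uses `qnorm()` for the critical value instead of the rounded 1.96.

```{r}
# Hypothetical helper: sample size needed to estimate a mean within a margin of error
required_n <- function(sigma, margin, conf_level = 0.95) {
  z_crit <- qnorm(1 - (1 - conf_level) / 2)  # normal critical value for the chosen level
  ceiling((z_crit * sigma / margin)^2)       # round up to a whole observation
}

# Same inputs as above: SD estimated as range / 4, margin of error of 5
required_n(sigma = (200 - 30) / 4, margin = 5)
```

Rounding up with `ceiling()` rather than `round()` ensures the margin of error is not exceeded.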
## Question 4
## A
We assume that the sample is random and that the population has a normal distribution.
Null hypothesis: H0: μ = 500
Alternative hypothesis: Ha: μ ≠ 500
We will reject the null hypothesis at a p-value less than or equal to 0.05
```{r}
s_mean <- 410
μ <- 500
s_sd <- 90
s_size <- 9
```
## Calculating test-statistic
```{r}
t_score <- (s_mean-μ)/(s_sd/sqrt(s_size))
t_score
```
## Calculating p-value
```{r}
p <- 2 * pt(-abs(t_score), df = s_size - 1)  # two-sided p-value; valid for either sign of t
p
```
The test statistic is -3 and the p-value is 0.01707168. Because the p-value is less than 0.05, we reject the null hypothesis and conclude that the mean income of female employees differs from $500.
## B
We assume that the sample is random and that the population has a normal distribution.
Null hypothesis: H0: μ = 500
Alternative hypothesis: Ha: μ < 500
We will reject the null hypothesis at a p-value less than 0.05
```{r}
p <- pt(t_score, s_size-1, lower.tail = TRUE)
p
```
The p-value is 0.008535841. Because the p-value is less than 0.05, we reject the null hypothesis and conclude that the mean income of female employees is less than $500.
## C
We assume that the sample is random and that the population has a normal distribution.
Null hypothesis: H0: μ = 500
Alternative hypothesis: Ha: μ > 500
We will reject the null hypothesis at a p-value less than 0.05
```{r}
p <- pt(t_score, s_size-1, lower.tail = FALSE)
p
```
The p-value is 0.9914642. Because the p-value is greater than 0.05, we fail to reject the null hypothesis. There is no evidence that the mean income of female employees is greater than $500.
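The three parts differ only in which tail of the t distribution is used, so they can be computed with one function. The following is a sketch under the same summary statistics (mean 410, standard deviation 90, n = 9); `t_from_summary` and `income_tests` are hypothetical names introduced for illustration.

```{r}
# Hypothetical helper: one-sample t test from summary statistics
t_from_summary <- function(xbar, mu0, s, n,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  t_stat <- (xbar - mu0) / (s / sqrt(n))  # test statistic
  df <- n - 1                             # degrees of freedom
  p_value <- switch(alternative,
    two.sided = 2 * pt(-abs(t_stat), df),              # both tails
    less      = pt(t_stat, df),                        # lower tail
    greater   = pt(t_stat, df, lower.tail = FALSE)     # upper tail
  )
  c(t = t_stat, p = p_value)
}

# One column per alternative hypothesis (parts A, B, and C)
income_tests <- sapply(c("two.sided", "less", "greater"), function(alt) {
  t_from_summary(xbar = 410, mu0 = 500, s = 90, n = 9, alternative = alt)
})
income_tests
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them, which is a quick consistency check on parts A through C.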
## Question 5
## A
We assume that the sample is random and that the population has a normal distribution.
Null hypothesis: H0: μ = 500
Alternative hypothesis: Ha: μ ≠ 500
We will reject the null hypothesis at a p-value less than 0.05
## Calculating t-statistic and p-value for Jones
```{r}
s_mean <- 519.5
μ <- 500
se <- 10
s_size <- 1000
jt <- (s_mean-μ)/se
jt
p <- 2*pt(jt, s_size-1, lower.tail = FALSE)
p
```
## Calculating t-statistic and p-value for Smith
```{r}
s_mean <- 519.7
μ <- 500
se <- 10
s_size <- 1000
st <- (s_mean-μ)/se  # Smith's t-statistic
st
p <- 2*pt(st, s_size-1, lower.tail = FALSE)
p
```
For Jones, the test statistic is 1.95 and the p-value is 0.05145555. For Smith, the test statistic is 1.97 and the p-value is 0.04911426.
## B
The p-value for Jones is 0.05145555. Because it is greater than 0.05, we fail to reject the null hypothesis. The p-value for Smith is 0.04911426. Because it is less than 0.05, we reject the null hypothesis. Therefore, the result is statistically significant for Smith, but not for Jones.
## C
If only the outcome relative to the significance level is reported and the P-value itself is omitted, readers cannot judge the strength of the evidence. In the Jones/Smith example, reporting the results only as *P ≤ 0.05* versus *P > 0.05* leads to opposite conclusions from nearly identical results.
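One way to make this point concrete is to report a confidence interval alongside each P-value. The sketch below uses the same means, standard error, and sample size as the chunks above; the object names are illustrative. The two intervals are shifted by only 0.2, yet one just includes 500 and the other just excludes it.

```{r}
# Illustrative only: 95% confidence intervals for the two studies
study_means <- c(Jones = 519.5, Smith = 519.7)
study_se <- 10
t_crit_999 <- qt(0.975, df = 1000 - 1)  # two-sided critical value, n = 1000

# Rows are the interval limits, columns are the two studies
study_ci <- rbind(lower = study_means - t_crit_999 * study_se,
                  upper = study_means + t_crit_999 * study_se)
study_ci
```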
## Question 6
```{r}
gas_taxes <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)
# mu only sets the null value for the test; the confidence interval does not depend on it
t.test(gas_taxes, mu = 18.4, conf.level = .95)
```
The two-sided 95% confidence interval for the mean tax per gallon is 36.23386 to 45.49169 cents. We cannot conclude with 95% confidence that the mean tax is less than 45 cents, since this interval contains values above 45 cents.
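The interval above is two-sided. If the question is instead read as a directional claim (mean tax below 45 cents), a one-sided test is a possible alternative. The sketch below makes that assumption and is not the original analysis; `gas_taxes_check` and `one_sided` are illustrative names. A one-sided bound uses the 95th percentile of the t distribution rather than the 97.5th, so it is tighter than the two-sided upper limit and can support a different conclusion. How the question is read therefore matters.

```{r}
# Assumption: "mean tax < 45 cents" is treated as the alternative hypothesis
gas_taxes_check <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
                     48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

one_sided <- t.test(gas_taxes_check, mu = 45, alternative = "less", conf.level = 0.95)
one_sided$p.value   # one-sided p-value for H0: mu = 45 against Ha: mu < 45
one_sided$conf.int  # interval from -Inf up to the one-sided upper bound
```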