NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
Code
library(viridis)
Warning: package 'viridis' was built under R version 4.2.2
We can be 90% confident that the population mean wait time for the bypass procedure is between 18.29029 and 19.70971 days. And for the angiography procedure, We can be 90% confident that the population mean wait time is between 17.49078 and 18.50922 days.
From the above results, confidence interval of angiography procedure is narrower.
Question 2
Code
prop.test(567, 1031, conf.level = .95)
1-sample proportions test with continuity correction
data: 567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5189682 0.5805580
sample estimates:
p
0.5499515
Question 3
Code
ME <-5z <-1.96s_sd <- (200-30)/4sample_size <- ((z*s_sd)/ME)^2sample_size
[1] 277.5556
The necessary sample size is 278.
Question 4
4A
We assume that the sample is random and that the population has a normal distribution. Null hypothesis: H0: μ = 500 Alternative hypothesis: Ha: μ ≠ 500 We will reject the null hypothesis at a p-value, p <= 0.05
The test-statistic is -3 and p-value is 0.01707168. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the mean income of female employees is not equal to $500.
4B
We assume that the sample is random and that the population has a normal distribution. Null hypothesis: H0: μ = 500 Alternative hypothesis: Ha: μ < 500 We will reject the null hypothesis at a p-value less than 0.05
The p-value is 0.008535841. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the mean income of female employees is less than $500.
4C
We assume that the sample is random and that the population has a normal distribution. Null hypothesis: H0: μ = 500 Alternative hypothesis: Ha: μ > 500 We will reject the null hypothesis at a p-value less than 0.05
The p-value is 0.9914642. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the mean income of female employees is greater than $500.
Question 5
5A
We assume that the sample is random and that the population has a normal distribution. Null hypothesis: H0: μ = 500 Alternative hypothesis: Ha: μ ≠ 500 We will reject the null hypothesis at a p-value less than 0.05
Code
# Calculating t-statistic and p-value for Joness_mean <-519.5μ <-500se <-10s_size <-1000james_t <- (s_mean-μ)/sejames_t
[1] 1.95
Code
p <-2*pt(james_t, s_size-1, lower.tail =FALSE)p
[1] 0.05145555
Code
# Calculating t-statistic and p-value for Smiths_mean <-519.7μ <-500se <-10s_size <-1000smith_t <- (s_mean-μ)/sesmith_t
[1] 1.97
Code
p <-2*pt(smith_t, s_size-1, lower.tail =FALSE)p
[1] 0.04911426
The test-statistic is 1.95, p-value is 0.05145555 for Jones and the test-statistic is 1.97, p-value is 0.05145555 for Smith.
5B
The p-value is 0.05145555 for Jones. As p-value is greater than the 0.05, we fail to reject the null hypothesis. The p-value is 0.04911426 for Jones. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the result is statistically significant for Smith, but not Jones.
5C
If we fail to report the P-value and simply state whether the P-value is less than/equal to or greater than the defined significance level of the test, one cannot determine the strength of the conclusion. In the Jones/Smith example, reporting the results only as P ≤ 0.05 versus P > 0.05 will lead to different conclusions about very similar results.
Question 6
To test the claim that the proportion of students who choose a healthy snack differs by grade level, we will use the chi-squared test of independence. The null hypothesis is that the proportion of students who choose a healthy snack is the same across all grade levels. The alternative hypothesis is that the proportion of students who choose a healthy snack differs by grade level.
Code
# Data in a matrixtable_6 <-matrix(c(31, 43, 51, 69, 57, 49), nrow =2, byrow =TRUE)# Row and column namesrownames(table_6) <-c("Healthy snack", "Unhealthy snack")colnames(table_6) <-c("6th grade", "7th grade", "8th grade")# Chi-squared test of independencetest <-chisq.test(table_6)test
The p-value (0.01547) is less than the significance level (α = 0.05). Therefore, we reject the null hypothesis and conclude that the proportion of students who choose a healthy snack differs by grade level.)
Question 7
To test the claim that there is a difference in means for the three areas, we will use a one-way ANOVA test. The null hypothesis is that the means for the three areas are the same. The alternative hypothesis is that at least one area’s mean is different from the others.
The p-value (0.00397) is less than the significance level (α = 0.05). Therefore, we reject the null hypothesis and conclude that there is a difference in means for the three areas.
Source Code
---title: "Homework 2"author: "Adithya Parupudi"description: "my homework 2"date: "02/27/2023"format: html: toc: true code-fold: true code-copy: true code-tools: truecategories: - hw2 - Adithya Parupudi---```{r, echo=T}library(tidyverse)library(hrbrthemes)library(viridis)library(readxl)```# Question 1```{r}procedure <-c("Bypass", "Angiography")s_size <-c(539, 847)mean_wait_time <-c(19, 18)s_sd <-c(10, 9)surgery <-data.frame(procedure, s_size, mean_wait_time, s_sd)colnames(surgery) <-c("Surgical Procedure", "Sample Size","Mean Wait Time", "Standard Deviation")surgery``````{r}standard_error <- s_sd /sqrt(s_size)standard_error``````{r}confidence_level <-0.90tail_area <- (1-confidence_level)/2tail_area``````{r}t_score <-qt(p =1-tail_area, df = s_size-1)t_score``````{r}confidence_interval <-c(mean_wait_time - t_score * standard_error,mean_wait_time + t_score * standard_error)confidence_interval```We can be 90% confident that the population mean wait time for the bypass procedure is between 18.29029 and 19.70971 days. And for the angiography procedure, We can be 90% confident that the population mean wait time is between 17.49078 and 18.50922 days.From the above results, confidence interval of angiography procedure is narrower.# Question 2```{r}prop.test(567, 1031, conf.level = .95)```# Question 3```{r}ME <-5z <-1.96s_sd <- (200-30)/4sample_size <- ((z*s_sd)/ME)^2sample_size```The necessary sample size is 278.# Question 4## 4AWe assume that the sample is random and that the population has a normal distribution.Null hypothesis: H0: μ = 500Alternative hypothesis: Ha: μ ≠ 500We will reject the null hypothesis at a p-value, p <= 0.05```{r}# defining variabless_mean <-410μ <-500s_sd <-90s_size <-9``````{r}# Calculating test-statistict_score <- (s_mean-μ)/(s_sd/sqrt(s_size))t_score``````{r}# Calculating p-valuep <-2*pt(t_score, s_size-1)p```The test-statistic is -3 and p-value is 0.01707168. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the mean income of female employees is not equal to $500.## 4BWe assume that the sample is random and that the population has a normal distribution.Null hypothesis: H0: μ = 500Alternative hypothesis: Ha: μ < 500We will reject the null hypothesis at a p-value less than 0.05```{r}p_val <-pt(t_score, s_size-1, lower.tail =TRUE)p_val```The p-value is 0.008535841. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the mean income of female employees is less than $500.## 4CWe assume that the sample is random and that the population has a normal distribution.Null hypothesis: H0: μ = 500Alternative hypothesis: Ha: μ > 500We will reject the null hypothesis at a p-value less than 0.05```{r}p_val_c <-pt(t_score, s_size-1, lower.tail =FALSE)p_val_c```The p-value is 0.9914642. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the mean income of female employees is greater than $500.# Question 5## 5AWe assume that the sample is random and that the population has a normal distribution.Null hypothesis: H0: μ = 500Alternative hypothesis: Ha: μ ≠ 500We will reject the null hypothesis at a p-value less than 0.05```{r}# Calculating t-statistic and p-value for Joness_mean <-519.5μ <-500se <-10s_size <-1000james_t <- (s_mean-μ)/sejames_tp <-2*pt(james_t, s_size-1, lower.tail =FALSE)p``````{r}# Calculating t-statistic and p-value for Smiths_mean <-519.7μ <-500se <-10s_size <-1000smith_t <- (s_mean-μ)/sesmith_tp <-2*pt(smith_t, s_size-1, lower.tail =FALSE)p```The test-statistic is 1.95, p-value is 0.05145555 for Jones and the test-statistic is 1.97, p-value is 0.05145555 for Smith.## 5BThe p-value is 0.05145555 for Jones. As p-value is greater than the 0.05, we fail to reject the null hypothesis. The p-value is 0.04911426 for Jones. As p-value is less than the 0.05, we reject the null hypothesis. Therefore, the result is statistically significant for Smith, but not Jones.## 5CIf we fail to report the P-value and simply state whether the P-value is less than/equal to or greater than the defined significance level of the test, one cannot determine the strength of the conclusion. In the Jones/Smith example, reporting the results only as *P ≤ 0.05* versus *P > 0.05* will lead to different conclusions about very similar results.# Question 6To test the claim that the proportion of students who choose a healthy snack differs by grade level, we will use the chi-squared test of independence. The null hypothesis is that the proportion of students who choose a healthy snack is the same across all grade levels. The alternative hypothesis is that the proportion of students who choose a healthy snack differs by grade level.```{r}# Data in a matrixtable_6 <-matrix(c(31, 43, 51, 69, 57, 49), nrow =2, byrow =TRUE)# Row and column namesrownames(table_6) <-c("Healthy snack", "Unhealthy snack")colnames(table_6) <-c("6th grade", "7th grade", "8th grade")# Chi-squared test of independencetest <-chisq.test(table_6)test```The p-value (0.01547) is less than the significance level (α = 0.05). Therefore, we reject the null hypothesis and conclude that the proportion of students who choose a healthy snack differs by grade level.)# Question 7To test the claim that there is a difference in means for the three areas, we will use a one-way ANOVA test. The null hypothesis is that the means for the three areas are the same. The alternative hypothesis is that at least one area's mean is different from the others.```{r}# Data in vectorsarea1 <-c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5)area2 <-c(7.5, 8.2, 8.5, 8.2, 7.0, 9.3)area3 <-c(5.8, 6.4, 5.6, 7.1, 3.0, 3.5)# One-way ANOVA testtest <-aov(c(area1, area2, area3) ~factor(c(rep("Area 1", length(area1)), rep("Area 2", length(area2)), rep("Area 3", length(area3)))))summary(test)```The p-value (0.00397) is less than the significance level (α = 0.05). Therefore, we reject the null hypothesis and conclude that there is a difference in means for the three areas.