Code
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)
Question 1
The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.
Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?
Code
# sample means
Mean1 <- 19
Mean2 <- 18
# sample standard deviations
sd1 <- 10
sd2 <- 9
# sample sizes
n1 <- 539
n2 <- 847
# standard error: sd / sqrt(n)
se1 <- sd1 / sqrt(n1)
se2 <- sd2 / sqrt(n2)
se1
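The standard errors are only the first step toward the intervals the question asks for. As a minimal sketch of the remaining steps, each 90% interval can be built from a t quantile with n - 1 degrees of freedom. The helper name `t_interval` is hypothetical; the summary statistics are the ones entered in the code for this question.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_interval <- function(xbar, s, n, level = 0.90) {
  se <- s / sqrt(n)                              # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
}

ci1 <- t_interval(xbar = 19, s = 10, n = 539)  # first procedure
ci2 <- t_interval(xbar = 18, s = 9, n = 847)   # second procedure

# Interval width is the upper bound minus the lower bound.
width1 <- unname(ci1["upper"] - ci1["lower"])
width2 <- unname(ci2["upper"] - ci2["lower"])

print(rbind(ci1, ci2))
print(c(width1 = width1, width2 = width2))
```

With these inputs the second interval (n = 847, s = 9) is the narrower one, since that sample has both the smaller standard deviation and the larger size.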
Question 2
A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.
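One way to carry this out is the normal-approximation (Wald) interval for a proportion. The sketch below assumes that method is acceptable here, which is reasonable because both the number of successes and the number of failures are large; the variable names are illustrative.

```r
# Wald (normal-approximation) confidence interval for a proportion.
successes <- 567
n <- 1031

p_hat <- successes / n                 # point estimate of p
se <- sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
z <- qnorm(0.975)                      # critical value for 95% confidence

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)

print(p_hat)
print(ci)
```

The point estimate is about 0.55, and the interval runs from roughly 0.52 to 0.58: we can be 95% confident that between about 52% and 58% of adult Americans believe a college education is essential for success.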
Question 3
Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per semester for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range (in other words, you can assume they know the population standard deviation). Assuming the significance level to be 5%, what should be the size of the sample?
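The required sample size follows from setting the margin of error z · σ / √n equal to 5 and solving for n. The sketch below takes σ = (200 - 30) / 4 = 42.5 as stated in the question and rounds up, since a fractional student cannot be sampled; the variable names are illustrative.

```r
# Sample size needed for a given margin of error with known sigma.
sigma <- (200 - 30) / 4   # assumed population sd: a quarter of the range
margin <- 5               # desired half-width of the interval, in dollars
z <- qnorm(0.975)         # critical value for alpha = 0.05

n_raw <- (z * sigma / margin)^2   # solve z * sigma / sqrt(n) = margin for n
n_required <- ceiling(n_raw)      # round up to a whole number of students

print(n_raw)
print(n_required)
```

Rounding up rather than to the nearest integer guarantees the margin of error does not exceed $5, which here gives a sample of 278 students.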
Question 4
According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90.
A. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.
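A one-sample t test can be run directly from the summary statistics. The sketch below assumes a random sample from an approximately normal population (the usual assumptions for a t test with n = 9) and tests H0: μ = 500 against Ha: μ ≠ 500; the variable names are illustrative.

```r
# One-sample, two-sided t test from summary statistics.
mu0 <- 500    # hypothesized mean under H0
ybar <- 410   # sample mean
s <- 90       # sample standard deviation
n <- 9        # sample size

se <- s / sqrt(n)
t_stat <- (ybar - mu0) / se

# Two-sided P-value: probability of a t statistic at least this extreme
# in either direction, with n - 1 degrees of freedom.
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)

print(t_stat)
print(p_two_sided)
```

The test statistic is t = -3 with 8 degrees of freedom and the two-sided P-value is about 0.017, so at α = 0.05 the null hypothesis is rejected: the data suggest that the mean income of female employees differs from $500 per week.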
Question 5
Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.
A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.
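The values in part A can be checked from the reported means and standard errors. The sketch below uses a t distribution with n - 1 = 999 degrees of freedom; because both t statistics are positive, the two-sided P-value is twice the upper-tail area, which is what `lower.tail = FALSE` returns.

```r
# t statistics and two-sided P-values for the two studies.
mu0 <- 500
n <- 1000
se <- 10
ybar <- c(Jones = 519.5, Smith = 519.7)

t_stat <- (ybar - mu0) / se

# Twice the upper-tail area beyond |t| gives the two-sided P-value.
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

print(round(t_stat, 2))
print(round(p_value, 3))
```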
B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”
Jones's result is not statistically significant (P = 0.051 > 0.05), but Smith's is (P = 0.049 < 0.05).
C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value.
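The two studies are nearly identical (ȳ = 519.5 versus 519.7, with the same standard error), yet a fixed cutoff at α = 0.05 labels one result “significant” and the other “not significant.” Reporting only “P ≤ 0.05” versus “P > 0.05,” or “reject H0” versus “do not reject H0,” hides the fact that P-values of 0.049 and 0.051 represent essentially the same strength of evidence. The actual P-value should therefore be reported alongside the decision so that readers can evaluate the significance themselves.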
Question 6
A school nurse wants to determine whether age is a factor in whether children choose a healthy snack after school. She conducts a survey of 300 middle school students, with the results below. Test at α = 0.05 the claim that the proportion who choose a healthy snack differs by grade level. What is the null hypothesis? Which test should we use? What is the conclusion?
Grade level       6th grade   7th grade   8th grade
Healthy snack     31          43          51
Unhealthy snack   69          57          49
Null hypothesis: snack choice is independent of grade level, so the proportion choosing a healthy snack is the same in all three grades. Alternative hypothesis: there is a relationship between snack choice and grade level. The appropriate test is Pearson's chi-squared test of independence.
Pearson's Chi-squared test
data: snack_data$snack and snack_data$grade_level
X-squared = 8.3383, df = 2, p-value = 0.01547
Because the p-value (0.01547) is smaller than α = 0.05, we reject the null hypothesis: the proportion of students who choose a healthy snack differs by grade level.
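The same test can be run on the table of counts directly, without expanding the data to one row per student. The sketch below is one way to do that; the object name `snack_counts` and the dimension labels are illustrative.

```r
# Chi-squared test of independence from a 2 x 3 table of counts.
snack_counts <- matrix(
  c(31, 43, 51,
    69, 57, 49),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    snack = c("Healthy", "Unhealthy"),
    grade = c("6th", "7th", "8th")
  )
)

result <- chisq.test(snack_counts, correct = FALSE)

print(result$expected)  # expected counts under independence
print(result)
```

All expected counts are well above 5 (about 41.7 healthy and 58.3 unhealthy per grade), so the chi-squared approximation is reasonable for these data.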
Question 7
Per-pupil costs (in thousands of dollars) for cyber charter school tuition for school districts in three areas are shown. Test the claim that there is a difference in means for the three areas, using an appropriate test. What is the null hypothesis? Which test should we use? What is the conclusion?
Area 1   6.2   9.3   6.8   6.1   6.7   7.5
Area 2   7.5   8.2   8.5   8.2   7.0   9.3
Area 3   5.8   6.4   5.6   7.1   3.0   3.5
            Df Sum Sq Mean Sq F value  Pr(>F)
Area         2  25.66  12.832   8.176 0.00397 **
Residuals   15  23.54   1.569
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
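To make the ANOVA table easier to read, the F statistic can be rebuilt by hand from the between-group and within-group sums of squares. This is a minimal sketch of the one-way ANOVA arithmetic with illustrative variable names, testing the null hypothesis that the three area means are equal.

```r
# One-way ANOVA computed by hand from the three groups of costs.
cost <- list(
  Area1 = c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5),
  Area2 = c(7.5, 8.2, 8.5, 8.2, 7.0, 9.3),
  Area3 = c(5.8, 6.4, 5.6, 7.1, 3.0, 3.5)
)

k <- length(cost)               # number of groups
n_total <- sum(lengths(cost))   # total number of observations
grand_mean <- mean(unlist(cost))

# Between-group SS: spread of the group means around the grand mean.
ss_between <- sum(sapply(cost, function(x) length(x) * (mean(x) - grand_mean)^2))
# Within-group SS: spread of observations around their own group mean.
ss_within <- sum(sapply(cost, function(x) sum((x - mean(x))^2)))

df_between <- k - 1
df_within <- n_total - k

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(c(ss_between = ss_between, ss_within = ss_within))
print(c(F = f_value, p = p_value))
```

These values reproduce the table: F is about 8.18 on 2 and 15 degrees of freedom with a p-value of about 0.004. Since that is below 0.05, the null hypothesis of equal means is rejected, and at least one area's mean per-pupil cost differs from the others.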
Source Code
---
title: "Homework 2"
author: "Xiaoyan"
description: "Template of course blog qmd file"
date: "03/24/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - hw2
  - desriptive statistics
  - probability
---

```{r}
library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)
```

# Question 1

The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.

Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?

```{r}
# sample means
Mean1 <- 19
Mean2 <- 18
# sample standard deviations
sd1 <- 10
sd2 <- 9
# sample sizes
n1 <- 539
n2 <- 847
# standard error: sd / sqrt(n)
se1 <- sd1 / sqrt(n1)
se2 <- sd2 / sqrt(n2)
se1
se2
# tail area: (1 - confidence_level) / 2
TA <- (1 - 0.9) / 2
# t score: qt(p = 1 - tail_area, df = sample_size - 1)
tscore1 <- qt(p = 1 - TA, df = n1 - 1)
tscore2 <- qt(p = 1 - TA, df = n2 - 1)
tscore1
tscore2
# CI: c(sample_mean - t_score * SE, sample_mean + t_score * SE)
CI1 <- c(Mean1 - tscore1 * se1, Mean1 + tscore1 * se1)
CI2 <- c(Mean2 - tscore2 * se2, Mean2 + tscore2 * se2)
CI1
CI2
# interval widths: upper bound minus lower bound
DiffCI1 <- CI1[2] - CI1[1]
DiffCI2 <- CI2[2] - CI2[1]
DiffCI1
DiffCI2
```

# Question 2

A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.

```{r, echo=T}
# point estimate of p
p1 <- 567 / 1031
# sample size
n <- 1031
# alpha = 0.05
se <- sqrt((p1 * (1 - p1)) / n)
# z score = 1.96
CI <- c(p1 - 1.96 * se, p1 + 1.96 * se)
CI
```

# Question 3

Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per semester for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range (in other words, you can assume they know the population standard deviation). Assuming the significance level to be 5%, what should be the size of the sample?

```{r, echo=T}
# margin of error: z * sd / sqrt(n) = 5
# population sd = (200 - 30) / 4 = 42.5
# alpha = 0.05
# n = (z * sd / 5)^2
n <- (1.96 * 42.5 / 5)^2
n
qnorm(0.975)
```

# Question 4

According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90.

A. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.

```{r}
# assumptions: random sample, approximately normal population
# hypotheses
# h0: u = 500
# ha: u ≠ 500
# t test
u4 <- 500
y4 <- 410
sd4 <- 90
n4 <- 9
t4 <- (y4 - u4) / (sd4 / sqrt(n4))
t4
# two-sided P-value = P(|T| > |t4|) = P(T < -|t4|) + P(T > |t4|)
p4 <- 2 * pt(-abs(t4), df = n4 - 1)
p4
```

Reject the null hypothesis.

B. Report the P-value for Ha: μ < 500. Interpret.

```{r}
# hypotheses
# h0: u >= 500
# ha: u < 500
p4.2 <- pt(t4, df = n4 - 1)
p4.2
```

The P-value is below 0.05, so reject H0 in favor of Ha: μ < 500.

C. Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.)

```{r}
# hypotheses
# h0: u <= 500
# ha: u > 500
p4.3 <- 1 - pt(t4, df = n4 - 1)
p4.3
```

# Question 5

Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.

A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.

```{r}
# H0: μ = 500, Ha: μ ≠ 500
u5 <- 500
n5 <- 1000
y51 <- 519.5
se51 <- 10
y52 <- 519.7
se52 <- 10
t51 <- (y51 - u5) / se51
t51
t52 <- (y52 - u5) / se52
t52
# both t statistics are positive, so double the upper-tail area
p51 <- 2 * pt(t51, df = n5 - 1, lower.tail = F)
p52 <- 2 * pt(t52, df = n5 - 1, lower.tail = F)
p51
p52
```

`lower.tail = F` is used here because both t statistics are positive, so the two-sided P-value is twice the upper-tail area P(T > t).

B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”

Jones's result is not statistically significant, but Smith's is.

C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value.

The two results are nearly identical, yet the 0.05 cutoff puts them on opposite sides of “significant.” It is important to report the actual P-value alongside the significance level of 0.05 so that the significance can be evaluated.

# Question 6

A school nurse wants to determine whether age is a factor in whether children choose a healthy snack after school. She conducts a survey of 300 middle school students, with the results below. Test at α = 0.05 the claim that the proportion who choose a healthy snack differs by grade level. What is the null hypothesis? Which test should we use? What is the conclusion?

| Grade level     | 6th grade | 7th grade | 8th grade |
|-----------------|-----------|-----------|-----------|
| Healthy snack   | 31        | 43        | 51        |
| Unhealthy snack | 69        | 57        | 49        |

Null hypothesis: snack choice is independent of grade level. Alternative hypothesis: there is a relationship between snack choice and grade level.

```{r}
grade_level <- c(rep("6th grade", 100), rep("7th grade", 100), rep("8th grade", 100))
snack <- c(rep("healthy snack", 31), rep("unhealthy snack", 69),
           rep("healthy snack", 43), rep("unhealthy snack", 57),
           rep("healthy snack", 51), rep("unhealthy snack", 49))
snack_data <- data.frame(grade_level, snack)
snack_data
chisq.test(snack_data$snack, snack_data$grade_level, correct = FALSE)
```

A p-value smaller than 0.05 means we reject the null hypothesis: there is a relationship between grade and snack choice.

# Question 7

Per-pupil costs (in thousands of dollars) for cyber charter school tuition for school districts in three areas are shown. Test the claim that there is a difference in means for the three areas, using an appropriate test. What is the null hypothesis? Which test should we use? What is the conclusion?

| Area   |     |     |     |     |     |     |
|--------|-----|-----|-----|-----|-----|-----|
| Area 1 | 6.2 | 9.3 | 6.8 | 6.1 | 6.7 | 7.5 |
| Area 2 | 7.5 | 8.2 | 8.5 | 8.2 | 7.0 | 9.3 |
| Area 3 | 5.8 | 6.4 | 5.6 | 7.1 | 3.0 | 3.5 |

```{r}
Area <- c(rep("Area1", 6), rep("Area2", 6), rep("Area3", 6))
cost <- c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5,
          7.5, 8.2, 8.5, 8.2, 7.0, 9.3,
          5.8, 6.4, 5.6, 7.1, 3.0, 3.5)
Area_cost <- data.frame(Area, cost)
Area_cost
summary(aov(cost ~ Area, data = Area_cost))
```