hw2
descriptive statistics
probability
Template of course blog qmd file
Author

Xiaoyan

Published

March 24, 2023

library(tidyr)
library(dplyr)
library(readxl)
library(ggplot2)

Question 1

The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.

Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?

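# sample means (in days)
Mean1<-19
Mean2<-18
# sample standard deviations (in days)
sd1<-10
sd2<-9
# sample sizes
n1<-539
n2<-847
# standard error = sd/sqrt(n)
se1<-sd1/sqrt(n1)
se2<-sd2/sqrt(n2)
se1
[1] 0.4307305
se2
[1] 0.3092437
# critical t-value for 90% confidence
# area in each tail = (1 - confidence level)/2
TA<-(1-0.9)/2
# t score = qt(p = 1 - tail area, df = n - 1)
tscore1<-qt(p=1-TA, df = n1-1)
tscore2<-qt(p=1-TA, df = n2-1)
tscore1
[1] 1.647691
tscore2
[1] 1.646657
# CI = (mean - t*SE, mean + t*SE)
CI1<-c(Mean1-tscore1*se1, Mean1+tscore1*se1)
CI2<-c(Mean2-tscore2*se2, Mean2+tscore2*se2)
CI1
[1] 18.29029 19.70971
CI2
[1] 17.49078 18.50922
# interval width = upper bound - lower bound
DiffCI1<-19.70971-18.29029
DiffCI2<-18.50922-17.49078
DiffCI1
[1] 1.41942
DiffCI2
[1] 1.01844

The 90% confidence interval for the mean wait time is (18.29, 19.71) days for the first procedure (n = 539) and (17.49, 18.51) days for the second (n = 847). The second interval is narrower (width about 1.02 days versus 1.42 days) because that group has both a larger sample and a smaller standard deviation, so its standard error is smaller. Assuming, as in the original table, that the first procedure is bypass surgery and the second is angiography, the interval is narrower for angiography.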
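The same calculation can be wrapped in a small helper so both procedures are handled identically. This is only a sketch: the function name `t_ci` and its arguments are hypothetical, not from any package. It computes mean ± t·sd/√n with `qt()` and also returns the interval width.

```r
# Hypothetical helper: t-based confidence interval from summary statistics
t_ci <- function(mean, sd, n, conf = 0.90) {
  se <- sd / sqrt(n)                             # standard error of the mean
  t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)   # two-sided critical value
  lower <- mean - t_crit * se
  upper <- mean + t_crit * se
  c(lower = lower, upper = upper, width = upper - lower)
}

ci1 <- t_ci(mean = 19, sd = 10, n = 539)   # first procedure
ci2 <- t_ci(mean = 18, sd = 9,  n = 847)   # second procedure
ci1
ci2
```

Since the width equals 2·t·sd/√n, both the larger n and the smaller sd of the second procedure shrink its interval.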

Question 2

A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.

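# sample size
n<-1031
# point estimate p = number who agree / n
p1<-567/n
# standard error of the sample proportion
se<-sqrt((p1*(1-p1))/n)
# 95% confidence (alpha = 0.05): critical z value is about 1.96
CI<-c(p1-1.96*se,p1+1.96*se)
CI
[1] 0.5195833 0.5803197

The point estimate is p = 567/1031 ≈ 0.550, so about 55% of the sampled adults believe a college education is essential for success. We are 95% confident that the proportion of all adult Americans who hold this belief is between about 0.520 and 0.580 (52.0% to 58.0%). Because the whole interval lies above 0.5, the data suggest that a majority of adult Americans hold this view.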
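A sketch of the same large-sample (Wald) interval as a reusable function, assuming the normal approximation holds (here there are 567 "yes" and 464 "no" answers, both far above 10). It uses `qnorm()` for the critical value instead of the rounded 1.96; the name `wald_ci` is hypothetical.

```r
# Hypothetical helper: large-sample (Wald) confidence interval for a proportion
wald_ci <- function(successes, n, conf = 0.95) {
  p_hat <- successes / n                  # point estimate
  se <- sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
  z <- qnorm(1 - (1 - conf) / 2)          # about 1.959964 for 95%
  c(estimate = p_hat, lower = p_hat - z * se, upper = p_hat + z * se)
}

res <- wald_ci(567, 1031)
res
```

Base R's `prop.test()` reports a different interval (Wilson score, with a continuity correction by default), so its bounds will not match these exactly.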

Question 3

Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per semester for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range (in other words, you can assume they know the population standard deviation). Assuming the significance level to be 5%, what should the sample size be?

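# margin of error: z*sd/sqrt(n) = 5, so n = (z*sd/5)^2
# population sd = (200-30)/4 = 42.5
# alpha = 0.05, so z = 1.96
# 5 is the desired margin of error (half of the $10 interval length), not the sd
n<-(1.96*42.5/5)^2

n
[1] 277.5556
# exact critical value behind the rounded 1.96
qnorm(0.975)
[1] 1.959964

The sample size must be a whole number, and rounding down would make the margin of error slightly larger than $5, so we round up: the financial aid office should sample at least 278 students.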
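A sketch of the sample-size formula as a function that does the rounding up automatically, using the exact `qnorm()` critical value. The name `sample_size_mean` is hypothetical; the population sd of 42.5 is the assumption stated in the question.

```r
# Hypothetical helper: smallest n so that z * sigma / sqrt(n) <= margin
sample_size_mean <- function(sigma, margin, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)     # two-sided critical value
  ceiling((z * sigma / margin)^2)    # round up so the margin is not exceeded
}

sigma <- (200 - 30) / 4              # assumed population sd: a quarter of the range
n_needed <- sample_size_mean(sigma = sigma, margin = 5)
n_needed
```

Using 1.96 or `qnorm(0.975)` gives the same answer here, because both raw values (277.56 and about 277.55) round up to 278.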

Question 4

According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = $90.

A. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.

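# assumptions: the nine employees are a random sample, and weekly income of
# female employees is approximately normally distributed (n is small)
# hypotheses
#   H0: mu = 500
#   Ha: mu != 500
# one-sample t statistic: t = (ybar - mu0)/(s/sqrt(n))
u4<-500
y4<-410
sd4<-90
n4<-9

t4<-(y4-u4)/(sd4/sqrt(n4))
t4
[1] -3
# two-sided P-value = P(|T| > |t4|) = 2*P(T < -|t4|), with df = n - 1 = 8
p4<-2*pt(-abs(t4), df = n4-1)
p4
[1] 0.01707168

With t = -3 on 8 degrees of freedom and P ≈ 0.017 < 0.05, we reject the null hypothesis. There is evidence that the mean weekly income of female employees differs from $500 (the sample mean, $410, is lower).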

B. Report the P-value for Ha: μ < 500. Interpret.

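# hypotheses
#   H0: mu >= 500
#   Ha: mu < 500
# left-tail P-value = P(T < t4)
p4.2<-pt(t4, df = n4-1)
p4.2
[1] 0.008535841

P ≈ 0.0085 < 0.05, so we reject H0 in favor of Ha: μ < 500. The data give strong evidence that the mean income of female employees is below $500 per week.

C. Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.)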

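# hypotheses
#   H0: mu <= 500
#   Ha: mu > 500
# right-tail P-value = P(T > t4) = 1 - P(T < t4)
p4.3<-1-pt(t4, df = n4-1)
p4.3
[1] 0.9914642

P ≈ 0.991, which is 1 minus the left-tail P-value from part B (0.0085), as the hint says. There is no evidence that the mean income of female employees is above $500; the sample mean of $410 lies below it, so we do not reject H0.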
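Because only summary statistics are given, `t.test()` cannot be used directly (it needs the raw observations). A sketch of a one-sample t test from summary statistics that returns all three P-values at once; the function `t_test_summary` is hypothetical.

```r
# Hypothetical helper: one-sample t test from summary statistics
t_test_summary <- function(ybar, s, n, mu0) {
  t_stat <- (ybar - mu0) / (s / sqrt(n))
  df <- n - 1
  list(
    t = t_stat,
    df = df,
    p_two_sided = 2 * pt(-abs(t_stat), df),          # Ha: mu != mu0
    p_less = pt(t_stat, df),                         # Ha: mu < mu0
    p_greater = pt(t_stat, df, lower.tail = FALSE)   # Ha: mu > mu0
  )
}

res4 <- t_test_summary(ybar = 410, s = 90, n = 9, mu0 = 500)
str(res4)
```

The output makes the relationships between parts A to C visible: the two one-sided P-values sum to 1, and the two-sided P-value is twice the smaller one.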

Question 5

Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.

A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.

# H0: mu = 500, Ha: mu != 500
u5<-500
n5<-1000
# Jones
y51<-519.5
se51<-10
# Smith
y52<-519.7
se52<-10

t51<-(y51-u5)/se51
t51
[1] 1.95
t52<-(y52-u5)/se52
t52
[1] 1.97
# two-sided P-value: twice the upper-tail area beyond t, with df = n - 1
p51<-2*pt(t51, df=n5-1, lower.tail = FALSE)
p52<-2*pt(t52, df=n5-1, lower.tail = FALSE)
p51
[1] 0.05145555
p52
[1] 0.04911426

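Why lower.tail = FALSE here? By default pt() returns the lower-tail probability P(T ≤ t). Both t statistics are positive, so the area beyond them lies in the upper tail; lower.tail = FALSE returns P(T > t), and doubling it gives the two-sided P-value.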
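The tail choice can be checked numerically: for a positive t, `pt(t, df, lower.tail = FALSE)` is the same upper-tail area as `1 - pt(t, df)`, while doubling the default lower-tail value gives a number greater than 1, which cannot be a P-value.

```r
df5 <- 1000 - 1
t_jones <- 1.95
t_smith <- 1.97

upper_jones <- pt(t_jones, df5, lower.tail = FALSE)  # P(T > 1.95)
p_jones <- 2 * upper_jones                           # two-sided P-value
p_smith <- 2 * pt(t_smith, df5, lower.tail = FALSE)

# wrong: doubles the lower-tail area P(T <= t), giving a value above 1
wrong_jones <- 2 * pt(t_jones, df5)
c(jones = p_jones, smith = p_smith, wrong = wrong_jones)
```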

B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”

Jones's result (P = 0.051) is not statistically significant at α = 0.05, while Smith's result (P = 0.049) is.

C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value.

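Jones and Smith obtained nearly identical results: the estimates differ by only 0.2, the standard errors are equal, and the P-values (0.051 and 0.049) are almost the same. Yet reporting only “P > 0.05, do not reject H0” for Jones and “P ≤ 0.05, reject H0” for Smith makes the two studies look as if they reached opposite conclusions. Since the 0.05 cutoff is an arbitrary convention, the actual P-value should be reported; it shows that both studies provide essentially the same evidence against H0.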

Question 6

A school nurse wants to determine whether age is a factor in whether children choose a healthy snack after school. She conducts a survey of 300 middle school students, with the results below. Test at α = 0.05 the claim that the proportion who choose a healthy snack differs by grade level. What is the null hypothesis? Which test should we use? What is the conclusion?

Grade level        6th grade   7th grade   8th grade
Healthy snack          31          43          51
Unhealthy snack        69          57          49

Null hypothesis H0: snack choice is independent of grade level (the proportion choosing a healthy snack is the same in all three grades). Alternative Ha: snack choice and grade level are related (the proportion differs by grade). Because both variables are categorical, we use a chi-square test of independence.

# rebuild the 300 individual responses from the counts in the table
grade_level <- c(rep("6th grade", 100), rep("7th grade", 100), rep("8th grade", 100))
snack <- c(rep("healthy snack", 31), rep("unhealthy snack", 69), rep("healthy snack", 43),
           rep("unhealthy snack", 57), rep("healthy snack", 51), rep("unhealthy snack", 49))
snack_data <- data.frame(grade_level, snack)
# Pearson chi-square test of independence
# (Yates' continuity correction applies only to 2 x 2 tables, so correct = FALSE changes nothing here)
chisq.test(snack_data$snack,snack_data$grade_level,correct = FALSE)

    Pearson's Chi-squared test

data:  snack_data$snack and snack_data$grade_level
X-squared = 8.3383, df = 2, p-value = 0.01547

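The P-value (0.0155) is smaller than α = 0.05, so we reject H0. Snack choice is related to grade level: the proportion choosing a healthy snack rises from 31% in 6th grade to 43% in 7th grade and 51% in 8th grade.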
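Building a 300-row data frame is not strictly necessary: `chisq.test()` also accepts the contingency table directly as a matrix. This sketch builds the 2 × 3 table from the counts, then inspects the expected counts, since the chi-square approximation is usually considered adequate when every expected count is at least 5.

```r
# 2 x 3 table of counts: rows = snack choice, columns = grade
snacks <- matrix(c(31, 43, 51,
                   69, 57, 49),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(snack = c("healthy", "unhealthy"),
                                 grade = c("6th", "7th", "8th")))
res6 <- chisq.test(snacks)       # no continuity correction for tables larger than 2 x 2
res6
res6$expected                    # about 41.67 healthy and 58.33 unhealthy per grade
prop.table(snacks, margin = 2)   # proportion choosing each snack within each grade
```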

Question 7

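Per-pupil costs (in thousands of dollars) for cyber charter school tuition for school districts in three areas are shown. Test the claim that there is a difference in means for the three areas, using an appropriate test. What is the null hypothesis? Which test should we use? What is the conclusion?

Area 1   6.2   9.3   6.8   6.1   6.7   7.5
Area 2   7.5   8.2   8.5   8.2   7.0   9.3
Area 3   5.8   6.4   5.6   7.1   3.0   3.5

Null hypothesis H0: μ1 = μ2 = μ3 (the mean per-pupil cost is the same in all three areas). Alternative Ha: at least one area mean differs. Because we compare the means of three independent groups, we use a one-way ANOVA.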

Area <- c(rep("Area1", 6), rep("Area2", 6), rep("Area3", 6))
cost <- c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5, 7.5, 8.2, 8.5, 8.2, 7.0, 9.3,
          5.8, 6.4, 5.6, 7.1, 3.0, 3.5)
Area_cost <- data.frame(Area,cost)
Area_cost
    Area cost
1  Area1  6.2
2  Area1  9.3
3  Area1  6.8
4  Area1  6.1
5  Area1  6.7
6  Area1  7.5
7  Area2  7.5
8  Area2  8.2
9  Area2  8.5
10 Area2  8.2
11 Area2  7.0
12 Area2  9.3
13 Area3  5.8
14 Area3  6.4
15 Area3  5.6
16 Area3  7.1
17 Area3  3.0
18 Area3  3.5
summary(aov(cost~Area, data=Area_cost))
            Df Sum Sq Mean Sq F value  Pr(>F)   
Area         2  25.66  12.832   8.176 0.00397 **
Residuals   15  23.54   1.569                   
---
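Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F statistic is 8.18 with 2 and 15 degrees of freedom, and P = 0.004 < 0.05, so we reject H0: the mean per-pupil cost is not the same in all three areas. The sample means are 7.10 (Area 1), 8.12 (Area 2), and 5.23 (Area 3) thousand dollars, so Area 3 looks lowest, but the ANOVA alone does not say which pairs of areas differ.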
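One-way ANOVA assumes roughly normal data with equal variances in each area. As a sketch of possible follow-ups using only base R (not part of the original answer): `bartlett.test()` checks equal variances, `oneway.test()` without `var.equal = TRUE` gives Welch's ANOVA, which drops that assumption, and `TukeyHSD()` shows which pairs of areas differ.

```r
# rebuild the data (same values as above)
Area <- rep(c("Area1", "Area2", "Area3"), each = 6)
cost <- c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5,
          7.5, 8.2, 8.5, 8.2, 7.0, 9.3,
          5.8, 6.4, 5.6, 7.1, 3.0, 3.5)
Area_cost <- data.frame(Area = factor(Area), cost)

fit <- aov(cost ~ Area, data = Area_cost)
tapply(Area_cost$cost, Area_cost$Area, mean)    # group means

# classic F test again, via oneway.test with equal variances assumed
classic <- oneway.test(cost ~ Area, data = Area_cost, var.equal = TRUE)
# Welch's ANOVA, which does not assume equal variances
welch <- oneway.test(cost ~ Area, data = Area_cost)
welch
bartlett.test(cost ~ Area, data = Area_cost)    # test of equal variances

TukeyHSD(fit)                                   # pairwise differences with adjusted P-values
```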