The time between the date a patient was recommended for heart surgery and the surgery date for cardiac patients in Ontario was collected by the Cardiac Care Network (“Wait Times Data Guide,” Ministry of Health and Long-Term Care, Ontario, Canada, 2006). The sample mean and sample standard deviation for wait times (in days) of patients for two cardiac procedures are given in the accompanying table. Assume that the sample is representative of the Ontario population.
Construct the 90% confidence interval to estimate the actual mean wait time for each of the two procedures. Is the confidence interval narrower for angiography or bypass surgery?
The formula for the confidence interval of a sample is
\[
CI =\bar{X} \pm t \cdot \frac{s}{\sqrt n}
\]
Code
#calculating degree of freedom for each sample: data <- data%>%mutate(confidence_level =0.9,tail_area = (1-confidence_level)/2,t_score =qt(p =1-tail_area, df =`Sample.Size`-1),CI.low =`Mean Wait Time`- t_score *`Standard.Deviation`/sqrt(`Sample.Size`), CI.high=`Mean Wait Time`+ t_score *`Standard.Deviation`/sqrt(`Sample.Size`),CI.width =abs(CI.high - CI.low) )data<- data %>%select (Surgical.Procedure, CI.low,CI.high , CI.width, t_score, everything() )knitr::kable(data )
Surgical.Procedure
CI.low
CI.high
CI.width
t_score
Sample.Size
Mean.Wait.Time
Standard.Deviation
confidence_level
tail_area
Bypass
18.29029
19.70971
1.419421
1.647691
539
19
10
0.9
0.05
Angiography
17.49078
18.50922
1.018436
1.646657
847
18
9
0.9
0.05
From the table above I can see that the width of confidence interval for Angiography is smaller than for Bypass.
Alternatvie way to calculate the interval is by simulating the data based on given parameters and running one-sided t.test() :
Code
Angiography.data <-rnorm(n=847, mean =18 , sd =9)# Calculate the sample meanmean.q1 <-mean(Angiography.data )# Adjust the sample to have a mean of exactly 410Angiography.data <- Angiography.data +18- mean.q1t.test(Angiography.data, alternative ="two.sided", conf.level =0.9)
One Sample t-test
data: Angiography.data
t = 59.128, df = 846, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
17.49872 18.50128
sample estimates:
mean of x
18
Code
Bypass.data <-rnorm(n=539, mean =19 , sd =10)# Calculate the sample meanmean.bypass.q1 <-mean(Bypass.data )# Adjust the sample to have a mean of exactly 410Bypass.data <- Bypass.data +19- mean.bypass.q1t.test(Bypass.data , alternative ="two.sided", conf.level =0.9)
One Sample t-test
data: Bypass.data
t = 45.732, df = 538, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
90 percent confidence interval:
18.31544 19.68456
sample estimates:
mean of x
19
Question 2
A survey of 1031 adult Americans was carried out by the National Center for Public Policy. Assume that the sample is representative of adult Americans. Among those surveyed, 567 believed that college education is essential for success. Find the point estimate, p, of the proportion of all adult Americans who believe that a college education is essential for success. Construct and interpret a 95% confidence interval for p.
I will assume, that the survey responses are within binomial distribution ( i.e. respondents had an option to agree or disagree that college education is essential for success).
Point estimate for this sample is 567/1031
Code
point.estimate <-567/1031
Confidence interval = point estimate ± margin of error. To calculate a confidence interval for the p, I will usr the following formula:
\[
CI =\bar{p} \pm Zscore \cdot SE
\]
Code
Sample.proportion <-567/1031Sample.Size <-1031data2<-data.frame(Sample.proportion,Sample.Size )data2<- data2%>%mutate(SE =(sqrt(Sample.proportion*(1-Sample.proportion)/Sample.Size)), # for binomial distributionz_score =1.96, # for 95% cofidence interval CI.low = Sample.proportion - z_score * SE,CI.high= Sample.proportion + z_score * SE )knitr::kable(data2 )
Sample.proportion
Sample.Size
SE
z_score
CI.low
CI.high
0.5499515
1031
0.015494
1.96
0.5195833
0.5803197
This confidence interval can be interpreted as “The people who agreethat college education is essential for success is between 52% and 58%from general American population”.
I could also calculate confidence interval by generating a sample data set and using prob.test() function:
Code
#generating data table: # Generating sample for grade 6: successes <-0 p <-567/1031 n<-1031# repeat until the desired number of successes is obtainedwhile (successes !=567) { responses <-rbinom(n=1031, 1, p=p)# count the number of successes successes <-sum(responses) }respondent.n <-seq(1:1031)dataset.q2 <-data.frame( respondent.n , responses)prop.test(x =sum(dataset.q2$responses), n=n, p =p )
1-sample proportions test without continuity correction
data: sum(dataset.q2$responses) out of n, null probability p
X-squared = 6.9637e-30, df = 1, p-value = 1
alternative hypothesis: true p is not equal to 0.5499515
95 percent confidence interval:
0.5194543 0.5800778
sample estimates:
p
0.5499515
This way of calculating CI provided me with the same interval as the first one.
Question 3
Suppose that the financial aid office of UMass Amherst seeks to estimate the mean cost of textbooks per semester for students. The estimate will be useful if it is within $5 of the true population mean (i.e. they want the confidence interval to have a length of $10 or less). The financial aid office is pretty sure that the amount spent on books varies widely, with most values between $30 and $200. They think that the population standard deviation is about a quarter of this range (in other words, you can assume they know the population standard deviation). Assuming the significance level to be 5%, what should be the size of the sample?
Since I am given standard deviation of population, I can use the following formula for CI for confidence level 95%:
\[
CI =\bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt n}
\] The range of CI will be
\[
CI range = 10=(\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt n}) -(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt n})
\]
To calculate confidence interval width of $10, financial aid office should collect a sample size of 278 students.
Question 4
According to a union agreement, the mean income for all senior-level workers in a large service company equals $500 per week. A representative of a women’s group decides to analyze whether the mean income μ for female employees matches this norm. For a random sample of nine female employees, ȳ = $410 and s = 90 A. Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result. B. Report the P-value for Ha: μ < 500. Interpret. C. Report and interpret the P-value for Ha: μ > 500. (Hint: The P-values for the two possible one-sided tests must sum to 1.
A. Two-sided t.test:
Test whether the mean income of female employees differs from $500 per week. Include assumptions, hypotheses, test statistic, and P-value. Interpret the result.
Assumption: The sample is representative of all senior-level female workers in that company.
Ho: mean of female income is close to the company policy income of $500. H1: mean of female income is not the same as the company policy income of $500.
To test this hypothesis, I will used one-sample t.test, which will compare mean and cinficance interval of a sample to a policy level of $500.
T.test() function in R uses t-statistics to evaluate small sample size (9).
Code
# Generate a sample with a mean close to 410female.sample <-rnorm(n=9, mean=410, sd=90)# Calculate the sample meansample.mean <-mean(female.sample)# Adjust the sample to have a mean of exactly 410female.sample <- female.sample +410- sample.meana.ttest.sample<-t.test(female.sample, alternative ="two.sided", mu=500)a.ttest.sample
One Sample t-test
data: female.sample
t = -2.2966, df = 8, p-value = 0.05074
alternative hypothesis: true mean is not equal to 500
95 percent confidence interval:
319.6305 500.3695
sample estimates:
mean of x
410
The output shows 95% confidence interval (325 - 494) with the p-value under 0.05. This interval does not include policy value of $500, meaning that the true female income mean at the company is very unlikely to be 500. Therefore I reject Ho, and Accept H1.
B. One-sided, “less”
Report the P-value for Ha: μ < 500. Interpret.
Ho: mean of female income is no less than the company policy income of $500. H1: mean of female income is lower than the company policy income of $500.
Code
b.ttest.sample<-t.test(female.sample, alternative ="less", mu=500)b.ttest.sample
One Sample t-test
data: female.sample
t = -2.2966, df = 8, p-value = 0.02537
alternative hypothesis: true mean is less than 500
95 percent confidence interval:
-Inf 482.8735
sample estimates:
mean of x
410
The confidence interval for this hypothesis is even further from 500, that for two-sided test: (-inf, 455) which does not include 500. I accept alternative hypothesis that true female income is lower than 500. P-value for one-sided test is twice lower than for two-sided , because we are only estimating one direction from the mean.
C. One-sided, “greater”.
Report and interpret the P-value for Ha: μ > 500.
Ho: mean of female income is not greater than the company policy income of $500. H1: mean of female income is greater than the company policy income of $500.
Code
c.ttest.sample<-t.test(female.sample, alternative ="greater", mu=500)c.ttest.sample
One Sample t-test
data: female.sample
t = -2.2966, df = 8, p-value = 0.9746
alternative hypothesis: true mean is greater than 500
95 percent confidence interval:
337.1265 Inf
sample estimates:
mean of x
410
The confidence interval for this hypothesis includes 500, which suppports H0 that the female income is not greater than company policy. Additionally, p-value is greater than 0.05 which also supports Ho. income is less or greater than 500) equals to 1.
Question 5
Jones and Smith separately conduct studies to test H0: μ = 500 against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se = 10.0. Smith gets ȳ = 519.7, with se = 10.0.
A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.
B. Using α = 0.05, for each study indicate whether the result is “statistically significant.”
C. Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value.
*A. Calculating t-statistics for each sample:
Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97 and P-value = 0.049 for Smith.
To do that, we will create a summary table containing both samples/ parameters and calculate t-statistic, critical value and p-value
Code
# Generate a sample with a mean 519.5 for John# John.data1<- rnorm(n=1000, mean =519.5)# John.mean <- mean(John.data1)# John.data1<- John.data1 + 519.5 - John.mean# John.mean <- mean(John.data1)# John.meann<-c(1000, 1000) study.name <-c("John", "Smith")Means <-c(519.5, 519.7) SE <-c(10.0, 10.0) summary <-data.frame(study.name, n, Means, SE)summary <- summary%>%mutate(tail_area = (1-0.95)/2,t_score =round( qt(p =1-tail_area, df = n -1), 2),# Critical t-value CI.low = Means - t_score * SE, CI.high = Means + t_score * SE,test.statistics = ((Means -500)/ SE),p.value = (1-pt(test.statistics, df = n -1)) *2 ) knitr::kable(summary)
study.name
n
Means
SE
tail_area
t_score
CI.low
CI.high
test.statistics
p.value
John
1000
519.5
10
0.025
1.96
499.9
539.1
1.95
0.0514555
Smith
1000
519.7
10
0.025
1.96
500.1
539.3
1.97
0.0491143
I can see from the table above, that the Confidence Interval (CI) for John’s data inludes 500 value, while Somth’s CI doesnt. Also I see that the two datasets result in slightly different test statistics and therefore p-value, which are very close to the tested value. Due to John’s p-value being above 0.05, I will accept HO. However, Smith p-value is below 0.05 and I would decline HO and accept H1.
B. Interpreting statistical significance.
Using α = 0.05, for each study indicate whether the result is “statistically significant.”
We can conclude that the difference between 500 and the mean of Smith’s sample is statistically significant to reject Ho, while John’s sample mean’s difference from 500 is not statistically significant (i.e. not far enough from 500).
C. Potential misleading of comparing p-value to 0.05:
Using this example, explain the misleading aspects of reporting the result of a test as “P ≤ 0.05” versus “P > 0.05,” or as “reject H0” versus “Do not reject H0,” without reporting the actual P-value.
Comparing this two almost identical samples allowed us to demonstrate how a tiny difference in sample mean can produce difference in study outcomes (rejection vs acceptance of Ho). It is important to remember, that the sample means vary due to randomness of sampling process, and therefore produce variety of confidence intervals, test statistics and p-values.Reporting actual p-value will demonstrate its closeness to 0.05 cutoff level, and warn the reader about potential error in conclusion.
Question 6
A school nurse wants to determine whether age is a factor in whether children choose a healthy snack after school. She conducts a survey of 300 middle school students, with the results below. Test at α = 0.05 the claim that the proportion who choose a healthy snack differs by grade level. What is the null hypothesis? Which test should we use? What is the conclusion?
Pearson's Chi-squared test
data: table2$grade and table2$healthy.choice
X-squared = 8.3383, df = 2, p-value = 0.01547
The p-value of 0.01 makes me to reject H0 hypothesis that all grades have equal probability, and accpet H1, that the kids of different grades differ in probability of making haealthy snack choice.
Question 7
Per-pupil costs (in thousands of dollars) for cyber charter school tuition for school districts in three areas are shown. Test the claim that there is a difference in means for the three areas, using an appropriate test. What is the null hypothesis? Which test should we use? What is the conclusion?
Df Sum Sq Mean Sq F value Pr(>F)
Area 2 25.66 12.832 8.176 0.00397 **
Residuals 15 23.54 1.569
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results of ANOVA test demonstrate p-value of 0.00397. I make a conclusion that at least one of the three areas’ means is significantly different from others (i.e reject Ho and accept H1).
Source Code
---title: "Homework 2"author: "Diana Rinker"description: "HW2"date: "3/25/2023"format: html: toc: true code-fold: true code-copy: true code-tools: trueeditor: markdown: wrap: 72---# DACSS 603, spring 2023# Homework 2, Diana Rinker.Loading necessary libraries:```{r, echo=T}library(readxl)library(dplyr)library(ggplot2)library(tidyverse)```# Question 1***The time between the date a patient was recommended for heart surgeryand the surgery date for cardiac patients in Ontario was collected bythe Cardiac Care Network ("Wait Times Data Guide," Ministry of Healthand Long-Term Care, Ontario, Canada, 2006). The sample mean and samplestandard deviation for wait times (in days) of patients for two cardiacprocedures are given in the accompanying table. Assume that the sampleis representative of the Ontario population.***```{r, echo=T}`Surgical Procedure`<-c("Bypass", "Angiography" )`Sample Size`<-c(539,847 )`Mean Wait Time`<-c(19, 18)`Standard Deviation`<-c(10, 9)data<-data.frame(`Surgical Procedure`,`Sample Size`, `Mean Wait Time`, `Standard Deviation` )knitr::kable(data )```*Construct the 90% confidence interval to estimate the actual mean waittime for each of the two procedures. Is the confidence interval narrowerfor angiography or bypass surgery?*The formula for the confidence interval of a sample is$$CI =\bar{X} \pm t \cdot \frac{s}{\sqrt n}$$```{r, echo=T}#calculating degree of freedom for each sample: data <- data%>%mutate(confidence_level =0.9,tail_area = (1-confidence_level)/2,t_score =qt(p =1-tail_area, df =`Sample.Size`-1),CI.low =`Mean Wait Time`- t_score *`Standard.Deviation`/sqrt(`Sample.Size`), CI.high=`Mean Wait Time`+ t_score *`Standard.Deviation`/sqrt(`Sample.Size`),CI.width =abs(CI.high - CI.low) )data<- data %>%select (Surgical.Procedure, CI.low,CI.high , CI.width, t_score, everything() )knitr::kable(data ) ```From the table above I can see that the width of confidence interval forAngiography is smaller than for Bypass.Alternatvie way to calculate the interval is by simulating the databased on given parameters and running one-sided t.test() :```{r, echo=T}Angiography.data <-rnorm(n=847, mean =18 , sd =9)# Calculate the sample meanmean.q1 <-mean(Angiography.data )# Adjust the sample to have a mean of exactly 410Angiography.data <- Angiography.data +18- mean.q1t.test(Angiography.data, alternative ="two.sided", conf.level =0.9) ``````{r, echo=T}Bypass.data <-rnorm(n=539, mean =19 , sd =10)# Calculate the sample meanmean.bypass.q1 <-mean(Bypass.data )# Adjust the sample to have a mean of exactly 410Bypass.data <- Bypass.data +19- mean.bypass.q1t.test(Bypass.data , alternative ="two.sided", conf.level =0.9) ```# Question 2***A survey of 1031 adult Americans was carried out by the NationalCenter for Public Policy. Assume that the sample is representative ofadult Americans. Among those surveyed, 567 believed that collegeeducation is essential for success. Find the point estimate, p, of theproportion of all adult Americans who believe that a college educationis essential for success. Construct and interpret a 95% confidenceinterval for p.***I will assume, that the survey responses are within binomialdistribution ( i.e. respondents had an option to agree or disagree thatcollege education is essential for success).Point estimate for this sample is 567/1031```{r, echo=T}point.estimate <-567/1031```Confidence interval = point estimate ± margin of error. To calculate aconfidence interval for the p, I will usr the following formula:$$ CI =\bar{p} \pm Zscore \cdot SE$$```{r, echo=T}Sample.proportion <-567/1031Sample.Size <-1031data2<-data.frame(Sample.proportion,Sample.Size )data2<- data2%>%mutate(SE =(sqrt(Sample.proportion*(1-Sample.proportion)/Sample.Size)), # for binomial distributionz_score =1.96, # for 95% cofidence interval CI.low = Sample.proportion - z_score * SE,CI.high= Sample.proportion + z_score * SE )knitr::kable(data2 ) ```This confidence interval can be interpreted as "The people who agreethat college education is essential for success is between 52% and 58%from general American population".I could also calculate confidence interval by generating a sample data set and using prob.test() function: ```{r, echo=T}#generating data table: # Generating sample for grade 6: successes <-0 p <-567/1031 n<-1031# repeat until the desired number of successes is obtainedwhile (successes !=567) { responses <-rbinom(n=1031, 1, p=p)# count the number of successes successes <-sum(responses) }respondent.n <-seq(1:1031)dataset.q2 <-data.frame( respondent.n , responses)prop.test(x =sum(dataset.q2$responses), n=n, p =p )```This way of calculating CI provided me with the same interval as the first one.# Question 3***Suppose that the financial aid office of UMass Amherst seeks toestimate the mean cost of textbooks per semester for students. Theestimate will be useful if it is within \$5 of the true population mean(i.e. they want the confidence interval to have a length of \$10 orless). The financial aid office is pretty sure that the amount spent onbooks varies widely, with most values between \$30 and \$200. They thinkthat the population standard deviation is about a quarter of this range(in other words, you can assume they know the population standarddeviation). Assuming the significance level to be 5%, what should be thesize of the sample?***Since I am given standard deviation of population, I can use thefollowing formula for CI for confidence level 95%:$$CI =\bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt n} $$ The range of CI will be$$CI range = 10=(\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt n}) -(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt n}) $$$$ 2\cdot (1.96 \cdot \frac{\sigma}{\sqrt n}) = 10$$$$ {\sqrt n} = \frac{1.96\sigma}{5}$$$$ n = (\frac{1.96 \cdot 42.5}{5})^2 $$```{r, echo=T}SD.population <- (200-30)/4n<- ((1.96*SD.population)/5)^2knitr::kable(n) ```To calculate confidence interval width of \$10, financial aid officeshould collect a sample size of 278 students.# Question 4***According to a union agreement, the mean income for all senior-levelworkers in a large service company equals \$500 per week. Arepresentative of a women's group decides to analyze whether the meanincome μ for female employees matches this norm. For a random sample ofnine female employees, ȳ = \$410 and s = 90 A. Test whether the meanincome of female employees differs from \$500 per week. Includeassumptions, hypotheses, test statistic, and P-value. Interpret theresult. B. Report the P-value for Ha: μ \< 500. Interpret. C. Report andinterpret the P-value for Ha: μ \> 500. (Hint: The P-values for the twopossible one-sided tests must sum to 1.***#### *A. Two-sided t.test:* ***Test whether the mean income of female employees differs from \$500per week. Include assumptions, hypotheses, test statistic, and P-value.Interpret the result.***Assumption: The sample is representative of all senior-level femaleworkers in that company.Ho: mean of female income is close to the company policy income of\$500. H1: mean of female income is not the same as the company policyincome of \$500.To test this hypothesis, I will used one-sample t.test, which willcompare mean and cinficance interval of a sample to a policy level of\$500.T.test() function in R uses t-statistics to evaluate small sample size(9).```{r, echo=T}# Generate a sample with a mean close to 410female.sample <-rnorm(n=9, mean=410, sd=90)# Calculate the sample meansample.mean <-mean(female.sample)# Adjust the sample to have a mean of exactly 410female.sample <- female.sample +410- sample.meana.ttest.sample<-t.test(female.sample, alternative ="two.sided", mu=500)a.ttest.sample```The output shows 95% confidence interval (325 - 494) with the p-valueunder 0.05. This interval does not include policy value of \$500,meaning that the true female income mean at the company is very unlikelyto be 500. Therefore I reject Ho, and Accept H1.#### *B. One-sided, "less"****Report the P-value for Ha: μ \< 500. Interpret.***Ho: mean of female income is no less than the company policy income of\$500. H1: mean of female income is lower than the company policy incomeof \$500.```{r, echo=T}b.ttest.sample<-t.test(female.sample, alternative ="less", mu=500)b.ttest.sample```The confidence interval for this hypothesis is even further from 500,that for two-sided test: (-inf, 455) which does not include 500. Iaccept alternative hypothesis that true female income is lower than 500.P-value for one-sided test is twice lower than for two-sided , becausewe are only estimating one direction from the mean.#### *C. One-sided, "greater".****Report and interpret the P-value for Ha: μ \> 500.***Ho: mean of female income is not greater than the company policy incomeof \$500. H1: mean of female income is greater than the company policyincome of \$500.```{r, echo=T}c.ttest.sample<-t.test(female.sample, alternative ="greater", mu=500)c.ttest.sample```The confidence interval for this hypothesis includes 500, whichsuppports H0 that the female income is not greater than company policy.Additionally, p-value is greater than 0.05 which also supports Ho.<!-- We can see that the sum if p-values for B and C scenarions ( that female -->income is less or greater than 500) equals to 1.------------------------------------------------------------------------# Question 5***Jones and Smith separately conduct studies to test H0: μ = 500against Ha: μ ≠ 500, each with n = 1000. Jones gets ȳ = 519.5, with se =10.0. Smith gets ȳ = 519.7, with se = 10.0.******A. Show that t = 1.95 and P-value = 0.051 for Jones. Show that t =1.97 and P-value = 0.049 for Smith.******B. Using α = 0.05, for each study indicate whether the result is"statistically significant."******C. Using this example, explain the misleading aspects of reportingthe result of a test as "P ≤ 0.05" versus "P \> 0.05," or as "reject H0"versus "Do not reject H0," without reporting the actual P-value.***#### *A. Calculating t-statistics for each sample:***Show that t = 1.95 and P-value = 0.051 for Jones. Show that t = 1.97and P-value = 0.049 for Smith.***To do that, we will create a summary table containing both samples/parameters and calculate t-statistic, critical value and p-value```{r, echo=T}# Generate a sample with a mean 519.5 for John# John.data1<- rnorm(n=1000, mean =519.5)# John.mean <- mean(John.data1)# John.data1<- John.data1 + 519.5 - John.mean# John.mean <- mean(John.data1)# John.meann<-c(1000, 1000) study.name <-c("John", "Smith")Means <-c(519.5, 519.7) SE <-c(10.0, 10.0) summary <-data.frame(study.name, n, Means, SE)summary <- summary%>%mutate(tail_area = (1-0.95)/2,t_score =round( qt(p =1-tail_area, df = n -1), 2),# Critical t-value CI.low = Means - t_score * SE, CI.high = Means + t_score * SE,test.statistics = ((Means -500)/ SE),p.value = (1-pt(test.statistics, df = n -1)) *2 ) knitr::kable(summary) ```I can see from the table above, that the Confidence Interval (CI) forJohn's data inludes 500 value, while Somth's CI doesnt. Also I see thatthe two datasets result in slightly different test statistics andtherefore p-value, which are very close to the tested value. Due toJohn's p-value being above 0.05, I will accept HO. However, Smithp-value is below 0.05 and I would decline HO and accept H1.#### *B. Interpreting statistical significance.****Using α = 0.05, for each study indicate whether the result is"statistically significant."***We can conclude that the difference between 500 and the mean of Smith'ssample is **statistically significant to reject Ho**, while John'ssample mean's difference from 500 is not statistically significant (i.e.not far enough from 500).#### *C. Potential misleading of comparing p-value to 0.05:****Using this example, explain the misleading aspects of reporting theresult of a test as "P ≤ 0.05" versus "P \> 0.05," or as "reject H0"versus "Do not reject H0," without reporting the actual P-value.***Comparing this two almost identical samples allowed us to demonstratehow a tiny difference in sample mean can produce difference in studyoutcomes (rejection vs acceptance of Ho). It is important to remember,that the sample means vary due to randomness of sampling process, andtherefore produce variety of confidence intervals, test statistics andp-values.Reporting actual p-value will demonstrate its closeness to 0.05cutoff level, and warn the reader about potential error in conclusion.------------------------------------------------------------------------# Question 6***A school nurse wants to determine whether age is a factor in whetherchildren choose a healthy snack after school. She conducts a survey of300 middle school students, with the results below. Test at α = 0.05 theclaim that the proportion who choose a healthy snack differs by gradelevel. What is the null hypothesis? Which test should we use? What isthe conclusion?***```{r, echo=T}`grade level`<-c("6","7","8")`Healthy snack`<-c(31,43,51)`Unhealthy snack`<-c(69,57,49)table <-data.frame (`grade level`, `Healthy snack`, `Unhealthy snack` )# table <- addmargins( table , margin =1:2, FUN = sum)knitr::kable(table) ```This table represents data for two categorical variables: `grade level`and `snack choice`I will generate a sample data as given above:```{r, echo=T}# Generating sample for grade 6: successes <-0# repeat until the desired number of successes is obtainedwhile (successes !=31) { grade6 <-rbinom(n=100, 1, 0.31)# count the number of successes successes <-sum(grade6) }# Generating sample for grade 7: successes <-0while (successes !=43) { grade7 <-rbinom(n=100, 1, 0.43) successes <-sum(grade7) }# Generating sample for grade 8: successes <-0while (successes !=51) { grade8 <-rbinom(n=100, 1, 0.51) successes <-sum(grade8) }# Combining all grades in one table for analysis: table2<-data.frame(grade6, grade7, grade8) table2<- table2 %>%pivot_longer(c(grade6, grade7, grade8 ), names_to="grade", values_to="healthy.choice") xtabs <-xtabs(~table2$grade + table2$healthy.choice) knitr::kable( xtabs ) ```The school nurse collected this data to test the following hypothesis:Ho: all grades' samples are coming from the same general population, andprobability of making healthy choice is equal for all grades.H1: probability of making healthy choice is not equal for all grades.I will use a chi-square test to test this hypothesis:```{r, echo=T} table2$healthy.choice <-as.character (table2$healthy.choice) table2$grade <-as.character(table2$grade)ch.sq<-chisq.test(table2$grade, table2$healthy.choice ,correct =FALSE)ch.sq```The p-value of 0.01 makes me to reject H0 hypothesis that all gradeshave equal probability, and accpet H1, that the kids of different gradesdiffer in probability of making haealthy snack choice.------------------------------------------------------------------------# Question 7Per-pupil costs (in thousands of dollars) for cyber charter schooltuition for school districts in three areas are shown. Test the claimthat there is a difference in means for the three areas, using anappropriate test. What is the null hypothesis? Which test should we use?What is the conclusion?```{r, echo=T}Area.1<-c( 6.2, 9.3, 6.8, 6.1, 6.7, 7.5)Area.2<-c( 7.5, 8.2, 8.5, 8.2, 7.0, 9.3)Area.3<-c( 5.8, 6.4 ,5.6, 7.1, 3.0, 3.5)table.q7 <-data.frame (Area.1, Area.2, Area.3)knitr::kable(table.q7 ) ```Ho: there is no difference in means between the three areas. H1: Atleast one of these areas' means is significantly different.To test this hypopthesis, I will use ANOVA test to compare means ofthree samples:```{r, echo=T} table.q7 <- table.q7 %>%pivot_longer(c(Area.1, Area.2, Area.3 ), names_to="Area", values_to="score")table.q7$Area <-as.factor (table.q7$Area)anova.q7 <-aov( score ~ Area, table.q7 )summary(anova.q7)```The results of ANOVA test demonstrate p-value of 0.00397. I make aconclusion that at least one of the three areas' means is significantlydifferent from others (i.e reject Ho and accept H1).---