Hw 2 by Kristin Abijaoude

distribution

probability

Author

Kristin Abijaoude

Published

March 16, 2023

Code

# load packages
packages <- c("readr", "ggplot2", "caret", "summarytools", "tidyverse", "dplyr", "stats", "pwr")
lapply(packages, require, character.only = TRUE)

Loading required package: readr

Loading required package: ggplot2

Loading required package: caret

Loading required package: lattice

Loading required package: summarytools

Loading required package: tidyverse

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.2.1     ✔ stringr 1.5.0
✔ purrr   1.0.0     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ purrr::lift()   masks caret::lift()
✖ tibble::view()  masks summarytools::view()
Loading required package: pwr

[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

[[4]]
[1] TRUE

[[5]]
[1] TRUE

[[6]]
[1] TRUE

[[7]]
[1] TRUE

[[8]]
[1] TRUE

Question 1

Code

surgical_procedures <- c("bypass","angiography")
sample_size <- c(589, 847)
mean_wait_time <- c(19, 18)
standard_deviation <- c(10,9)

surgery_data <- data.frame(surgical_procedures, sample_size, mean_wait_time, standard_deviation)
surgery_data

  surgical_procedures sample_size mean_wait_time standard_deviation
1              bypass         589             19                 10
2         angiography         847             18                  9

Wait times are in days.

Code

# confidence level for bypass and angiography
conf_level <- 0.9

# standard error for bypass
bypass_se <- 10 / sqrt(589)

# confidence interval for bypass
bypassCI <- 19 + qt(c(0.05, 0.95), 589-1) * bypass_se
bypassCI

[1] 18.32118 19.67882

Code

# standard error for angiography
angio_se <- 9 / sqrt(847)

# confidence interval for angiography
angioCI <- 18 + qt(c(0.05, 0.95), 847-1) * angio_se
angioCI

[1] 17.49078 18.50922
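The two intervals above repeat the same arithmetic, so it can be wrapped in a small helper. This is only a sketch: `t_ci` and its argument names are hypothetical and not part of the original homework, and the default of 0.90 simply mirrors `conf_level` above.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_ci <- function(xbar, s, n, conf_level = 0.90) {
  se <- s / sqrt(n)                       # standard error of the mean
  alpha <- 1 - conf_level                 # total probability in both tails
  xbar + qt(c(alpha / 2, 1 - alpha / 2), df = n - 1) * se
}

bypass_ci <- t_ci(xbar = 19, s = 10, n = 589)   # same inputs as bypassCI
angio_ci  <- t_ci(xbar = 18, s = 9,  n = 847)   # same inputs as angioCI

# Interval widths; a smaller standard deviation and a larger sample
# both make the angiography interval narrower
diff(bypass_ci)
diff(angio_ci)
```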

Question 2

Code

# 567 of the 1031 Americans surveyed
p <- 567 / 1031
# about 55% of respondents believe college education is essential for success

# 95% confidence level
conf <- 0.95

# standard error
college_se <- sqrt(p*(1-p)/1031) 

# confidence interval
collegeCI <- p + qnorm(c(0.025, 0.975)) * college_se
collegeCI

[1] 0.5195839 0.5803191
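The interval above is the Wald (normal-approximation) interval. As a rough cross-check, base R's `prop.test()` returns a Wilson score interval, so its limits should be close to, but not identical with, the ones above. The names `score_test` and `score_ci` are illustrative only.

```r
# Score-based interval for a single proportion, without continuity correction
score_test <- prop.test(x = 567, n = 1031, conf.level = 0.95, correct = FALSE)

# Extract the interval limits as a plain numeric vector
score_ci <- as.numeric(score_test$conf.int)
score_ci
```

With a sample this large, the two methods should differ only in the third or fourth decimal place.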

Question 3

Code

# desired margin of error: estimate within $5 of the true mean
est <- 5

# money spent on textbooks varies widely, mostly between $30 and $200,
# so approximate the standard deviation as range / 4
sigma <- (200 - 30) / 4

# significance level (assuming a 95% confidence level)
alpha <- 0.05

# z critical value
z_alpha <- qnorm(1 - alpha / 2)

# required sample size
n <- ceiling((z_alpha * sigma / est) ^ 2)
n

[1] 278
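The required sample size is sensitive to the confidence level chosen. The sketch below tabulates it for a few confidence levels, keeping the same range-based guess for sigma and the same $5 margin as above. The helper `n_for_mean` and the variable names are hypothetical.

```r
# Hypothetical helper: sample size needed to estimate a mean within `margin`
n_for_mean <- function(sigma, margin, conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)    # two-sided critical value
  ceiling((z * sigma / margin)^2)
}

sigma_guess <- (200 - 30) / 4             # same range / 4 approximation as above
conf_levels <- c(0.50, 0.90, 0.95, 0.99)

# Each confidence level is passed to n_for_mean as `conf_level`
n_needed <- sapply(conf_levels, n_for_mean, sigma = sigma_guess, margin = 5)
data.frame(conf_levels, n_needed)
```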

Question 4

A

Code

# t test
f_emp <- 410
income <- 500
s <- 90
t <- (f_emp - income) / (s / sqrt(9))
t

[1] -3

B

Code

# degrees of freedom
df <- 9 - 1 
# df = 8

# p-value (lower tail, i.e. for the alternative that the mean is below 500)
p_value <- pt(t, df)
p_value

[1] 0.008535841

Code

# significance level
alpha <- 0.05

# two-sided decision: reject if the lower-tail probability falls in either tail
if (p_value < alpha/2 || p_value > 1-alpha/2) {
  cat("Reject the null hypothesis")
} else {
  cat("Fail to reject the null hypothesis")
}

Reject the null hypothesis

C

Code

1-p_value

[1] 0.9914642

Code

# the upper-tail p-value is large, so we fail to reject the null hypothesis
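The same t statistic gives a different p-value depending on the alternative hypothesis. A minimal sketch collecting all three in one place, recomputed from the summary numbers in part A (the variable names here are illustrative):

```r
t_stat <- (410 - 500) / (90 / sqrt(9))    # same statistic as in part A
df_t   <- 9 - 1                           # degrees of freedom

p_less      <- pt(t_stat, df_t)                      # Ha: mean < 500
p_greater   <- pt(t_stat, df_t, lower.tail = FALSE)  # Ha: mean > 500
p_two_sided <- 2 * pt(-abs(t_stat), df_t)            # Ha: mean != 500

c(less = p_less, greater = p_greater, two_sided = p_two_sided)
```

The two one-sided p-values always sum to 1, and the two-sided p-value is twice the smaller of them.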

Question 5

A

Code

# t statistics
t_jones <- (519.5 - 500) / 10 # (sample mean 519.5 - hypothesized mean 500) / standard error 10.0
t_jones

[1] 1.95

Code

t_smith <- (519.7 - 500) / 10 # (sample mean 519.7 - hypothesized mean 500) / standard error 10.0
t_smith

[1] 1.97

Code

# p values
p_jones <- 2 * pt(-abs(t_jones), df = 999)
p_jones

[1] 0.05145555

Code

p_smith <- 2 * pt(-abs(t_smith), df = 999)
p_smith

[1] 0.04911426

B

At the α = 0.05 level, Smith’s result is statistically significant (P = 0.049), while Jones’ is not (P = 0.051).

C

Although the two sample means are nearly identical (519.5 vs. 519.7), one result falls just below the 0.05 cutoff and the other just above it. That is why it is important to report the actual P-value rather than only stating whether the null hypothesis was rejected.
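Another way to see how close the two studies are is to compare their 95% confidence intervals. A minimal sketch, assuming the same standard error of 10 and df = 999 used above (the variable names are illustrative):

```r
se <- 10                                  # standard error reported for both studies
t_crit <- qt(0.975, df = 999)             # two-sided 95% critical value

ci_jones <- 519.5 + c(-1, 1) * t_crit * se
ci_smith <- 519.7 + c(-1, 1) * t_crit * se

rbind(jones = ci_jones, smith = ci_smith)
```

The two intervals overlap almost entirely; the only difference is that Jones’ interval barely contains 500 while Smith’s barely excludes it.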

Question 6

Code

healthy <- c(31, 43, 51)
unhealthy <- c(69, 57, 49)

snack <- rbind(healthy, unhealthy)

colnames(snack) <- c("6th grade", "7th grade", "8th grade")
rownames(snack) <- c("healthy", "unhealthy")

snack

          6th grade 7th grade 8th grade
healthy          31        43        51
unhealthy        69        57        49

Code

# α = 0.05

Code

chisq.test(snack, correct = FALSE)


    Pearson's Chi-squared test

data:  snack
X-squared = 8.3383, df = 2, p-value = 0.01547

Since the p-value (0.01547) is smaller than the 0.05 significance level, we reject the null hypothesis of independence and conclude that there is an association between grade level and snack choice.
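The test statistic reported by `chisq.test()` can be reproduced by hand from the expected counts under independence. This is a sketch for checking the arithmetic; `snack_counts` and the other names are illustrative and rebuild the same table as above.

```r
snack_counts <- rbind(healthy = c(31, 43, 51), unhealthy = c(69, 57, 49))

# Expected count in each cell = row total * column total / grand total
expected <- outer(rowSums(snack_counts), colSums(snack_counts)) / sum(snack_counts)

# Pearson chi-squared statistic, degrees of freedom, and upper-tail p-value
chi_sq <- sum((snack_counts - expected)^2 / expected)
df_chi <- (nrow(snack_counts) - 1) * (ncol(snack_counts) - 1)
p_chi  <- pchisq(chi_sq, df_chi, lower.tail = FALSE)

c(statistic = chi_sq, df = df_chi, p_value = p_chi)
```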

Question 7

Code

tuition <- data.frame(
  area = c(rep("Area_1", 6), rep("Area_2", 6), rep("Area_3", 6)),
  cost = c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5, 7.5, 8.2, 8.5, 8.2, 7.0, 9.3, 5.8, 6.4, 5.6, 7.1, 3.0, 3.5))

tuition

     area cost
1  Area_1  6.2
2  Area_1  9.3
3  Area_1  6.8
4  Area_1  6.1
5  Area_1  6.7
6  Area_1  7.5
7  Area_2  7.5
8  Area_2  8.2
9  Area_2  8.5
10 Area_2  8.2
11 Area_2  7.0
12 Area_2  9.3
13 Area_3  5.8
14 Area_3  6.4
15 Area_3  5.6
16 Area_3  7.1
17 Area_3  3.0
18 Area_3  3.5

Code

tuition_model <- aov(cost ~ area, data = tuition)
summary(tuition_model)

            Df Sum Sq Mean Sq F value  Pr(>F)   
area         2  25.66  12.832   8.176 0.00397 **
Residuals   15  23.54   1.569                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value (0.00397) is smaller than 0.05, we reject the null hypothesis that mean tuition is the same in all three areas. There is statistically significant evidence that the cost of charter school tuition differs by area.
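The F statistic in the ANOVA table can be reproduced from the between-group and within-group sums of squares. A minimal sketch, re-entering the same 18 tuition values (the variable names are illustrative):

```r
cost <- c(6.2, 9.3, 6.8, 6.1, 6.7, 7.5,
          7.5, 8.2, 8.5, 8.2, 7.0, 9.3,
          5.8, 6.4, 5.6, 7.1, 3.0, 3.5)
area <- factor(rep(c("Area_1", "Area_2", "Area_3"), each = 6))

group_means <- tapply(cost, area, mean)
group_sizes <- tapply(cost, area, length)

# Between-group: spread of group means around the grand mean
ss_between <- sum(group_sizes * (group_means - mean(cost))^2)
# Within-group: spread of observations around their own group mean
ss_within <- sum((cost - ave(cost, area))^2)

df_between <- nlevels(area) - 1
df_within  <- length(cost) - nlevels(area)

f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_f <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(F = f_stat, p_value = p_f)
```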