hw2
Ken Docekal
Author

Ken Docekal

Published

October 17, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Q1

90% Confidence Interval for Bypass:

18.29 - 19.71

Code
n <- 539
xbar <- 19 
s <- 10

margin <- qt(0.95,df=n-1)*s/sqrt(n)

low <- xbar - margin
low
[1] 18.29029
Code
high <- xbar + margin
high
[1] 19.70971

90% Confidence Interval for Angiography:

17.49 - 18.51

Code
n <- 847
xbar <- 18 
s <- 9

margin <- qt(0.95,df=n-1)*s/sqrt(n)

low <- xbar - margin
low
[1] 17.49078
Code
high <- xbar + margin
high
[1] 18.50922

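The confidence interval is narrower for Angiography (width of about 1.02) than for Bypass (width of about 1.42). This is expected because the Angiography sample is larger (847 vs. 539) and has a smaller standard deviation (9 vs. 10), both of which shrink the margin of error.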
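
The two interval calculations above repeat the same steps, so they can be wrapped in a small helper. This is only a sketch: `t_interval` is a hypothetical name, not a base R function, and it assumes the summary statistics (mean, standard deviation, sample size) are the only inputs available.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
t_interval <- function(xbar, s, n, conf_level = 0.90) {
  alpha <- 1 - conf_level
  # Margin of error: critical t value times the standard error of the mean
  margin <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin, width = 2 * margin)
}

bypass <- t_interval(xbar = 19, s = 10, n = 539)
angiography <- t_interval(xbar = 18, s = 9, n = 847)

# Side-by-side comparison of bounds and interval widths
round(rbind(bypass, angiography), 2)
```

The `width` column makes the comparison explicit: it is twice the margin of error, so it shrinks as n grows or s falls.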

Q2

The point estimate of the proportion of adult Americans who believe that a college education is essential for success is 0.55, based on 567 of the 1,031 adult Americans in the representative sample.

The 95% confidence interval runs from 52% to 58%. We are 95% confident that the true population proportion lies in this range, in the sense that intervals constructed this way capture the true proportion in about 95% of repeated samples.

Code
prop.test(567,1031)

    1-sample proportions test with continuity correction

data:  567 out of 1031, null probability 0.5
X-squared = 10.091, df = 1, p-value = 0.00149
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.5189682 0.5805580
sample estimates:
        p 
0.5499515 
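
As a rough cross-check, the interval can also be approximated by hand with the normal-approximation (Wald) formula. Note the assumption: this is a different method from the one `prop.test()` uses (a Wilson score interval with continuity correction), so the bounds agree only approximately.

```r
# Normal-approximation (Wald) interval; NOT the method prop.test() uses.
x <- 567     # respondents who agree
n <- 1031    # sample size

p_hat <- x / n                        # point estimate of the proportion
se <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
z <- qnorm(0.975)                     # critical value for 95% confidence

wald_ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
round(wald_ci, 3)
```

With a sample this large and a proportion near 0.5, the two methods round to the same 52% to 58% range.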

Q3

Based on the range of 30 to 200, we can approximate the standard deviation using s = (Maximum - Minimum)/4, which gives s = 42.5. For a 95% confidence level we use a z-score of 1.96. To find the minimum sample size n for a margin of error of 5, we compute n = (z-score * s / margin of error)^2 = 277.56. Because the sample size must be a whole number, we round up: the sample needs at least 278 observations.

Code
n <- ((1.96)*(42.5)/5)^2

n
[1] 277.5556
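
The same calculation can be written as a small function that also performs the rounding step. This is a sketch under the assumptions used above (range/4 as the standard deviation estimate, a normal critical value); `min_sample_size` is a hypothetical helper name.

```r
# Hypothetical helper: minimum n for estimating a mean within a given margin.
min_sample_size <- function(range_min, range_max, margin, conf_level = 0.95) {
  s <- (range_max - range_min) / 4        # range rule-of-thumb estimate of sd
  z <- qnorm(1 - (1 - conf_level) / 2)    # about 1.96 for 95% confidence
  ceiling((z * s / margin)^2)             # round up to a whole observation
}

n_required <- min_sample_size(30, 200, margin = 5)
n_required
```

Using `ceiling()` rather than `round()` matters here: rounding down would leave the margin of error slightly above the target.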

Q4

A

Create a random sample from a normal distribution, stored as a tibble, for female employee wages using mean = 410, standard deviation = 90, and n = 9. Name the column “wage”.

Code
set.seed(24)
female <-tibble(
  value=rnorm(n=9,mean=410,sd=90))


names(female) <- c("wage")

                   
summary(female) 
      wage      
 Min.   :333.6  
 1st Qu.:360.9  
 Median :433.9  
 Mean   :410.7  
 3rd Qu.:450.0  
 Max.   :486.3  

Using a one-sample t-test, we observe a p-value of 0.001, so we reject the null hypothesis that mean income equals 500 at the 5% significance level. The test assumes wages are normally distributed. The test statistic of -4.84 measures how many standard errors the sample mean lies below the hypothesized mean of 500, and it determines the p-value.

The sample mean is 410.70, and the 95% confidence interval for the mean wage is 368.15 to 453.25. Because the entire interval lies below 500, the data are consistent with a mean wage below 500 for female employees. Note that the interval describes the population mean, not the wages of individual employees.

Code
t.test(female$wage, mu = 500)

    One Sample t-test

data:  female$wage
t = -4.84, df = 8, p-value = 0.001288
alternative hypothesis: true mean is not equal to 500
95 percent confidence interval:
 368.1516 453.2464
sample estimates:
mean of x 
  410.699 
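
Because the sample above is simulated, its t statistic depends on the seed. As a hedged alternative, the statistic can be computed directly from the summary values, under the assumption that 410 and 90 are treated as the observed sample mean and sample standard deviation rather than as simulation parameters.

```r
# Assumption: 410 and 90 are the observed sample mean and sd (n = 9).
xbar <- 410
s <- 90
n <- 9
mu0 <- 500                               # hypothesized mean under H0

se <- s / sqrt(n)                        # standard error of the mean
t_stat <- (xbar - mu0) / se              # one-sample t statistic
df <- n - 1
p_two_sided <- 2 * pt(-abs(t_stat), df)  # two-sided p-value

c(t = t_stat, df = df, p_value = p_two_sided)
```

Under this assumption the statistic is t = -3 with 8 degrees of freedom, which is less extreme than the simulated result but still leads to rejecting the null hypothesis at the 5% level.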

B

With a p-value of 0.0006, we reject the null hypothesis in favor of the alternative that the true mean is less than 500, even at the 1% significance level.

Code
t.test(female$wage, mu = 500, alternative = 'less')

    One Sample t-test

data:  female$wage
t = -4.84, df = 8, p-value = 0.0006441
alternative hypothesis: true mean is less than 500
95 percent confidence interval:
    -Inf 445.009
sample estimates:
mean of x 
  410.699 

C

For the alternative that the true mean is greater than 500, the p-value is 0.9994, so there is no evidence for this alternative and we fail to reject the null hypothesis. The two one-sided p-values sum to 1 (0.0006 + 0.9994), and the two-sided p-value from part A is twice the smaller of them. Taken together, the results provide strong evidence that the mean wage for female employees is less than 500.

Code
t.test(female$wage, mu = 500, alternative = 'greater')

    One Sample t-test

data:  female$wage
t = -4.84, df = 8, p-value = 0.9994
alternative hypothesis: true mean is greater than 500
95 percent confidence interval:
 376.389     Inf
sample estimates:
mean of x 
  410.699 
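
The relationship between the three p-values reported in parts A to C can be checked from the t distribution directly. This sketch uses the rounded test statistic printed above (t = -4.84, df = 8), so the last digits may differ slightly from the `t.test()` output.

```r
# Rounded values taken from the t.test() output above.
t_stat <- -4.84
df <- 8

p_less <- pt(t_stat, df)                         # H1: true mean < 500
p_greater <- pt(t_stat, df, lower.tail = FALSE)  # H1: true mean > 500
p_two_sided <- 2 * min(p_less, p_greater)        # H1: true mean != 500

round(c(less = p_less, greater = p_greater, two_sided = p_two_sided), 4)
```

The one-sided values are complements of each other, which is why a very small "less" p-value necessarily comes with a "greater" p-value close to 1.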

Q5

A

First, obtain the standard deviation by multiplying the standard error by the square root of the sample size.

Code
sd <- 10*(sqrt(1000))

sd
[1] 316.2278

We can then create samples as tibbles for Jones and Smith using respective means - 519.7 and 519.5, standard deviation - 316.23, and n - 1000.

Code
set.seed(65)
jones <-tibble(
  value= rnorm(n=1000,mean=519.7,sd=316.23))
Code
set.seed(32)
smith <-tibble(
  value= rnorm(n=1000,mean=519.5,sd=316.23))

With two one-sample t-tests we observe a t statistic of 1.859 and p-value of 0.063 for Jones, and a t statistic of 2.015 and p-value of 0.044 for Smith. Because the data sets were randomly generated from the provided parameters, the sample means (519.57 and 519.82) and test results do not exactly match those in the prompt.

Code
t.test(jones$value, mu = 500)

    One Sample t-test

data:  jones$value
t = 1.8587, df = 999, p-value = 0.06336
alternative hypothesis: true mean is not equal to 500
95 percent confidence interval:
 498.9088 540.2335
sample estimates:
mean of x 
 519.5711 
Code
t.test(smith$value, mu = 500)

    One Sample t-test

data:  smith$value
t = 2.0152, df = 999, p-value = 0.04415
alternative hypothesis: true mean is not equal to 500
95 percent confidence interval:
 500.5197 539.1257
sample estimates:
mean of x 
 519.8227 
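
Since the simulated results vary with the seed, the test statistics can also be computed directly from summary values. The sketch below assumes the two sample means given earlier (519.7 and 519.5), the standard error of 10, and n = 1000; `summary_t_test` is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: two-sided one-sample t-test from summary statistics.
summary_t_test <- function(xbar, mu0, se, n) {
  t_stat <- (xbar - mu0) / se                   # distance from H0 in standard errors
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value
  c(t = t_stat, p_value = p_value)
}

res_519_7 <- summary_t_test(xbar = 519.7, mu0 = 500, se = 10, n = 1000)
res_519_5 <- summary_t_test(xbar = 519.5, mu0 = 500, se = 10, n = 1000)

round(rbind(`mean 519.7` = res_519_7, `mean 519.5` = res_519_5), 3)
```

Computed this way, the t statistics are 1.97 and 1.95, and the two p-values fall on opposite sides of 0.05 even though the sample means differ by only 0.2.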

B

At the 0.05 significance level, the p-values indicate that Jones’ result is not statistically significant while Smith’s result is. Jones’ p-value of 0.063 is 0.013 above the 0.05 threshold, while Smith’s p-value of 0.044 is 0.006 below it and is therefore considered statistically significant.

C

This example illustrates the importance of reporting actual p-values: the difference between two p-values can be minor yet still determine whether results are labeled statistically significant. Reporting only “significant” or “not significant” can mislead readers, because a result may be discounted due to a small difference in the data even though it is almost identical to a result from a comparable but non-identical sample. Viewing the actual p-value lets readers see how far each result falls from the significance threshold and judge the strength of the evidence for themselves.

Q6

Create a data frame with a column for tax values.

Code
gas_taxes <-  data.frame (first_column  = c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02, 48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)) 
Code
gas_taxes
   first_column
1         51.27
2         47.43
3         38.89
4         41.95
5         28.61
6         41.29
7         52.19
8         49.48
9         35.02
10        48.13
11        39.28
12        54.41
13        41.66
14        30.28
15        18.49
16        38.72
17        33.41
18        45.02
Code
names(gas_taxes) <- c("tax")

Using a one-sample t-test where the null hypothesis is that the mean tax is 45 and the alternative hypothesis is that the true mean is less than 45, we obtain a p-value of 0.038 and reject the null hypothesis at the 5% significance level. The sample mean tax is 40.86, and the one-sided 95% confidence interval has an upper bound of 44.68, which is below 45. Therefore, although 7 of the 18 cities in the sample had taxes above 45 cents per gallon, there is evidence that the average tax per gallon of gas in the US in 2005 was less than 45 cents.

Code
t.test(gas_taxes$tax, mu = 45, alternative = 'less')

    One Sample t-test

data:  gas_taxes$tax
t = -1.8857, df = 17, p-value = 0.03827
alternative hypothesis: true mean is less than 45
95 percent confidence interval:
     -Inf 44.67946
sample estimates:
mean of x 
 40.86278
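
To show where the numbers in the `t.test()` output come from, the test statistic, one-sided p-value, and upper confidence bound can be reproduced by hand. This is a sketch that re-enters the same 18 values as a plain vector; the variable names are illustrative.

```r
# Same 18 tax values as in gas_taxes$tax, re-entered as a plain vector.
tax <- c(51.27, 47.43, 38.89, 41.95, 28.61, 41.29, 52.19, 49.48, 35.02,
         48.13, 39.28, 54.41, 41.66, 30.28, 18.49, 38.72, 33.41, 45.02)

n <- length(tax)
se <- sd(tax) / sqrt(n)                  # standard error of the mean

t_stat <- (mean(tax) - 45) / se          # test statistic under H0: mu = 45
p_value <- pt(t_stat, df = n - 1)        # lower-tail p-value for H1: mu < 45
upper_bound <- mean(tax) + qt(0.95, df = n - 1) * se  # one-sided 95% bound

n_above_45 <- sum(tax > 45)              # cities with tax above 45 cents

round(c(t = t_stat, p_value = p_value, upper = upper_bound), 4)
```

The upper bound uses `qt(0.95, ...)` rather than `qt(0.975, ...)` because the alternative is one-sided, so all 5% of the error probability sits in one tail.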