hw1
Roy Yoon
HW
Author

Roy Yoon

Published

December 20, 2022

Code
library(readxl)
library(ggplot2)
  1. Use the LungCapData to answer the following questions. (Hint: Using dplyr, especially group_by() and summarize() can help you answer the following questions relatively efficiently.)
Code
lung<- read_xls("~/Documents/603/603_Fall_2022/posts/_data/LungCapData.xls")
Error: `path` does not exist: '~/Documents/603/603_Fall_2022/posts/_data/LungCapData.xls'
Code
lung
Error in eval(expr, envir, enclos): object 'lung' not found
  1. What does the distribution of LungCap look like? (Hint: Plot a histogram with probability density on the y axis)
Code
hist(lung$LungCap)
Error in hist(lung$LungCap): object 'lung' not found

The data presents virtually as a normal distribution

  1. Compare the probability distribution of the LungCap with respect to Males and Females? (Hint: make boxplots separated by gender using the boxplot() function)
Code
#Box plot

ggplot(lung, aes(Gender,LungCap)) +
  geom_boxplot()
Error in ggplot(lung, aes(Gender, LungCap)): object 'lung' not found

Male mean is slightly higher than the female mean. However, the means can statisticadlly potentially intersect

  1. Compare the mean lung capacities for smokers and non-smokers. Does it make sense?
Code
lung %>% 
  group_by(Smoke) %>% 
  summarise(average_lung = mean(LungCap))
Error in lung %>% group_by(Smoke) %>% summarise(average_lung = mean(LungCap)): could not find function "%>%"

Matters of the heart are not black and white. However, it is confusing to understand that in this data smokers have a higher lung capcity. I wonder about moderating and mediating effects.

  1. Examine the relationship between Smoking and Lung Capacity within age groups: “less than or equal to 13”, “14 to 15”, “16 to 17”, and “greater than or equal to 18”.
Code
#categorical variable of age_groups 
a<- lung %>% 
group_by(Smoke, LungCap) %>% 
  summarise(age_group = case_when(Age<=13 ~ "13 & under",
                                  Age ==14 | Age == 15 ~ "14 to 15",
                                  Age== 16 | Age == 17 ~ "16 to 17",
                                  Age>=18 ~ "18 & older")) 
Error in lung %>% group_by(Smoke, LungCap) %>% summarise(age_group = case_when(Age <= : could not find function "%>%"
Code
a
Error in eval(expr, envir, enclos): object 'a' not found
Code
#mean of lung capacity with new variable
a %>% 
  group_by(Smoke, age_group) %>% 
  summarise(avg_lung_cap = mean(LungCap)) %>% 
  arrange(desc(avg_lung_cap))
Error in a %>% group_by(Smoke, age_group) %>% summarise(avg_lung_cap = mean(LungCap)) %>% : could not find function "%>%"
Code
#histogram
ggplot(a, aes(x = LungCap)) + facet_grid(Smoke ~age_group) +geom_histogram()
Error in ggplot(a, aes(x = LungCap)): object 'a' not found
  1. Compare the lung capacities for smokers and non-smokers within each age group. Is your answer different from the one in part c. What could possibly be going on here?
Code
# Mean of non-smokers 13 & younger
a %>% 
  filter(Smoke == 'no' & age_group == '13 & under') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "no" & age_group == "13 & under") %>% pull(LungCap) %>% : could not find function "%>%"
Code
# Mean of smokers 13 & younger
a %>% 
  filter(Smoke == 'yes' & age_group == '13 &under') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "yes" & age_group == "13 &under") %>% pull(LungCap) %>% : could not find function "%>%"
Code
#Mean of non-smokers 14 to 15
a %>% 
  filter(Smoke == 'no' & age_group == '14 to 15') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "no" & age_group == "14 to 15") %>% pull(LungCap) %>% : could not find function "%>%"
Code
#Mean of smokers 14 to 15
a %>% 
  filter(Smoke == 'yes' & age_group == '14 to 15') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "yes" & age_group == "14 to 15") %>% pull(LungCap) %>% : could not find function "%>%"
Code
#Mean of non-smokers 16 to 17
a %>% 
  filter(Smoke == 'no' & age_group == '16 to 17') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "no" & age_group == "16 to 17") %>% pull(LungCap) %>% : could not find function "%>%"
Code
#Mean of smokers 16 to 17
a %>% 
  filter(Smoke == 'yes' & age_group == '16 to 17') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "yes" & age_group == "16 to 17") %>% pull(LungCap) %>% : could not find function "%>%"
Code
 #Mean of non-smokers 18 & older
a %>% 
  filter(Smoke == 'no' & age_group == '18 & older') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "no" & age_group == "18 & older") %>% pull(LungCap) %>% : could not find function "%>%"
Code
# Mean of smokers 18 & older
a %>% 
  filter(Smoke == 'yes' & age_group == '18 & older') %>% 
  pull(LungCap) %>% 
  mean()
Error in a %>% filter(Smoke == "yes" & age_group == "18 & older") %>% : could not find function "%>%"
  1. Calculate the correlation and covariance between Lung Capacity and Age. (use the cov() and cor() functions in R). Interpret your results.
Code
#Correlation
cor(lung$LungCap, lung$Age)
Error in is.data.frame(y): object 'lung' not found
Code
#Covariance
cov(lung$LungCap, lung$Age)
Error in is.data.frame(y): object 'lung' not found

A correlation greater than.7 denotes a sttong correlation between LungCap and Age. A high covariance indicates that there is a positive relationship with LungCap and Age.

Question 2

Code
# x= prior convictions 
x<-c(0, 1, 2, 3, 4)
frequency<-c(128, 434, 160, 64, 24)
prison<- tibble(x,frequency)
Error in tibble(x, frequency): could not find function "tibble"
Code
prison
Error in eval(expr, envir, enclos): object 'prison' not found

What is the probability that a randomly selected inmate has exactly 2 prior convictions?

Code
160/810
[1] 0.1975309

0.19 probability that a randomly selected inmate has fewer than 2 priorconvictions

  1. What is the probability that a randomly selected inmate has fewer than 2 priorconvictions?
Code
(128+434)/810
[1] 0.6938272

0.69 that a randomly selected inmate has fewer than 2 priorconvictions

  1. What is the probability that a randomly selected inmate has 2 or fewer prior convictions?
Code
(128+434+160)/810
[1] 0.891358

0.89 that a randomly selected inmate has 2 or fewer prior convictions

  1. What is the probability that a randomly selected inmate has more than 2 prior convictions?
Code
(64+24)/810
[1] 0.108642

0.10 that a randomly selected inmate has more than 2 prior convictions

  1. What is the expected value for the number of prior convictions?
Code
g<-((128*0/810) +(434*1/810) +(160*2/810) +(64*3/810) +(24*4/810)) 
mean(g)
[1] 1.28642

the expected value for the number of prior convictions is 1.28

  1. Calculate the variance and the standard deviation for the Prior Convictions.
Code
v<- ((0-1.28)^2) *(128/810) +((1-1.28)^2) * (434/810)+((2-1.28)^2) * (160/810)+((3-1.28)^2) *(64/810) +((4-1.28)^2) * (24/810)

v
[1] 0.8562765
Code
sd<- sqrt(v)
sd
[1] 0.9253521

The variance is 0.856 and the standard deviation is 0.925.