Code
library(readxl)
library(ggplot2)
Roy Yoon
December 20, 2022
Error: `path` does not exist: '~/Documents/603/603_Fall_2022/posts/_data/LungCapData.xls'
Error in eval(expr, envir, enclos): object 'lung' not found
The data presents virtually as a normal distribution
Error in ggplot(lung, aes(Gender, LungCap)): object 'lung' not found
Male mean is slightly higher than the female mean. However, the means can statisticadlly potentially intersect
Error in lung %>% group_by(Smoke) %>% summarise(average_lung = mean(LungCap)): could not find function "%>%"
Matters of the heart are not black and white. However, it is confusing to understand that in this data smokers have a higher lung capcity. I wonder about moderating and mediating effects.
Error in lung %>% group_by(Smoke, LungCap) %>% summarise(age_group = case_when(Age <= : could not find function "%>%"
Error in eval(expr, envir, enclos): object 'a' not found
Error in a %>% group_by(Smoke, age_group) %>% summarise(avg_lung_cap = mean(LungCap)) %>% : could not find function "%>%"
Error in ggplot(a, aes(x = LungCap)): object 'a' not found
Error in a %>% filter(Smoke == "no" & age_group == "13 & under") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "yes" & age_group == "13 &under") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "no" & age_group == "14 to 15") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "yes" & age_group == "14 to 15") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "no" & age_group == "16 to 17") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "yes" & age_group == "16 to 17") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "no" & age_group == "18 & older") %>% pull(LungCap) %>% : could not find function "%>%"
Error in a %>% filter(Smoke == "yes" & age_group == "18 & older") %>% : could not find function "%>%"
Error in is.data.frame(y): object 'lung' not found
Error in is.data.frame(y): object 'lung' not found
A correlation greater than.7 denotes a sttong correlation between LungCap and Age. A high covariance indicates that there is a positive relationship with LungCap and Age.
Question 2
Error in tibble(x, frequency): could not find function "tibble"
Error in eval(expr, envir, enclos): object 'prison' not found
What is the probability that a randomly selected inmate has exactly 2 prior convictions?
0.19 probability that a randomly selected inmate has fewer than 2 priorconvictions
0.69 that a randomly selected inmate has fewer than 2 priorconvictions
0.89 that a randomly selected inmate has 2 or fewer prior convictions
0.10 that a randomly selected inmate has more than 2 prior convictions
the expected value for the number of prior convictions is 1.28
[1] 0.8562765
[1] 0.9253521
The variance is 0.856 and the standard deviation is 0.925.
---
title: "HW1_RoyYoon"
author: "Roy Yoon"
editor: visual
desription: "First homework"
date: "12/20/22"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- hw1
- Roy Yoon
- HW
---
```{r}
library(readxl)
library(ggplot2)
```
1. Use the LungCapData to answer the following questions. (Hint: Using dplyr, especially
group_by() and summarize() can help you answer the following questions relatively
efficiently.)
```{r}
lung<- read_xls("~/Documents/603/603_Fall_2022/posts/_data/LungCapData.xls")
lung
```
a. What does the distribution of LungCap look like? (Hint: Plot a histogram with
probability density on the y axis)
```{r}
hist(lung$LungCap)
```
The data presents virtually as a normal distribution
b. Compare the probability distribution of the LungCap with respect to Males and
Females? (Hint: make boxplots separated by gender using the boxplot() function)
```{r}
#Box plot
ggplot(lung, aes(Gender,LungCap)) +
geom_boxplot()
```
Male mean is slightly higher than the female mean. However, the means can statisticadlly potentially intersect
c. Compare the mean lung capacities for smokers and non-smokers. Does it make sense?
```{r}
lung %>%
group_by(Smoke) %>%
summarise(average_lung = mean(LungCap))
```
Matters of the heart are not black and white. However, it is confusing to understand that in this data smokers have a higher lung capcity. I wonder about moderating and mediating effects.
d. Examine the relationship between Smoking and Lung Capacity within age
groups: “less than or equal to 13”, “14 to 15”, “16 to 17”, and “greater than or equal to 18”.
```{r}
#categorical variable of age_groups
a<- lung %>%
group_by(Smoke, LungCap) %>%
summarise(age_group = case_when(Age<=13 ~ "13 & under",
Age ==14 | Age == 15 ~ "14 to 15",
Age== 16 | Age == 17 ~ "16 to 17",
Age>=18 ~ "18 & older"))
a
#mean of lung capacity with new variable
a %>%
group_by(Smoke, age_group) %>%
summarise(avg_lung_cap = mean(LungCap)) %>%
arrange(desc(avg_lung_cap))
#histogram
ggplot(a, aes(x = LungCap)) + facet_grid(Smoke ~age_group) +geom_histogram()
```
e. Compare the lung capacities for smokers and non-smokers within each age
group. Is your answer different from the one in part c. What could possibly be going on here?
```{r}
# Mean of non-smokers 13 & younger
a %>%
filter(Smoke == 'no' & age_group == '13 & under') %>%
pull(LungCap) %>%
mean()
# Mean of smokers 13 & younger
a %>%
filter(Smoke == 'yes' & age_group == '13 &under') %>%
pull(LungCap) %>%
mean()
#Mean of non-smokers 14 to 15
a %>%
filter(Smoke == 'no' & age_group == '14 to 15') %>%
pull(LungCap) %>%
mean()
#Mean of smokers 14 to 15
a %>%
filter(Smoke == 'yes' & age_group == '14 to 15') %>%
pull(LungCap) %>%
mean()
#Mean of non-smokers 16 to 17
a %>%
filter(Smoke == 'no' & age_group == '16 to 17') %>%
pull(LungCap) %>%
mean()
#Mean of smokers 16 to 17
a %>%
filter(Smoke == 'yes' & age_group == '16 to 17') %>%
pull(LungCap) %>%
mean()
#Mean of non-smokers 18 & older
a %>%
filter(Smoke == 'no' & age_group == '18 & older') %>%
pull(LungCap) %>%
mean()
# Mean of smokers 18 & older
a %>%
filter(Smoke == 'yes' & age_group == '18 & older') %>%
pull(LungCap) %>%
mean()
```
f. Calculate the correlation and covariance between Lung Capacity and Age. (use
the cov() and cor() functions in R). Interpret your results.
```{r}
#Correlation
cor(lung$LungCap, lung$Age)
#Covariance
cov(lung$LungCap, lung$Age)
```
A correlation greater than.7 denotes a sttong correlation between LungCap and Age. A high covariance indicates that there is a positive relationship with LungCap and Age.
Question 2
```{r}
# x= prior convictions
x<-c(0, 1, 2, 3, 4)
frequency<-c(128, 434, 160, 64, 24)
prison<- tibble(x,frequency)
prison
```
What is the probability that a randomly selected inmate has exactly 2 prior convictions?
```{r}
160/810
```
0.19 probability that a randomly selected inmate has fewer than 2 priorconvictions
b. What is the probability that a randomly selected inmate has fewer than 2 priorconvictions?
```{r}
(128+434)/810
```
0.69 that a randomly selected inmate has fewer than 2 priorconvictions
c. What is the probability that a randomly selected inmate has 2 or fewer prior convictions?
```{r}
(128+434+160)/810
```
0.89 that a randomly selected inmate has 2 or fewer prior convictions
d. What is the probability that a randomly selected inmate has more than 2 prior convictions?
```{r}
(64+24)/810
```
0.10 that a randomly selected inmate has more than 2 prior convictions
e. What is the expected value for the number of prior convictions?
```{r}
g<-((128*0/810) +(434*1/810) +(160*2/810) +(64*3/810) +(24*4/810))
mean(g)
```
the expected value for the number of prior convictions is 1.28
f. Calculate the variance and the standard deviation for the Prior Convictions.
```{r}
v<- ((0-1.28)^2) *(128/810) +((1-1.28)^2) * (434/810)+((2-1.28)^2) * (160/810)+((3-1.28)^2) *(64/810) +((4-1.28)^2) * (24/810)
v
sd<- sqrt(v)
sd
```
The variance is 0.856 and the standard deviation is 0.925.