Zhiyuan Zhou
February 28, 2023
First, let’s read in the data from the Excel file:
The distribution of LungCap looks as follows:
The histogram suggests that the distribution is approximately normal: most observations lie close to the mean, and very few lie near the extremes (0 and 15).
# A tibble: 2 × 2
Smoke mean
<chr> <dbl>
1 no 7.77
2 yes 8.65
This result surprised me: smokers have a higher mean lung capacity (8.65) than non-smokers (7.77).
`summarise()` has grouped output by 'AgeGroup'. You can override using the `.groups` argument.
# A tibble: 8 × 5
# Groups: AgeGroup [4]
AgeGroup Smoke meanLungCap meanAge count
<fct> <chr> <dbl> <dbl> <int>
1 <=13 no 6.36 9.49 401
2 <=13 yes 7.20 11.7 27
3 14-15 no 9.14 14.5 105
4 14-15 yes 8.39 14.6 15
5 16-17 no 10.5 16.4 77
6 16-17 yes 9.38 16.6 20
7 >=18 no 11.1 18.5 65
8 >=18 yes 10.5 18.1 15
## e
In age group "<=13", smokers have a higher mean lung capacity than non-smokers; in every other age group, smokers have a lower one. The group sizes explain the surprising result in 1c: 428 of the 725 observations fall in the "<=13" group, and 401 of those are non-smokers with a mean age of 9.49, against 11.7 for the 27 smokers in that group. The overall non-smoker mean is therefore pulled down by a large number of young children, so the higher overall lung capacity of smokers is more likely due to the age difference than to smoking.
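The effect of the group sizes can be checked directly from the printed table. The following is a minimal sketch in base R that assumes only the rounded values shown above; the names `groups`, `overall`, and `share_youngest` are introduced here for illustration and are not part of the original analysis.

```r
# Group summaries copied from the table above (rounded as printed).
groups <- data.frame(
  AgeGroup    = rep(c("<=13", "14-15", "16-17", ">=18"), each = 2),
  Smoke       = rep(c("no", "yes"), times = 4),
  meanLungCap = c(6.36, 7.20, 9.14, 8.39, 10.5, 9.38, 11.1, 10.5),
  count       = c(401, 27, 105, 15, 77, 20, 65, 15)
)

# Overall mean per smoking status = count-weighted mean of the group means.
overall <- sapply(split(groups, groups$Smoke), function(g) {
  weighted.mean(g$meanLungCap, g$count)
})

# Share of each smoking status that falls in the youngest age group.
share_youngest <- sapply(split(groups, groups$Smoke), function(g) {
  g$count[g$AgeGroup == "<=13"] / sum(g$count)
})

print(round(overall, 2))
print(round(share_youngest, 2))
```

The weighted means reproduce the overall means from 1c up to rounding of the printed group means. About 62% of non-smokers (401 of 648) are in the "<=13" group, compared with about 35% of smokers (27 of 77), which is why the non-smoker average is dominated by the youngest children.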
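```{r, echo=T}
library(readxl)  # read_excel()
library(dplyr)   # %>%, group_by(), summarize()

# Read the lung capacity data
df <- read_excel("_data/LungCapData.xls")
```

```{r, echo=T}
# Histogram of LungCap (the original call was not preserved; a base-graphics call is assumed)
hist(df$LungCap)
```

```{r, echo=T}
# Boxplot of lung capacity by gender (the opening of this call was not preserved;
# boxplot(LungCap ~ Gender, ...) is assumed from the labels)
boxplot(LungCap ~ Gender, data = df,
        main = "Lung Capacity by Gender",
        xlab = "Gender",
        ylab = "Lung Capacity")
```

```{r, echo=T}
# Mean lung capacity by smoking status
df %>%
  group_by(Smoke) %>%
  summarize(mean = mean(LungCap))
```

```{r, echo=T}
# Bin Age into four groups; right = TRUE makes each interval closed on the right
df["AgeGroup"] <- cut(df$Age,
                      breaks = c(0, 13, 15, 17, Inf),
                      labels = c("<=13", "14-15", "16-17", ">=18"),
                      right = TRUE)

# Mean lung capacity, mean age, and sample size by age group and smoking status
df %>%
  group_by(AgeGroup, Smoke) %>%
  summarize(meanLungCap = mean(LungCap), meanAge = mean(Age), count = n())
```

# Question 2

```{r, echo=T}
# Probabilities from the frequency table (810 observations in total)
prob_2 <- 160 / 810                            # P(X = 2)
prob_fewer2 <- (128 + 434) / 810               # P(X < 2)
prob_2OrFewer <- (128 + 434 + 160) / 810       # P(X <= 2)
prob_more2 <- (64 + 24) / 810                  # P(X > 2)

# Expected value, population variance, and standard deviation of X
expectation <- (0 * 128 + 1 * 434 + 2 * 160 + 3 * 64 + 4 * 24) / 810
variance <- sum(128 * (0 - expectation) ^ 2,
                434 * (1 - expectation) ^ 2,
                160 * (2 - expectation) ^ 2,
                64 * (3 - expectation) ^ 2,
                24 * (4 - expectation) ^ 2) / 810
sd <- sqrt(variance)
```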
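
The Question 2 quantities can also be computed from a single frequency vector, which avoids retyping the counts in every expression. This is a sketch under the assumption that the variable takes the values 0 to 4 with the counts used above; `x`, `counts`, `p`, and `std_dev` are names introduced here for illustration (`std_dev` is used instead of `sd` to avoid masking the base R function).

```r
# Frequency table taken from the calculations above: value of X and its count.
x      <- 0:4
counts <- c(128, 434, 160, 64, 24)
p      <- counts / sum(counts)   # relative frequencies; sum(counts) is 810

prob_2        <- p[x == 2]       # P(X = 2)
prob_fewer2   <- sum(p[x < 2])   # P(X < 2)
prob_2OrFewer <- sum(p[x <= 2])  # P(X <= 2)
prob_more2    <- sum(p[x > 2])   # P(X > 2)

expectation <- sum(x * p)                     # E[X]
variance    <- sum((x - expectation)^2 * p)   # population variance of X
std_dev     <- sqrt(variance)

# Sanity checks: probabilities sum to 1, and the two complementary events cover everything.
stopifnot(isTRUE(all.equal(sum(p), 1)))
stopifnot(isTRUE(all.equal(prob_2OrFewer + prob_more2, 1)))
```

Working with the probability vector `p` makes the complement relationship between "2 or fewer" and "more than 2" easy to verify, and the same three lines for the mean, variance, and standard deviation carry over unchanged if the counts are updated.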