The histogram suggests that LungCap is approximately normally distributed: most observations lie close to the mean, and very few lie near the extremes of the range (0 and 15).
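You can check the reading that "most observations are close to the mean" numerically. Compare the share of values within one and two standard deviations of the mean against the roughly 68% and 95% expected under a normal distribution. This is a minimal sketch. `share_within_sd()` is a hypothetical helper, and the simulated vector is only a stand-in for `df$LungCap`:

```r
# Hypothetical helper: fraction of values within k standard deviations of the mean
share_within_sd <- function(x, k = 1) {
  x <- x[!is.na(x)]
  mean(abs(x - mean(x)) <= k * sd(x))
}

# Simulated stand-in for df$LungCap (replace with the real column)
set.seed(1)
lung_cap <- rnorm(500)

share_within_sd(lung_cap, k = 1)  # near 0.68 if roughly normal
share_within_sd(lung_cap, k = 2)  # near 0.95 if roughly normal

# Visual check: points close to the reference line suggest approximate normality
qqnorm(lung_cap)
qqline(lung_cap)
```

A Q-Q plot is often easier to judge than a histogram, because the histogram's shape depends on the bin width.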
(1b) The distribution of LungCap by gender is compared with side-by-side boxplots:
boxplot(LungCap ~ Gender, data = df)
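Each box summarizes one group. The box edges are the hinges, the middle line is the median, and the whiskers stop at the most extreme points within 1.5 times the box length. Calling `boxplot()` with `plot = FALSE` returns the numbers behind the picture, which makes the gender comparison concrete. This is a sketch on hypothetical toy data. With the real data, the call would be `boxplot(LungCap ~ Gender, data = df, plot = FALSE)`:

```r
# Hypothetical toy data standing in for df
toy <- data.frame(
  LungCap = c(6, 7, 8, 9, 10, 5, 7, 9, 11, 13),
  Gender  = rep(c("female", "male"), each = 5)
)

# plot = FALSE returns the statistics instead of drawing the plot
b <- boxplot(LungCap ~ Gender, data = toy, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
stats <- b$stats
colnames(stats) <- b$names
stats
```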
(1c) The mean lung capacities for smokers and non-smokers can be found in the table below:
# A tibble: 2 × 2
  Smoke  name
  <chr> <dbl>
1 no     7.77
2 yes    8.65
These means are not what I would expect: smokers ("yes") have a higher mean lung capacity (8.65) than non-smokers ("no", 7.77).
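The code that produced the table above is not shown. One base-R way to compute the same kind of group means is `aggregate()`, which is an assumption here rather than the original code. The toy values below are hypothetical. Printing the group sizes alongside the means is a cheap way to spot unbalanced groups:

```r
# Hypothetical toy data standing in for df
toy <- data.frame(
  LungCap = c(7, 8, 9, 6, 10, 9),
  Smoke   = c("no", "no", "no", "no", "yes", "yes")
)

# Mean LungCap for each level of Smoke (one row per group)
means <- aggregate(LungCap ~ Smoke, data = toy, FUN = mean)
means

# Group sizes matter when comparing means, so count each group too
table(toy$Smoke)
```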
(1d) The relationship between Smoking and Lung Capacity within age groups
# Age groups defined by: "less than or equal to 13", "14 to 15",
# "16 to 17", "greater than or equal to 18"
library(dplyr)

# Create the age_group variable
df <- df %>%
  mutate(
    age_group = case_when(
      Age <= 13           ~ "0-13",
      Age > 13 & Age < 16 ~ "14-15",
      Age > 15 & Age < 18 ~ "16-17",
      Age >= 18           ~ ">= 18"
    ),
    # Convert to a factor so the groups sort by age, not alphabetically
    age_group = factor(age_group, levels = c("0-13", "14-15", "16-17", ">= 18"))
  )

# View(df)  # interactive inspection in RStudio

# Mean lung capacity for each combination of age group and smoking status
df %>%
  group_by(age_group, Smoke) %>%
  summarise(name = mean(LungCap))
# A tibble: 8 × 3
# Groups:   age_group [4]
  age_group Smoke  name
  <fct>     <chr> <dbl>
1 0-13      no     6.36
2 0-13      yes    7.20
3 14-15     no     9.14
4 14-15     yes    8.39
5 16-17     no    10.5
6 16-17     yes    9.38
7 >= 18     no    11.1
8 >= 18     yes   10.5
[1] 0.00390625
[1] 0.109375
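Subtracting the two means within each age group makes the comparison direct. In the table of group means, smokers have the higher mean only in the 0-13 group (7.20 vs 6.36) and the lower mean in each of the three older groups. The sketch below assumes a data frame with `LungCap`, `age_group`, and `Smoke` columns, as created above. `smoking_gap()` is a hypothetical helper, and the toy data are made up:

```r
# Hypothetical helper: mean LungCap of smokers minus non-smokers, per age group
smoking_gap <- function(data) {
  # Matrix of group means: rows = age_group, columns = Smoke ("no", "yes")
  # (a cell with no observations would come back as NA)
  m <- tapply(data$LungCap, list(data$age_group, data$Smoke), mean)
  m[, "yes"] - m[, "no"]
}

# Hypothetical toy data with two age groups
toy <- data.frame(
  LungCap   = c(6, 7, 8, 10, 11, 9),
  age_group = factor(c("young", "young", "young", "old", "old", "old"),
                     levels = c("young", "old")),
  Smoke     = c("no", "no", "yes", "no", "no", "yes")
)

smoking_gap(toy)  # positive = smokers higher, negative = non-smokers higher
```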
(1e) Compare the lung capacities for smokers and non-smokers within each age group.
library(ggplot2)
ggplot(df, aes(x = age_group, y = LungCap, color = Smoke)) +
  geom_boxplot()
This visualization makes more sense for what we expect when comparing smokers to non-smokers. Lung capacity increases as the participants get older. The group means in (1d) show non-smokers with the higher mean lung capacity in every age group except 0-13. The sample may contain more smokers among the older participants. Because lung capacity increases with age, this imbalance could be inflating the overall mean lung capacity for smokers in (1c), which would make age a confounding variable.
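Two steps test the imbalance explanation. First, count the smokers in each age group. Second, recall that a pooled mean is a weighted average of the within-group means, weighted by group size. The sketch below uses hypothetical toy data built so that smokers have lower lung capacity within every age group yet a higher pooled mean (Simpson's paradox). With the real data, `toy` would be replaced by `df`:

```r
# Hypothetical toy data: smokers are concentrated in the older group
toy <- data.frame(
  age_group = factor(rep(c("young", "old"), each = 4), levels = c("young", "old")),
  Smoke     = c("no", "no", "no", "yes", "no", "yes", "yes", "yes"),
  LungCap   = c(7, 7, 8, 6, 12, 10, 10, 10)
)

# Counts, then the share of non-smokers and smokers within each age group
counts <- table(toy$age_group, toy$Smoke)
prop.table(counts, margin = 1)

# Within every group, smokers have the lower mean...
group_means <- tapply(toy$LungCap, list(toy$age_group, toy$Smoke), mean)
group_means[, "yes"] - group_means[, "no"]

# ...but the pooled smoker mean is weighted toward the older group
pooled_yes <- sum(counts[, "yes"] * group_means[, "yes"]) / sum(counts[, "yes"])
pooled_yes
mean(toy$LungCap[toy$Smoke == "yes"])  # same value, computed directly
mean(toy$LungCap[toy$Smoke == "no"])   # lower than the smokers' pooled mean
```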
(2a) What is the probability that a randomly selected inmate has exactly 2 prior convictions?
160 / 810  # inmates with exactly 2 prior convictions / all inmates
[1] 0.1975309
(2b) What is the probability that a randomly selected inmate has fewer than 2 prior convictions?
(128 + 434) / 810  # inmates with 0 or 1 prior conviction / all inmates
[1] 0.6938272
(2c) What is the probability that a randomly selected inmate has 2 or fewer prior convictions?
(128 + 434 + 160) / 810  # inmates with 0, 1, or 2 prior convictions / all inmates
[1] 0.891358
(2d) What is the probability that a randomly selected inmate has more than 2 prior convictions?
(64 + 24) / 810  # inmates with more than 2 prior convictions / all inmates
[1] 0.108642
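Storing the distribution once as a table lets every probability in 2a to 2d come from the same numbers. It also makes the complement rule P(X > 2) = 1 - P(X <= 2) easy to check. This sketch assumes the counts are 128, 434, 160, 64, and 24 inmates with 0 through 4 prior convictions. That assignment is consistent with the sums above and with E(X) = 1.28642 in 2e, but the variable names are illustrative:

```r
# Frequency table of prior convictions (assumed: 0:128, 1:434, 2:160, 3:64, 4:24)
convictions <- 0:4
counts      <- c(128, 434, 160, 64, 24)
prob        <- counts / sum(counts)  # sum(counts) is 810

p_eq_2 <- sum(prob[convictions == 2])  # P(X = 2)
p_lt_2 <- sum(prob[convictions < 2])   # P(X < 2)
p_le_2 <- sum(prob[convictions <= 2])  # P(X <= 2)
p_gt_2 <- 1 - p_le_2                   # P(X > 2), via the complement rule

c(p_eq_2, p_lt_2, p_le_2, p_gt_2)
```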
(2e) What is the expected value for the number of prior convictions?
EV <- sum(StatePrison$number_convictions * StatePrison$Probability)
print(EV)
[1] 1.28642
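Written out by hand, the same expected value follows. This assumes the frequencies 128, 434, 160, 64, and 24 inmates with 0 through 4 prior convictions, out of 810:

```latex
E(X) = \sum_x x \, P(X = x)
     = \frac{0(128) + 1(434) + 2(160) + 3(64) + 4(24)}{810}
     = \frac{1042}{810} \approx 1.28642
```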
(2f) Calculate the variance and the standard deviation for the Prior Convictions.
Var <- sum((StatePrison$number_convictions - EV)^2 * StatePrison$Probability)
print(Var)
[1] 0.8562353
SD <- sqrt(Var)
print(SD)
[1] 0.9253298
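As a cross-check, the definition Var(X) = E[(X - mu)^2] should agree with the shortcut Var(X) = E(X^2) - mu^2. For these counts, the shortcut gives 2034/810 - (1042/810)^2. This sketch rebuilds the distribution from the same assumed counts for 0 through 4 prior convictions instead of using the `StatePrison` table, so the names here are illustrative:

```r
# Reconstructed distribution (assumed counts for 0, 1, 2, 3, 4 prior convictions)
convictions <- 0:4
prob        <- c(128, 434, 160, 64, 24) / 810

ev           <- sum(convictions * prob)             # E(X)
var_def      <- sum((convictions - ev)^2 * prob)    # definition: E[(X - mu)^2]
var_shortcut <- sum(convictions^2 * prob) - ev^2    # shortcut: E(X^2) - mu^2
sd_x         <- sqrt(var_def)                       # standard deviation

c(ev = ev, var = var_def, sd = sd_x)
all.equal(var_def, var_shortcut)  # the two variance formulas agree
```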
