The histogram suggests that the distribution is close to a normal distribution. Most of the observations are close to the mean. Very few observations are close to the margins (0 and 15).
(1b) The probability distribution of the LungCap with respect to gender is as follows:
Code
boxplot(df$LungCap ~ df$Gender)
(1c) The mean lung capacities for smokers and non-smokers can be found in the table below:
# A tibble: 2 × 2
Smoke name
<chr> <dbl>
1 no 7.77
2 yes 8.65
These means are not what I would expect. It looks like those who smoke (“yes”) have a higher long capacity (8.65) than those who do not smoke (7.77).
(1d) The relationship between Smoking and Lung Capacity within age groups
Code
#Age groups defined by:#“less than or#equal to 13”, #“14 to 15”, #“16 to 17”, #“greater than or equal to 18”.# Create variabledf <- df %>%mutate(age_group =case_when( Age <=13~"0-13", Age >13& Age <16~"14-15", Age >15& Age <18~"16-18", Age >=18~">= 18"),# Convert to factorage_group =factor( age_group,level =c("0-13", "14-15","16-18", ">= 18")))View(df)df %>%group_by(age_group,Smoke) %>%summarise_at(vars(LungCap), list(name = mean))
# A tibble: 8 × 3
# Groups: age_group [4]
age_group Smoke name
<fct> <chr> <dbl>
1 0-13 no 6.36
2 0-13 yes 7.20
3 14-15 no 9.14
4 14-15 yes 8.39
5 16-18 no 10.5
6 16-18 yes 9.38
7 >= 18 no 11.1
8 >= 18 yes 10.5
Code
dbinom(x=8,size=8,prob=.5)
[1] 0.00390625
Code
dbinom(x=6,size=8,prob=.5)
[1] 0.109375
(1e) Compare the lung capacities for smokers and non-smokers within each age group.
Code
ggplot(df, aes(x=age_group, y=LungCap, color = Smoke)) +geom_boxplot()
This data visualization makes more sense for what we expect from lung capacity when comparing smokers to non smokers. It looks like lunch capacity increases as the participants get older. The data could have more participants who are smokers and who are older. This unbalance in participants could be skewing the overall average lunch capacity.
(2a) What is the probability that a randomly selected inmate has exactly 2 prior convictions?
Code
dbinom(x =1, size =1, p =160/810)
[1] 0.1975309
(2b) What is the probability that a randomly selected inmate has fewer than 2 prior convictions?
Code
dbinom(x =1, size =1, p =sum(128+434)/810)
[1] 0.6938272
(2c) What is the probability that a randomly selected inmate has 2 or fewer prior convictions?
Code
dbinom(x =1, size =1, p =sum(128+434+160)/810)
[1] 0.891358
(2d) What is the probability that a randomly selected inmate has more than 2 prior convictions?
Code
dbinom(x =1, size =1, p =sum(64+24)/810)
[1] 0.108642
(2e) What is the expected value for the number of prior convictions?
Code
EV <-sum(StatePrison$number_convictions *StatePrison$Probability)print(EV)
[1] 1.28642
(2f) Calculate the variance and the standard deviation for the Prior Convictions.
Code
Var <-sum((StatePrison$number_convictions - EV) ^2* StatePrison$Probability)print(Var)
[1] 0.8562353
Code
SD <-sqrt(Var)print(SD)
[1] 0.9253298
Source Code
---title: "Homework_One"author: "Meredith Derian-Toth"description: "Template of course blog qmd file"date: "02/05/2023"format: html: toc: true code-fold: true code-copy: true code-tools: truecategories: - hw1 - desriptive statistics - probability---# Question 1#### (1a) What does the distribution of LungCap look like?First, let's read in the data from the Excel file:```{r, echo=T}library("quarto")library("tidyverse")library("palmerpenguins")library(readxl)library(dplyr)library(ggplot2)df <-read_excel("_data/LungCapData.xls")#View(df)```The distribution of LungCap looks as follows:```{r, echo=T}hist(df$LungCap)```The histogram suggests that the distribution is close to a normal distribution. Most of the observations are close to the mean. Very few observations are close to the margins (0 and 15).#### (1b) The probability distribution of the LungCap with respect to gender is as follows:```{r BoxPlot}boxplot(df$LungCap ~ df$Gender)```#### (1c) The mean lung capacities for smokers and non-smokers can be found in the table below:```{r mean comparison}df %>%group_by(Smoke) %>%summarise_at(vars(LungCap), list(name = mean))```These means are not what I would expect. It looks like those who smoke ("yes") have a higher long capacity (8.65) than those who do not smoke (7.77).#### (1d) The relationship between Smoking and Lung Capacity within age groups ```{r additional variable of age group}#Age groups defined by:#“less than or#equal to 13”, #“14 to 15”, #“16 to 17”, #“greater than or equal to 18”.# Create variabledf <- df %>%mutate(age_group =case_when( Age <=13~"0-13", Age >13& Age <16~"14-15", Age >15& Age <18~"16-18", Age >=18~">= 18"),# Convert to factorage_group =factor( age_group,level =c("0-13", "14-15","16-18", ">= 18")))View(df)df %>%group_by(age_group,Smoke) %>%summarise_at(vars(LungCap), list(name = mean))dbinom(x=8,size=8,prob=.5)dbinom(x=6,size=8,prob=.5)```#### (1e) Compare the lung capacities for smokers and non-smokers within each age group. ```{r relationship btw age_group & LungCap}ggplot(df, aes(x=age_group, y=LungCap, color = Smoke)) +geom_boxplot()```This data visualization makes more sense for what we expect from lung capacity when comparing smokers to non smokers. It looks like lunch capacity increases as the participants get older. The data could have more participants who are smokers and who are older. This unbalance in participants could be skewing the overall average lunch capacity. # Question 2:Setting up the Dataframe```{r Probability table}StatePrison <-data.frame(number_convictions =0:4, InMateCount =c(128, 434, 160, 64, 24)) %>%mutate(Probability = InMateCount/810)View(StatePrison)```#### (2a) What is the probability that a randomly selected inmate has exactly 2 prior convictions?```{r probablity question (a)}dbinom(x =1, size =1, p =160/810)```#### (2b) What is the probability that a randomly selected inmate has fewer than 2 prior convictions?```{r Probability question (b)}dbinom(x =1, size =1, p =sum(128+434)/810)```#### (2c) What is the probability that a randomly selected inmate has 2 or fewer prior convictions?```{r probability question (c)}dbinom(x =1, size =1, p =sum(128+434+160)/810)```#### (2d) What is the probability that a randomly selected inmate has more than 2 prior convictions?```{r probability question (d)}dbinom(x =1, size =1, p =sum(64+24)/810)```#### (2e) What is the expected value for the number of prior convictions?```{r probability question (e)}EV <-sum(StatePrison$number_convictions *StatePrison$Probability)print(EV)```#### (2f) Calculate the variance and the standard deviation for the Prior Convictions.```{r probability question (f)}Var <-sum((StatePrison$number_convictions - EV) ^2* StatePrison$Probability)print(Var)``````{r probability question (g)}SD <-sqrt(Var)print(SD)```---