Resubmission: Homework #1

hw1
Kalimah Muhammad
Author

Kalimah Muhammad

Published

October 3, 2022

Code
library(tidyverse)
library(dplyr)
library(readxl)
library (ggplot2)
lungcap<- read_excel("_data/LungCapData.xls")
knitr::opts_chunk$set(echo = TRUE)

LungCapData

1a. What does the distribution of LungCap look like?

Code
ggplot(lungcap, aes(x=LungCap))+ geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This is not normally distributed as there are far more observations of lower lung capacity than higher suggesting the distribution is negatively skewed.

1b. Compare the probability distribution of the LungCap with respect to Males and Females?

Code
lungcap %>%
group_by(Gender)%>%
summarise(mean(LungCap))
# A tibble: 2 × 2
  Gender `mean(LungCap)`
  <chr>            <dbl>
1 female            7.41
2 male              8.31

The average lung capacity for females is 7.41, lower than the average for males at 8.31.

1c. Compare the mean lung capacities for smokers and non-smokers. Does it make sense?

Code
lungcap %>%
group_by(Smoke)%>%
summarise(mean(LungCap))
# A tibble: 2 × 2
  Smoke `mean(LungCap)`
  <chr>           <dbl>
1 no               7.77
2 yes              8.65

The mean lung capacity for non-smokers is 7.77, lower than the mean for smokers at 8.65. At first glance, this seems contradictory as one would guess smokers to have a lower lung capacity than non-smokers.The following grid displays non-smokers as having overall higher lung capacity, conflicting with the mean above.

Code
ggplot(lungcap, aes(x = LungCap)) +
facet_grid(Gender ~ Smoke)+
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

1d. Examine the relationship between Smoking and Lung Capacity within age groups: “less than or equal to 13”, “14 to 15”, “16 to 17”, and “greater than or equal to 18”.

Code
#Lung capacity for those age 13 and under
lungcap %>%
filter(Age <= 13)%>%
group_by(Smoke)%>%
summarise(LungCap = mean(LungCap))
# A tibble: 2 × 2
  Smoke LungCap
  <chr>   <dbl>
1 no       6.36
2 yes      7.20
Code
#Lung capacity for those between the age of 14 to 15
lungcap%>%
filter(Age== 14 | Age ==15)%>%
group_by(Smoke)%>%
summarise(mean(LungCap))
# A tibble: 2 × 2
  Smoke `mean(LungCap)`
  <chr>           <dbl>
1 no               9.14
2 yes              8.39
Code
#Lung capacity for those between the age of 16 to 17
lungcap%>%
filter(Age==16 |Age==16)%>%
group_by(Smoke)%>%
summarise(mean(LungCap))
# A tibble: 2 × 2
  Smoke `mean(LungCap)`
  <chr>           <dbl>
1 no              10.1 
2 yes              8.90
Code
#Lung capacity for those 18 and older
lungcap%>%
filter(Age>=18)%>%
group_by(Smoke)%>%
summarise(mean(LungCap))
# A tibble: 2 × 2
  Smoke `mean(LungCap)`
  <chr>           <dbl>
1 no               11.1
2 yes              10.5

1e. Compare the lung capacities for smokers and non-smokers within each age group.

With the exception of those age 13 years old and under, all non-smokers had a greater lung capacity than smokers. For those over the age of 18, the difference of the average in lung capacity for non-smokers to smokers was 0.55. For 16-17 year olds, the difference was the greatest at 1.16. The difference for 14-15 year olds was 0.74 and for those 13 years old and under, the difference was -0.843.

Is your answer different from the one in part c? What could possibly be going on here?

Here the average lung capacity for non-smokers is higher than for smokers. This differs from the results of question 1c. Overall, we see lung capacity increase with age irrespective of smoking so this may contribute to the change in results.

1f. Calculate the correlation and covariance between Lung Capacity and Age. (use the cov() and cor() functions in R). Interpret results.

Code
cov(lungcap$LungCap, lungcap$Age)
[1] 8.738289
Code
cor(lungcap$LungCap, lungcap$Age)
[1] 0.8196749

The covariance between lung capacity and age is 8.74 suggesting a positive relationship in which both variables move in the same direction (i.e. for this data set an increase in lung capacity would suggest an increase in age as well).

The correlation between lung capacity and age is 0.82 suggesting a strong positive correlation (0.82 of a potential -1 to +1).

Inmate Data

Code
priors<- c(0, 1, 2, 3, 4)
frequency<- c(128, 434, 160, 64, 24)
prison <-data.frame(priors,frequency)
View(prison)

2a. What is the probability that a randomly selected inmate has exactly 2 prior convictions? 20%

2b. What is the probability that a randomly selected inmate has fewer than 2 prior convictions? 69%

2c. What is the probability that a randomly selected inmate has 2 or fewer prior convictions? 89%

2d. What is the probability that a randomly selected inmate has more than 2 prior convictions? 11%

2e. What is the expected value for the number of prior convictions?

Code
((128*0)+(434*1)+(160*2)+(64*3)+(24*4))/sum(frequency)
[1] 1.28642

2f. Calculate the variance and the standard deviation for the Prior Convictions.

Code
var(prison$priors)*((810-1)/810)#calculate population variance
[1] 2.496914
Code
sd(prison$priors)
[1] 1.581139