hw1
descriptive statistics
probability
Homework_1_603
Author

Mani Kanta Gogula

Published

October 3, 2022

Code
library(tidyverse)
library(readxl)
library(ggplot2)
library(stats)

knitr::opts_chunk$set(echo = TRUE)

Question 1

Code
Lung_data<- read_excel("C:/Users/manik/Desktop/LungCapData.xls")
Lung_data

The data set consists of 725 rows and 6 columns.

  1. What does the distribution of LungCap look like?
Code
Lung_data %>%
  # after_stat(density) replaces the deprecated ..density.. notation
  ggplot(aes(LungCap, after_stat(density))) +
  # set the number of bins explicitly instead of relying on the default
  geom_histogram(bins = 30) +
  geom_density(color = "Red") +
  theme_classic() +
  labs(title = "Probability distribution of LungCap", x = "Lung Capacity", y = "Probability density")

Based on the histogram above, the distribution of LungCap is very close to a normal distribution.
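
A histogram can be hard to judge by eye, so a numeric check can back up the visual impression. The following is a minimal sketch, not part of the original homework: `check_normality` is a hypothetical helper, and the simulated vector only stands in for `Lung_data$LungCap`.

```r
# Hypothetical helper: summarise how close a numeric vector is to normal.
check_normality <- function(x) {
  x <- x[!is.na(x)]
  z <- (x - mean(x)) / sd(x)
  list(
    skewness  = mean(z^3),                 # near 0 for a symmetric distribution
    shapiro_p = shapiro.test(x)$p.value    # a small p-value suggests non-normality
  )
}

# Simulated stand-in data (an assumption, not the homework data)
set.seed(1)
res <- check_normality(rnorm(500, mean = 8, sd = 2.5))
res

# With the real data one could call:
# check_normality(Lung_data$LungCap)
# qqnorm(Lung_data$LungCap); qqline(Lung_data$LungCap)
```

A skewness near zero and points close to the line in the Q-Q plot would support the reading of the histogram.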

Compare the probability distribution of LungCap with respect to males and females.

Code
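# Plot LungCap itself, one box per gender. dnorm(LungCap) would only evaluate
# the standard normal density at each value, not the distribution of LungCap.
Lung_data %>%
  ggplot(aes(x = Gender, y = LungCap, color = Gender)) +
  geom_boxplot() +
  labs(title = "Distribution of LungCap by gender", x = "Gender", y = "Lung Capacity")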

Compare the mean lung capacities for smokers and non-smokers. Does it make sense?

Code
Mean_smokers <- Lung_data %>%
  group_by(Smoke) %>%
  summarise(mean = mean(LungCap))
Mean_smokers

The mean lung capacity of smokers is greater than that of non-smokers, which does not make sense in practice.

Examine the relationship between Smoking and Lung Capacity within age groups: “less than or equal to 13”, “14 to 15”, “16 to 17”, and “greater than or equal to 18”.

Code
Lung_data <- mutate(Lung_data, AgeGrp = case_when(Age <= 13 ~ "less than or equal to 13",
                                    Age == 14 | Age == 15 ~ "14 to 15",
                                    Age == 16 | Age == 17 ~ "16 to 17",
                                    Age >= 18 ~ "greater than or equal to 18"))

Lung_data %>%
  ggplot(aes(y = LungCap, color = Smoke)) +
  geom_histogram(bins = 25) +
  facet_wrap(vars(AgeGrp)) +
  theme_classic() + coord_flip() +
  # labs() must be joined with `+`; otherwise it is printed as a separate object and never applied to the plot
  labs(title = "Relationship of LungCap and Smoke based on age categories", y = "Lung Capacity", x = "Frequency")

Compare the lung capacities for smokers and non-smokers within each age group. Is your answer different from the one in part d? What could possibly be going on here?

Code
Lung_data %>%
  ggplot(aes(x = Age, y = LungCap, color = Smoke)) +
  geom_line() +
  theme_classic() + 
  facet_wrap(vars(Smoke)) +
  labs(title = "Relationship of LungCap and Smoke based on age", y = "Lung Capacity", x = "Age")

From the plots above, we can compare parts 1d and 1e and say that the results are fairly similar. Only people aged 10 and above smoke.
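
The comparison within age groups can also be tabulated directly instead of being read off a plot. The sketch below uses a small toy data frame whose values are made up for illustration (they are not the homework data); the column names mirror `Lung_data`, and the age bands are rebuilt with `cut()` rather than `case_when()`.

```r
# Toy stand-in for Lung_data (hypothetical values, for illustration only)
toy <- data.frame(
  Age     = c(10, 12, 13, 14, 15, 15, 16, 17, 18, 19),
  LungCap = c(6.0, 7.1, 7.5, 8.4, 9.0, 8.6, 10.1, 9.7, 11.2, 10.8),
  Smoke   = c("no", "no", "yes", "no", "no", "yes", "no", "yes", "no", "yes")
)

# Same age bands as in the homework; intervals are closed on the right,
# so 13 falls in "<= 13" and 15 falls in "14 to 15"
toy$AgeGrp <- cut(
  toy$Age,
  breaks = c(-Inf, 13, 15, 17, Inf),
  labels = c("<= 13", "14 to 15", "16 to 17", ">= 18")
)

# Mean lung capacity for every age group / smoking status combination
group_means <- aggregate(LungCap ~ AgeGrp + Smoke, data = toy, FUN = mean)
group_means
```

Applied to the real data, the same `aggregate()` call on `Lung_data` would give one mean per age group and smoking status, which can then be compared with the overall means from the earlier part.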

Calculate the correlation and covariance between Lung Capacity and Age. (use the cov() and cor() functions in R). Interpret your results.

Code
Covariance_LA <- cov(Lung_data$LungCap, Lung_data$Age)
Correlation_LA <- cor(Lung_data$LungCap, Lung_data$Age)
Covariance_LA
[1] 8.738289
Code
Correlation_LA
[1] 0.8196749

From the results above, both the covariance (about 8.74) and the correlation (about 0.82) are positive, which indicates a direct relationship: lung capacity tends to increase as age increases. A correlation of about 0.82 points to a strong linear association.
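
Covariance depends on the units of both variables, while correlation is the covariance rescaled by the two standard deviations, which is why it is easier to interpret. A minimal sketch of that relationship, using toy vectors with hypothetical values rather than the homework data:

```r
# Toy vectors (hypothetical values, not the homework data)
age      <- c(5, 8, 11, 14, 17)
lung_cap <- c(4.2, 6.1, 7.9, 9.4, 11.0)

cov_xy <- cov(lung_cap, age)
cor_xy <- cor(lung_cap, age)

# Correlation is covariance divided by the product of the standard deviations
cor_from_cov <- cov_xy / (sd(lung_cap) * sd(age))

all.equal(cor_xy, cor_from_cov)
```

The same identity should hold for `Covariance_LA` and `Correlation_LA` when divided by `sd(Lung_data$LungCap) * sd(Lung_data$Age)`.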

Question 2

Code
Prior_convitions <- 0:4
Inmate_count <- c(128, 434, 160, 64, 24)
# tibble() replaces the deprecated data_frame()
IP <- tibble(Prior_convitions, Inmate_count)
Code
IP
Code
IP <- mutate(IP, Probability = Inmate_count/sum(Inmate_count))
IP

What is the probability that a randomly selected inmate has exactly 2 prior convictions?

Code
IP %>%
  filter(Prior_convitions == 2) %>%
  select(Probability)

What is the probability that a randomly selected inmate has fewer than 2 prior convictions?

Code
p_2 <- IP %>%
  filter(Prior_convitions < 2)
sum(p_2$Probability)
[1] 0.6938272

What is the probability that a randomly selected inmate has 2 or fewer prior convictions?

Code
p <- IP %>%
  filter(Prior_convitions <= 2)
sum(p$Probability)
[1] 0.891358

What is the probability that a randomly selected inmate has more than 2 prior convictions?

Code
P_3 <- IP %>%
  filter(Prior_convitions > 2)
sum(P_3$Probability)
[1] 0.108642
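
The four probabilities above are linked through the cumulative distribution and the complement rule. As a cross-check, here is a base R sketch that rebuilds them from the same counts; the lowercase variable names are introduced here for illustration and are not from the original code.

```r
prior_convictions <- 0:4
inmate_count <- c(128, 434, 160, 64, 24)
prob <- inmate_count / sum(inmate_count)

# Cumulative probabilities P(X <= k) for k = 0, ..., 4
cdf <- cumsum(prob)
names(cdf) <- prior_convictions

p_fewer_than_2 <- cdf[["1"]]        # P(X < 2) = P(X <= 1)
p_at_most_2    <- cdf[["2"]]        # P(X <= 2)
p_more_than_2  <- 1 - p_at_most_2   # complement rule: P(X > 2) = 1 - P(X <= 2)

c(p_fewer_than_2, p_at_most_2, p_more_than_2)
```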

What is the expected value for the number of prior convictions?

Code
IP <- mutate(IP, Wm = Prior_convitions*Probability)
expe<- sum(IP$Wm)
expe
[1] 1.28642
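
The expected value is a weighted mean of the conviction counts, with the inmate counts as weights, so base R's `weighted.mean()` offers a one-line cross-check. This is a sketch with variable names chosen here for illustration:

```r
prior_convictions <- 0:4
inmate_count <- c(128, 434, 160, 64, 24)

# weighted.mean() normalises the weights, so raw counts can be used directly
expected_value <- weighted.mean(prior_convictions, w = inmate_count)
expected_value
```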

Calculate the variance and the standard deviation for the Prior Convictions.

Code
var_ <-sum(((IP$Prior_convitions-expe)^2)*IP$Probability)
var_
[1] 0.8562353

Standard deviation:

Code
sqrt(var_)
[1] 0.9253298
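
The variance above follows the definition, the probability-weighted squared deviation from the mean. An equivalent shortcut is E[X^2] - (E[X])^2. The sketch below, with variable names chosen here for illustration, checks that both routes agree for these counts:

```r
x <- 0:4
inmate_count <- c(128, 434, 160, 64, 24)
p <- inmate_count / sum(inmate_count)

ex  <- sum(x * p)      # E[X]
ex2 <- sum(x^2 * p)    # E[X^2]

var_definition <- sum((x - ex)^2 * p)   # definition used above
var_shortcut   <- ex2 - ex^2            # shortcut formula
sd_x <- sqrt(var_shortcut)

all.equal(var_definition, var_shortcut)
```

Note that calling `var()` on the expanded data, for example `var(rep(x, inmate_count))`, divides by n - 1 rather than n and therefore gives a slightly larger value.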