hw1
descriptive statistics
probability
Homework 1
Author

Hannah Rosenbaum

Published

May 4, 2023

Question 1

a

First, let’s read in the data from the Excel file:

Code
library(readxl)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Code
# Read the lung capacity data from the Excel file
df <- read_excel("_data/LungCapData.xls")

The distribution of LungCap looks as follows:

Code
# Histogram of lung capacity
hist(df$LungCap)

The histogram shows a roughly symmetric, unimodal distribution.

b

Code
# Boxplot of lung capacity by gender; the returned list holds the plotted statistics
box <- boxplot(LungCap ~ Gender, data = df)
Code
box
Code
summary(box)

The boxplot statistics by gender give an interval of (7.45, 8.03) for females compared with (8.04227, 8.66499) for males. The two intervals do not overlap, so in this dataset males tend to have a higher lung capacity than females.
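The numbers behind a boxplot can be read from the object that `boxplot()` returns. Below is a minimal sketch that uses a made-up `toy` data frame in place of the lung capacity data; the values are illustrative only. Whether the intervals quoted above correspond to the `conf` component is an assumption, since the text does not say which statistic they are.

```r
# Toy data with made-up values, standing in for the LungCap and Gender columns.
toy <- data.frame(
  LungCap = c(6.1, 7.4, 7.9, 8.3, 9.0, 7.0, 8.2, 8.8, 9.1, 10.2),
  Gender  = rep(c("female", "male"), each = 5)
)

# plot = FALSE returns the statistics without drawing anything.
bp <- boxplot(LungCap ~ Gender, data = toy, plot = FALSE)

bp$names  # group labels, in column order
bp$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$conf   # rows: lower and upper ends of the notch around each median
```

Each column of `bp$stats` and `bp$conf` belongs to the group at the same position in `bp$names`, which makes it easy to compare the groups side by side.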

c

Code
# Mean lung capacity for smokers and non-smokers
df %>% group_by(Smoke) %>% summarise(LungCap = mean(LungCap))

No. In this dataset smokers have a higher mean lung capacity than non-smokers, which does not make sense as a causal effect of smoking. The relationship is more likely spurious, driven by an intervening variable.

d and e

Code
# Mean lung capacity by smoking status, ages 13 and under
df %>% filter(Age <= 13) %>% group_by(Smoke) %>% summarise(LungCap = mean(LungCap))
Code
# Ages 14 to 15
df %>% filter(Age > 13, Age <= 15) %>% group_by(Smoke) %>% summarise(LungCap = mean(LungCap))
Code
# Ages 16 to 17
df %>% filter(Age > 15, Age <= 17) %>% group_by(Smoke) %>% summarise(LungCap = mean(LungCap))
Code
# Ages 18 and over
df %>% filter(Age > 17) %>% group_by(Smoke) %>% summarise(LungCap = mean(LungCap))

For ages 13 and under we see the same effect as in part c. For all other age groups the effect reverses. At ages 14-15, smokers have a slightly lower mean lung capacity than non-smokers, a difference of 0.747143. At ages 16-17, smokers are lower by 1.0860, a gap that is 0.338917 larger than at ages 14-15. For ages 18 and above, the difference between smokers and non-smokers is 0.5552. This suggests that smoking reduces lung capacity, and that the impact on health grows the longer a person smokes.
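The four age filters above can also be expressed as a single grouping variable. The following is a rough sketch in base R, assuming a made-up `toy` data frame whose values are illustrative only; the bin labels are also our own choice. `cut()` uses right-closed intervals by default, so the breaks reproduce the same age ranges as the filters.

```r
# Toy data with made-up values, standing in for the Age, Smoke and LungCap columns.
toy <- data.frame(
  Age     = c(10, 12, 14, 15, 16, 17, 18, 19),
  Smoke   = rep(c("no", "yes"), times = 4),
  LungCap = c(6.0, 7.5, 9.0, 8.5, 10.5, 9.5, 11.5, 11.0)
)

# Right-closed bins: (-Inf, 13], (13, 15], (15, 17], (17, Inf)
toy$AgeGroup <- cut(
  toy$Age,
  breaks = c(-Inf, 13, 15, 17, Inf),
  labels = c("13 and under", "14-15", "16-17", "18 and over")
)

# Mean lung capacity for every age group and smoking status in one call.
means <- aggregate(LungCap ~ AgeGroup + Smoke, data = toy, FUN = mean)
means
```

Keeping all groups in one table makes it easier to compare the smoker and non-smoker means across ages than running four separate queries.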

Question 2

a

P(x = 2) = 160 / 810

= 0.198

b

P(x < 2) = (128 + 434) / 810

= 0.694

c

P(x <= 2) = (128 + 434 + 160) / 810

= 0.891

d

P(x > 2) = (64 + 24) / 810

= 0.109
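The four probabilities in parts a to d can be checked with a few lines of R. This is a small sketch that pairs the counts with the values x = 0 to 4, as in the expected value calculation. Reading the parts as P(x = 2), P(x < 2), P(x <= 2) and P(x > 2) is our interpretation of which counts appear in each sum.

```r
# Counts for x = 0, 1, 2, 3, 4, taken from the calculations in this question.
x      <- 0:4
counts <- c(128, 434, 160, 64, 24)
p      <- counts / sum(counts)   # sum(counts) is 810

p_a <- sum(p[x == 2])   # part a: P(x = 2)
p_b <- sum(p[x < 2])    # part b: P(x < 2)
p_c <- sum(p[x <= 2])   # part c: P(x <= 2)
p_d <- sum(p[x > 2])    # part d: P(x > 2)

round(c(a = p_a, b = p_b, c = p_c, d = p_d), 3)
```

Parts c and d are complements, so `p_c + p_d` should equal 1, which is a quick sanity check on the arithmetic.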

e

E(x) = 0 * 128/810 + 1 * 434/810 + 2 * 160/810 + 3 * 64/810 + 4 * 24/810

= (0 + 434 + 320 + 192 + 96) / 810

= 1042 / 810

= 1.286

f

Var(x) = E(x^2) - E(x)^2

E(x^2) = 0^2 * 128/810 + 1^2 * 434/810 + 2^2 * 160/810 + 3^2 * 64/810 + 4^2 * 24/810

= (0 + 434 + 640 + 576 + 384) / 810

= 2034 / 810

= 2.5111

Var(x) = 2.5111 - (1042 / 810)^2

= 2.5111 - 1.6549

= 0.8562

Std Dev = SQRT(Var(x))

= SQRT(0.8562)

= 0.925
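The expected value and spread can be checked in R as well. This is a minimal sketch that treats the 810 counts as a complete probability distribution and uses the identity Var(x) = E(x^2) - E(x)^2. The variable names are our own.

```r
# Counts for x = 0, 1, 2, 3, 4, taken from the calculations in this question.
x      <- 0:4
counts <- c(128, 434, 160, 64, 24)
p      <- counts / sum(counts)

ex    <- sum(x * p)     # E(x)
ex2   <- sum(x^2 * p)   # E(x^2)
var_x <- ex2 - ex^2     # Var(x) = E(x^2) - E(x)^2
sd_x  <- sqrt(var_x)    # standard deviation

round(c(mean = ex, var = var_x, sd = sd_x), 3)
```

Working from the unrounded probabilities avoids the small drift that appears when each term is rounded to three decimals before summing.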