Project Proposal
descriptive statistics
probability
Author

Mani Kanta Gogula & Rahul Gundeti

Published

October 11, 2022

Heart Disease in the United States

Heart disease is the leading cause of death for men, women, and people of most racial and ethnic groups in the United States [1]. One person dies every 34 seconds in the United States from cardiovascular disease [1]. About 697,000 people in the United States died from heart disease in 2020, which is 1 in every 5 deaths [1, 2]. Heart disease cost the United States about $229 billion each year from 2017 to 2018 [3]. This includes the cost of health care services, medicines, and lost productivity due to death.

Research Question: We examine the relationship between the maximum heart rate a person can achieve during exercise and the likelihood of developing heart disease. Multiple logistic regression is used to handle the confounding effects of age and gender.
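The model named in the research question could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the final analysis: the helper names `fit_thalach_model` and `odds_ratios` are hypothetical, and the data frame is assumed to have the columns `condition`, `thalach`, `age`, and `sex` with the 0/1 coding of the Cleveland file used in this proposal.

```r
# Hypothetical helper: logistic regression of heart disease status on
# maximum heart rate (thalach), adjusting for age and sex as confounders.
fit_thalach_model <- function(df) {
  glm(condition ~ thalach + age + sex,
      data = df,
      family = binomial(link = "logit"))
}

# Hypothetical helper: exponentiate the coefficients to read them as odds ratios.
odds_ratios <- function(model) {
  exp(coef(model))
}

# Usage: only runs once the data set has been read into the session.
if (exists("heart_cleveland_upload")) {
  model <- fit_thalach_model(heart_cleveland_upload)
  print(summary(model))
  print(odds_ratios(model))
}
```

The odds ratio for `thalach` would then describe the change in the odds of disease per one-beat increase in maximum heart rate, holding age and sex fixed.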

Hypothesis Testing: Is there a statistically significant difference by gender and age in terms of heart attack prediction?
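One possible way to examine this hypothesis is sketched below. The choice of tests is an assumption made for illustration (a chi-squared test of independence for sex, and a Welch two-sample t-test for age); the proposal itself does not fix the method, and the helper name `test_sex_and_age` is hypothetical.

```r
# Hypothetical helper: two simple tests of association with disease status.
test_sex_and_age <- function(df) {
  list(
    # Chi-squared test of independence between sex (0/1) and condition (0/1).
    sex = chisq.test(table(df$sex, df$condition)),
    # Welch two-sample t-test comparing mean age between the condition groups.
    age = t.test(age ~ condition, data = df)
  )
}

# Usage: only runs once the data set has been read into the session.
if (exists("heart_cleveland_upload")) {
  results <- test_sex_and_age(heart_cleveland_upload)
  print(results$sex)
  print(results$age)
}
```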

Loading Dataset

Code
library(readr)
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.1.3
-- Attaching packages --------------------------------------- tidyverse 1.3.1 --
v ggplot2 3.3.5     v dplyr   1.0.8
v tibble  3.1.6     v stringr 1.4.0
v tidyr   1.2.0     v forcats 0.5.1
v purrr   0.3.4     
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Code
heart_cleveland_upload <- read_csv("heart_cleveland_upload.csv")
Rows: 297 Columns: 14
-- Column specification --------------------------------------------------------
Delimiter: ","
dbl (14): age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpea...

i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
head(heart_cleveland_upload)
Code
dim(heart_cleveland_upload)
[1] 297  14

The data set contains 297 rows and 14 columns.

Code
colnames(heart_cleveland_upload)
 [1] "age"       "sex"       "cp"        "trestbps"  "chol"      "fbs"      
 [7] "restecg"   "thalach"   "exang"     "oldpeak"   "slope"     "ca"       
[13] "thal"      "condition"

There are 13 attributes and one label (condition):

age: age in years
sex: sex (1 = male; 0 = female)
cp: chest pain type
  – Value 0: typical angina
  – Value 1: atypical angina
  – Value 2: non-anginal pain
  – Value 3: asymptomatic
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholesterol in mg/dl
fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg: resting electrocardiographic results
  – Value 0: normal
  – Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
  – Value 2: showing probable or definite left ventricular hypertrophy by Estes’ criteria
thalach: maximum heart rate achieved
exang: exercise-induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
  – Value 0: upsloping
  – Value 1: flat
  – Value 2: downsloping
ca: number of major vessels (0-3) colored by fluoroscopy
thal: 0 = normal; 1 = fixed defect; 2 = reversible defect
condition (the label): 0 = no disease; 1 = disease
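Since every column is read in as a number, the coded attributes could be converted to labelled factors before plotting or tabulating. The sketch below applies the code-to-label mapping listed above. The helper name `label_heart_factors` is hypothetical, and `restecg` and `ca` are left numeric here purely to keep the example short.

```r
# Hypothetical helper: turn integer codes into labelled factors using the
# mapping from the attribute descriptions. Returns a modified copy.
label_heart_factors <- function(df) {
  df$sex <- factor(df$sex, levels = c(0, 1), labels = c("female", "male"))
  df$cp <- factor(df$cp, levels = 0:3,
                  labels = c("typical angina", "atypical angina",
                             "non-anginal pain", "asymptomatic"))
  df$fbs <- factor(df$fbs, levels = c(0, 1), labels = c("false", "true"))
  df$exang <- factor(df$exang, levels = c(0, 1), labels = c("no", "yes"))
  df$slope <- factor(df$slope, levels = 0:2,
                     labels = c("upsloping", "flat", "downsloping"))
  df$thal <- factor(df$thal, levels = 0:2,
                    labels = c("normal", "fixed defect", "reversible defect"))
  df$condition <- factor(df$condition, levels = c(0, 1),
                         labels = c("no disease", "disease"))
  df
}

# Usage: only runs once the data set has been read into the session.
if (exists("heart_cleveland_upload")) {
  heart_labelled <- label_heart_factors(heart_cleveland_upload)
  print(table(heart_labelled$sex, heart_labelled$condition))
}
```

Keeping the labelled copy in a separate object leaves the original 0/1 columns available for numeric summaries.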

Descriptive statistics

Code
summary(heart_cleveland_upload)
      age             sex               cp           trestbps    
 Min.   :29.00   Min.   :0.0000   Min.   :0.000   Min.   : 94.0  
 1st Qu.:48.00   1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:120.0  
 Median :56.00   Median :1.0000   Median :2.000   Median :130.0  
 Mean   :54.54   Mean   :0.6768   Mean   :2.158   Mean   :131.7  
 3rd Qu.:61.00   3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:140.0  
 Max.   :77.00   Max.   :1.0000   Max.   :3.000   Max.   :200.0  
      chol            fbs            restecg          thalach     
 Min.   :126.0   Min.   :0.0000   Min.   :0.0000   Min.   : 71.0  
 1st Qu.:211.0   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:133.0  
 Median :243.0   Median :0.0000   Median :1.0000   Median :153.0  
 Mean   :247.4   Mean   :0.1448   Mean   :0.9966   Mean   :149.6  
 3rd Qu.:276.0   3rd Qu.:0.0000   3rd Qu.:2.0000   3rd Qu.:166.0  
 Max.   :564.0   Max.   :1.0000   Max.   :2.0000   Max.   :202.0  
     exang           oldpeak          slope              ca        
 Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
 Median :0.0000   Median :0.800   Median :1.0000   Median :0.0000  
 Mean   :0.3266   Mean   :1.056   Mean   :0.6027   Mean   :0.6768  
 3rd Qu.:1.0000   3rd Qu.:1.600   3rd Qu.:1.0000   3rd Qu.:1.0000  
 Max.   :1.0000   Max.   :6.200   Max.   :2.0000   Max.   :3.0000  
      thal         condition     
 Min.   :0.000   Min.   :0.0000  
 1st Qu.:0.000   1st Qu.:0.0000  
 Median :0.000   Median :0.0000  
 Mean   :0.835   Mean   :0.4613  
 3rd Qu.:2.000   3rd Qu.:1.0000  
 Max.   :2.000   Max.   :1.0000  
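Because the 0/1 columns are numeric, their means in this summary are proportions: a mean of 0.6768 for sex means about 67.7% of rows are coded male, and a mean of 0.4613 for condition means about 46.1% of rows are coded as disease. A grouped summary closer to the research question might look like the sketch below. The helper name `thalach_by_condition` is hypothetical and only base R is assumed.

```r
# Hypothetical helper: count, mean, and standard deviation of maximum
# heart rate (thalach) within each condition group (0 = no disease, 1 = disease).
thalach_by_condition <- function(df) {
  groups <- split(df$thalach, df$condition)
  do.call(rbind, lapply(groups, function(x) {
    data.frame(n = length(x), mean = mean(x), sd = sd(x))
  }))
}

# Usage: only runs once the data set has been read into the session.
if (exists("heart_cleveland_upload")) {
  print(thalach_by_condition(heart_cleveland_upload))
}
```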
Code
glimpse(heart_cleveland_upload)
Rows: 297
Columns: 14
$ age       <dbl> 69, 69, 66, 65, 64, 64, 63, 61, 60, 59, 59, 59, 59, 58, 56, ~
$ sex       <dbl> 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, ~
$ cp        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
$ trestbps  <dbl> 160, 140, 150, 138, 110, 170, 145, 134, 150, 178, 170, 160, ~
$ chol      <dbl> 234, 239, 226, 282, 211, 227, 233, 234, 240, 270, 288, 273, ~
$ fbs       <dbl> 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, ~
$ restecg   <dbl> 2, 0, 0, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 0, 2, 0, 2, ~
$ thalach   <dbl> 131, 151, 114, 174, 144, 155, 150, 145, 171, 145, 159, 125, ~
$ exang     <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ~
$ oldpeak   <dbl> 0.1, 1.8, 2.6, 1.4, 1.8, 0.6, 2.3, 2.6, 0.9, 4.2, 0.2, 0.0, ~
$ slope     <dbl> 1, 0, 2, 1, 1, 1, 2, 1, 0, 2, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, ~
$ ca        <dbl> 1, 2, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 2, ~
$ thal      <dbl> 0, 0, 0, 0, 0, 2, 1, 0, 0, 2, 2, 0, 0, 0, 2, 1, 2, 0, 2, 0, ~
$ condition <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, ~