Data Analytics and Computational Social Science: HW5 - More Data Visualization

Rhowena Vespa

This final project uses the Stroke Prediction Dataset from Kaggle.

Read CSV file into R

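library(distill)
library(tidyverse)  # also loads dplyr, readr and ggplot2
# Missing values in the file are stored as the text "N/A", so map them to NA
Stroke <- read.csv("healthcare-dataset-stroke-data.csv", header = TRUE, sep = ",",
                   na.strings = "N/A")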
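The `na.strings = "N/A"` argument implies that missing entries are written as the literal text N/A in the file. A minimal sketch of why this matters, using a hypothetical three-row CSV string (`csv_text`, made-up values) instead of the real file: without the argument, `bmi` is read as text; with it, the N/A entry becomes `NA` and the column is numeric.

```r
# Hypothetical three-row CSV mimicking the dataset's layout (values are made up)
csv_text <- "id,age,bmi,stroke
1,67,36.6,1
2,61,N/A,1
3,49,34.4,0"

without_na <- read.csv(text = csv_text)
with_na    <- read.csv(text = csv_text, na.strings = "N/A")

class(without_na$bmi)   # text ("character" in R >= 4.0): "N/A" kept as a string
class(with_na$bmi)      # "numeric": "N/A" became NA
sum(is.na(with_na$bmi)) # number of missing bmi values
```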
class(Stroke)

[1] "data.frame"

colnames(Stroke)

 [1] "id"                "gender"            "age"              
 [4] "hypertension"      "heart_disease"     "ever_married"     
 [7] "work_type"         "Residence_type"    "avg_glucose_level"
[10] "bmi"               "smoking_status"    "stroke"

dim(Stroke)

[1] 5110   12

# Keep only patients who had a stroke, dropping the id column
Yesstroke <- subset(Stroke, stroke == 1, select = -id)
dim(Yesstroke)

[1] 249  11
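The two `dim()` outputs imply that 249 of the 5110 patients had a stroke, so the stroke-only figures describe a small subgroup. The arithmetic below uses the counts copied from that output; on the loaded data the same share should come from `mean(Stroke$stroke == 1)`, assuming `stroke` has no missing values.

```r
# Row counts copied from the dim() output above
n_total  <- 5110  # all patients in Stroke
n_stroke <- 249   # patients in Yesstroke (stroke == 1)

# Share of patients who had a stroke, in percent
stroke_pct <- round(100 * n_stroke / n_total, 1)
stroke_pct  # 4.9
```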

Figure 1: Smoking status of the sample population

Observation: The sample population was mostly non-smokers, counting both the "never smoked" and "formerly smoked" groups.

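# Age distribution of all patients, one panel per smoking category.
# The "+" before labs() is required; without it labs() runs on its own
# and only prints a list of labels.
ggplot(Stroke, aes(x = age, fill = smoking_status)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(vars(smoking_status)) +
  labs(x = "Age", y = "Count", title = "Smoking Status of Sample Population")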
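The Figure 1 observation can also be checked numerically with a proportion table. A minimal sketch on a hypothetical toy vector `toy_smoking` (made-up values standing in for `Stroke$smoking_status`); on the real data the same calls would be applied to that column. Reporting the "Unknown" share next to the others shows how much of the sample cannot be classified.

```r
# Hypothetical toy vector standing in for Stroke$smoking_status (made-up values)
toy_smoking <- c("never smoked", "formerly smoked", "never smoked", "smokes",
                 "Unknown", "never smoked", "formerly smoked", "smokes")

# Percentage of patients in each smoking category
smoking_pct <- round(100 * prop.table(table(toy_smoking)), 1)
smoking_pct

# Combined share of "never smoked" and "formerly smoked"
sum(smoking_pct[c("never smoked", "formerly smoked")])
```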

Figure 2: Smoking status of stroke patients by hypertension and marital status

Observation: Among patients who had a stroke, smoking status did not separate the groups as clearly as hypertension and marital status did. Most patients who had a stroke were MARRIED with NO HYPERTENSION.

# Age distribution of stroke patients, one panel per hypertension (0/1) and
# marital status combination; label_both shows variable names in panel strips
ggplot(Yesstroke, aes(x = age, fill = smoking_status)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(vars(hypertension, ever_married), labeller = label_both) +
  labs(x = "Age", y = "Count", title = "Risk Factors")
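The claim that most stroke patients were married without hypertension is a statement about cell counts, which a cross-tabulation gives directly. A minimal sketch on a hypothetical six-row data frame `toy` (made-up values standing in for `Yesstroke`):

```r
# Hypothetical toy stand-in for Yesstroke (values are made up)
toy <- data.frame(
  hypertension = c(0, 0, 1, 0, 1, 0),
  ever_married = c("Yes", "Yes", "Yes", "No", "Yes", "Yes")
)

# Counts for each hypertension x marital status combination
counts <- xtabs(~ hypertension + ever_married, data = toy)
counts

# Same table as percentages of all patients in the data frame
round(100 * prop.table(counts), 1)
```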

Figure 3: Smoking status of stroke patients by gender and heart disease

Observation: Among patients who had a stroke, smoking status did not separate the groups as clearly as gender and heart disease did. Most patients who had a stroke were FEMALE with NO HEART DISEASE.

# Age distribution of stroke patients, one panel per gender and
# heart disease (0/1) combination
ggplot(Yesstroke, aes(x = age, fill = smoking_status)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(vars(gender, heart_disease), labeller = label_both) +
  labs(x = "Age", y = "Count", title = "Risk Factors")
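Whether smoking status differs across the Figure 3 panels can be read from a three-way table: if the smoking mix looks similar within every gender and heart disease cell, smoking adds little separation. A minimal sketch on a hypothetical data frame `toy` (made-up values standing in for `Yesstroke`):

```r
# Hypothetical toy stand-in for Yesstroke (values are made up)
toy <- data.frame(
  gender         = c("Female", "Female", "Male", "Female", "Male", "Female"),
  heart_disease  = c(0, 0, 1, 0, 0, 1),
  smoking_status = c("never smoked", "smokes", "formerly smoked",
                     "never smoked", "Unknown", "smokes")
)

# Counts for every gender x heart disease x smoking status combination
tab <- xtabs(~ gender + heart_disease + smoking_status, data = toy)
ftable(tab)  # flattened for readable printing

# Within each gender/heart disease cell, the share of each smoking category
round(prop.table(tab, margin = c(1, 2)), 2)
```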

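# Hand-entered counts for the 249 stroke patients: one column per risk factor,
# rows give the Yes / No / Unknown counts for that factor
NewYesStroke <- matrix(c(66, 141, 47, 220, 149, 135, 42,
                         183, 108, 202, 29, 98, 114, 160,
                         0, 0, 0, 0, 2, 0, 47),
                       ncol = 7, byrow = TRUE)
colnames(NewYesStroke) <- c("Hypertension", "Female", "Heart Disease", "Married",
                            "Private work", "Urban Home", "Smoker")
rownames(NewYesStroke) <- c("Yes", "No", "Unknown")
NewYesStroke <- as.table(NewYesStroke)
NewYesStroke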

        Hypertension Female Heart Disease Married Private work
Yes               66    141            47     220          149
No               183    108           202      29           98
Unknown            0      0             0       0            2
        Urban Home Smoker
Yes            135     42
No             114    160
Unknown          0     47
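The counts in `NewYesStroke` are typed in by hand. A minimal sketch of deriving such a table from patient-level data instead, so it stays in sync if the subset changes. It runs on a hypothetical five-row data frame `yes` (made-up values standing in for `Yesstroke`), uses a hypothetical helper `yes_no_unknown()`, and assumes "Smoker" means `smoking_status == "smokes"` with "Unknown" kept separate. The "Private work" column is left out because the mapping of `work_type` onto Yes/No/Unknown is not stated here.

```r
# Hypothetical toy stand-in for Yesstroke (values are made up)
yes <- data.frame(
  hypertension   = c(1, 0, 0, 1, 0),
  gender         = c("Female", "Male", "Female", "Female", "Male"),
  heart_disease  = c(0, 0, 1, 0, 0),
  ever_married   = c("Yes", "Yes", "No", "Yes", "Yes"),
  Residence_type = c("Urban", "Rural", "Urban", "Urban", "Rural"),
  smoking_status = c("smokes", "never smoked", "Unknown",
                     "formerly smoked", "never smoked")
)

# Hypothetical helper: count TRUE / FALSE / NA in a logical vector
yes_no_unknown <- function(flag) {
  c(Yes = sum(flag %in% TRUE), No = sum(flag %in% FALSE), Unknown = sum(is.na(flag)))
}

# Assumption: "Unknown" smoking status is NA, "smokes" is TRUE, everything else FALSE
smoker_flag <- ifelse(yes$smoking_status == "Unknown", NA,
                      yes$smoking_status == "smokes")

# One column per risk factor; row names come from the helper's names
risk <- as.table(cbind(
  Hypertension    = yes_no_unknown(yes$hypertension == 1),
  Female          = yes_no_unknown(yes$gender == "Female"),
  `Heart Disease` = yes_no_unknown(yes$heart_disease == 1),
  Married         = yes_no_unknown(yes$ever_married == "Yes"),
  `Urban Home`    = yes_no_unknown(yes$Residence_type == "Urban"),
  Smoker          = yes_no_unknown(smoker_flag)
))
risk
```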

Figure 4: Distribution of risk factors among patients who had a stroke

Observation: Most patients who had a stroke were non-smokers and married, and had no hypertension and no heart disease.

# Grouped bars: one group per risk factor, one bar per Yes/No/Unknown row
barplot(NewYesStroke, beside = TRUE, legend.text = TRUE,
        main = "Risk Factors for Patients Who Had Stroke",
        las = 2, cex.names = 0.75, col = c("pink", "blue", "gray"))
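Every column of `NewYesStroke` sums to the same 249 stroke patients, so each count can be turned into a percentage of that group, which states "most patients" claims directly. A minimal sketch that re-enters the matrix above (named `risk_counts` here) and plots percentages instead of counts:

```r
# Counts copied from the NewYesStroke matrix above (rows: Yes, No, Unknown)
risk_counts <- matrix(c(66, 141, 47, 220, 149, 135, 42,
                        183, 108, 202, 29, 98, 114, 160,
                        0, 0, 0, 0, 2, 0, 47),
                      ncol = 7, byrow = TRUE,
                      dimnames = list(c("Yes", "No", "Unknown"),
                                      c("Hypertension", "Female", "Heart Disease",
                                        "Married", "Private work", "Urban Home",
                                        "Smoker")))

colSums(risk_counts)  # every column covers the same 249 patients

# Each column as percentages of the 249 stroke patients
risk_pct <- round(100 * prop.table(risk_counts, margin = 2), 1)
risk_pct

barplot(risk_pct, beside = TRUE, legend.text = TRUE, las = 2, cex.names = 0.75,
        col = c("pink", "blue", "gray"), ylab = "Percent of stroke patients",
        main = "Risk Factors for Patients Who Had Stroke (%)")
```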

ANSWERS TO QUESTIONS:

1. The visualizations should also account for the numerical variables (avg_glucose_level and bmi), not only age.
2. Conclusions: Most patients who had a stroke were non-smokers and married, and had no hypertension and no heart disease.
3. A naive reader would need a basic understanding of epidemiology.
4. I think these observations are best visualized in matrix-like 3D or 4D plots.
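Answer 1 names the numerical variables as a gap. One possible starting point, sketched here as an assumption rather than the author's plan, is a side-by-side box plot of each numeric variable by stroke status. It runs on a hypothetical data frame `toy` (made-up values standing in for `Stroke`); the `NA` in `bmi` mimics a missing value and is dropped by `boxplot()`.

```r
# Hypothetical toy stand-in for Stroke (values are made up)
toy <- data.frame(
  stroke            = c(0, 0, 0, 1, 1, 0, 1, 0),
  avg_glucose_level = c(85.2, 102.1, 91.5, 210.3, 150.7, 77.9, 189.4, 95.0),
  bmi               = c(24.1, 29.3, NA, 31.0, 27.8, 22.5, 33.2, 26.4)
)

old_par <- par(mfrow = c(1, 2))  # two panels side by side
glucose_box <- boxplot(avg_glucose_level ~ stroke, data = toy,
                       xlab = "Stroke (0 = no, 1 = yes)",
                       ylab = "Average glucose level")
bmi_box <- boxplot(bmi ~ stroke, data = toy,
                   xlab = "Stroke (0 = no, 1 = yes)", ylab = "BMI")
par(old_par)  # restore the previous layout
```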
