Stroke Predictor
library(distill)
library(dplyr)
library(readr)
library(tidyverse)
Stroke <- read.csv('healthcare-dataset-stroke-data.csv', header = TRUE, sep = ',', na.strings = "N/A")
class(Stroke)
[1] "data.frame"
colnames(Stroke)
[1] "id" "gender" "age"
[4] "hypertension" "heart_disease" "ever_married"
[7] "work_type" "Residence_type" "avg_glucose_level"
[10] "bmi" "smoking_status" "stroke"
dim(Stroke)
[1] 5110 12
Yesstroke <- subset(Stroke, stroke == 1, select= c("gender","age","hypertension","heart_disease","ever_married","work_type","Residence_type","avg_glucose_level","bmi","smoking_status","stroke"))
dim(Yesstroke)
[1] 249 11
# Age distribution of all patients, one panel per smoking status
ggplot(Stroke, aes(x=age, fill=smoking_status))+
geom_histogram(binwidth = 5)+
facet_wrap(vars(smoking_status))+
labs(x="age", y="Count", title = "Risk Factors")
# Age distribution of stroke patients, panelled by hypertension and marital status
ggplot(Yesstroke, aes(x=age, fill=smoking_status))+
geom_histogram(binwidth = 5)+
facet_wrap(vars(hypertension,ever_married))+
labs(x="age", y="Count", title = "Risk Factors")
# Age distribution of stroke patients, panelled by gender and heart disease
ggplot(Yesstroke, aes(x=age, fill=smoking_status))+
geom_histogram(binwidth = 5)+
facet_wrap(vars(gender,heart_disease))+
labs(x="age", y="Count", title = "Risk Factors")
NewYesStroke <- matrix(c(66,141,47,220,149,135,42,183,108,202,29,98,114,160,0,0,0,0,2,0,47),ncol=7,byrow=TRUE)
colnames(NewYesStroke) <- c("Hypertension","Female","Heart Disease","Married","Private work","Urban Home","Smoker")
rownames(NewYesStroke) <- c("Yes", "No", "Unknown")
NewYesStroke <- as.table(NewYesStroke)
NewYesStroke
        Hypertension Female Heart Disease Married Private work
Yes               66    141            47     220          149
No               183    108           202      29           98
Unknown            0      0             0       0            2
        Urban Home Smoker
Yes            135     42
No             114    160
Unknown          0     47
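The raw counts above are easier to compare as shares of the 249 stroke patients. The following is a minimal sketch, not part of the original analysis: it rebuilds the same table under the assumed name `counts` and uses base R's `prop.table()` to turn each column into proportions.

```r
# Counts copied from the table above (stroke patients only).
# `counts` and `shares` are illustrative names, not from the original post.
counts <- matrix(
  c(66, 141, 47, 220, 149, 135, 42,
    183, 108, 202, 29, 98, 114, 160,
    0, 0, 0, 0, 2, 0, 47),
  ncol = 7, byrow = TRUE,
  dimnames = list(
    c("Yes", "No", "Unknown"),
    c("Hypertension", "Female", "Heart Disease", "Married",
      "Private work", "Urban Home", "Smoker")
  )
)
counts <- as.table(counts)

# margin = 2 divides each cell by its column total, so every column sums to 1.
shares <- prop.table(counts, margin = 2)

# Show the shares as percentages with one decimal place.
round(100 * shares, 1)
```

These shares describe stroke patients only, so they show how common each trait is within that group; they do not measure stroke risk in the full dataset.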
1. The visualizations still need to account for the other numerical variables (avg_glucose_level and bmi); only age is plotted so far.
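2. Conclusions: Most patients who had a stroke were married (220 of 249), had no hypertension (183), and had no heart disease (202). Smokers were a minority: 42, against 160 non-smokers and 47 with unknown smoking status.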
3. A naive reader would need a basic understanding of epidemiology to interpret these plots.
4. I think these observations would be best visualized in matrix-like 3D or 4D plots.
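One possible way to bring the numerical variables into the picture, short of a true 3D plot, is a faceted scatter plot that encodes extra dimensions as colour and panels. The sketch below is only an illustration: the data frame `toy` holds made-up rows, not values from the dataset, and the column names are assumed to match those reported by `colnames(Stroke)`.

```r
library(ggplot2)

# Hypothetical toy rows, invented for illustration only; NOT taken from the dataset.
toy <- data.frame(
  avg_glucose_level = c(85.2, 210.4, 99.1, 178.6, 120.3, 74.8),
  bmi               = c(24.1, 33.5, 27.8, 30.2, 22.9, 26.4),
  hypertension      = c(0, 1, 0, 1, 0, 0),
  stroke            = c(0, 1, 0, 1, 1, 0)
)

# Two numerical variables on the axes, stroke status as colour,
# and hypertension as panels: four variables in one figure.
p <- ggplot(toy, aes(x = avg_glucose_level, y = bmi, colour = factor(stroke))) +
  geom_point(na.rm = TRUE) +  # silently skip rows with missing values
  facet_wrap(vars(hypertension), labeller = label_both) +
  labs(x = "Average glucose level", y = "BMI", colour = "Stroke",
       title = "Numerical risk factors")
p
```

Replacing `toy` with `Stroke` should give the same layout on the real data, assuming the columns were read in as numeric.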
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Vespa (2022, Jan. 14). Data Analytics and Computational Social Science: HW5 -More Data Visualization. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomowenvespa855102/
BibTeX citation
@misc{vespa2022hw5,
  author = {Vespa, Rhowena},
  title = {Data Analytics and Computational Social Science: HW5 -More Data Visualization},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomowenvespa855102/},
  year = {2022}
}