Stroke Predictor
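# Load packages (tidyverse already attaches dplyr and readr)
library(distill)
library(dplyr)
library(readr)
library(tidyverse)
# Read the Kaggle stroke data from the working directory
Stroke <- read.csv("healthcare-dataset-stroke-data.csv", header = TRUE, sep = ",")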
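The bmi column in the Kaggle file appears to store missing values as the text "N/A". If so (worth confirming with str() or unique()), read.csv() as called above will import bmi as character rather than numeric. This is a minimal sketch of one way to handle it with the na.strings argument. The real CSV is not bundled here, so the sketch writes a tiny hypothetical file. The name stroke_demo and its rows are made up for illustration.

```r
# ASSUMPTION: missing bmi values are stored as the string "N/A".
# A tiny hypothetical CSV is written to a temp file so the sketch runs
# without the Kaggle download; use the real file name in practice.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,gender,age,bmi,stroke",
  "1,Male,67,36.6,1",
  "2,Female,61,N/A,1",
  "3,Male,80,32.5,0"
), csv_path)

# Treat both "N/A" and "NA" as missing so bmi is parsed as numeric
stroke_demo <- read.csv(csv_path, header = TRUE, na.strings = c("N/A", "NA"))

class(stroke_demo$bmi)       # numeric, because "N/A" became NA
sum(is.na(stroke_demo$bmi))  # count of missing bmi values
```

Without na.strings, read.csv() only treats the literal "NA" as missing, which is why a single "N/A" entry would turn the whole column into text.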
# Check the object type, column names and dimensions
class(Stroke)
[1] "data.frame"
colnames(Stroke)
[1] "id" "gender" "age"
[4] "hypertension" "heart_disease" "ever_married"
[7] "work_type" "Residence_type" "avg_glucose_level"
[10] "bmi" "smoking_status" "stroke"
dim(Stroke)
[1] 5110 12
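Before modeling, it may help to see how the outcome is distributed and to store the categorical columns as factors. Since R 4.0, read.csv() keeps text columns as character by default. The sketch below uses stroke_sim, a small simulated stand-in that is hypothetical and not the Kaggle data, with three of the same column names. The 5% stroke probability is an arbitrary choice for the simulation.

```r
# ASSUMPTION: stroke_sim is simulated, not the real Stroke data frame.
set.seed(42)
n <- 200
stroke_sim <- data.frame(
  work_type      = sample(c("Private", "Self-employed", "Govt_job",
                            "children", "Never_worked"), n, replace = TRUE),
  Residence_type = sample(c("Urban", "Rural"), n, replace = TRUE),
  stroke         = rbinom(n, size = 1, prob = 0.05)
)

# Convert text columns to factors so models treat them as categories
cat_cols <- c("work_type", "Residence_type")
stroke_sim[cat_cols] <- lapply(stroke_sim[cat_cols], factor)

# Label the 0/1 outcome for readable tables
stroke_sim$stroke <- factor(stroke_sim$stroke, levels = c(0, 1),
                            labels = c("no", "yes"))

# Counts and proportions of the outcome: a rare outcome means imbalanced classes
table(stroke_sim$stroke)
prop.table(table(stroke_sim$stroke))
```

On the real data the same two table() calls, applied to Stroke$stroke, would show how rare stroke cases are, which matters for question 4.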
The data set has 5110 observations (rows) of 12 variables (columns). Using R, this data set can be used to answer the following research questions:
1. Is there a single variable that can predict stroke on its own? If so, which one?
2. Is work type a significant predictor of stroke?
3. Is residence type a significant predictor of stroke?
4. By splitting the data into training and test sets, I would like to build a model that can predict the occurrence of stroke.
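Questions 2 to 4 could be approached roughly as sketched below. This is one possible approach under stated assumptions, not the author's method. It uses a chi-squared test of independence for work type and residence type against stroke, and a logistic regression fit on a 70/30 train/test split. The data frame sim is hypothetical and simulated so that risk depends on age only, so anything it prints says nothing about the real data set.

```r
# ASSUMPTIONS: simulated data; chi-squared tests for questions 2-3;
# logistic regression (glm, binomial family) with a 0.5 cutoff for question 4.
set.seed(1)
n <- 1000
sim <- data.frame(
  age            = round(runif(n, 1, 82)),
  work_type      = factor(sample(c("Private", "Self-employed", "Govt_job"),
                                 n, replace = TRUE)),
  Residence_type = factor(sample(c("Urban", "Rural"), n, replace = TRUE))
)
# Hypothetical outcome: probability of stroke rises with age only
sim$stroke <- rbinom(n, size = 1, prob = plogis(-7 + 0.08 * sim$age))

# Questions 2 and 3: test independence of stroke and each categorical variable
work_test <- chisq.test(table(sim$work_type, sim$stroke))
res_test  <- chisq.test(table(sim$Residence_type, sim$stroke))
c(work_type = work_test$p.value, Residence_type = res_test$p.value)

# Question 4: random 70/30 split into training and test rows
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training set only, then predict probabilities for the test set
fit <- glm(stroke ~ age + work_type + Residence_type,
           data = train, family = binomial)
pred_prob  <- predict(fit, newdata = test, type = "response")
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

# Evaluate on held-out data: confusion matrix and overall accuracy
conf_mat <- table(predicted = pred_class, actual = test$stroke)
accuracy <- mean(pred_class == test$stroke)
conf_mat
accuracy
```

When stroke cases are rare, accuracy at a 0.5 threshold can look high even for a model that never predicts a stroke. Sensitivity, specificity, or AUC on the test set may therefore be more informative than accuracy alone.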
Data source: https://www.kaggle.com/fedesoriano/stroke-prediction-dataset
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Vespa (2022, Jan. 3). Data Analytics and Computational Social Science: HW3-Data for Final Project. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomowenvespa852354/
BibTeX citation
@misc{vespa2022hw3-data,
  author = {Vespa, Rhowena},
  title = {Data Analytics and Computational Social Science: HW3-Data for Final Project},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomowenvespa852354/},
  year = {2022}
}