Data Analytics and Computational Social Science: Homework3

Eunsol Noh

FOR THE FINAL PROJECT

A. Data information

I picked sleep-related data to explore different features of sleep and how they relate to different conditions. As a first-year PhD student in a sleep research lab, my interest is admittedly biased toward sleep :). Because I wasn’t sure whether I could use data from the lab I belong to, I imported data from Kaggle (https://www.kaggle.com/danagerous/sleep-data).

The sleep-cycle data and other features were recorded between 2014 and 2018 from approximately 180 subjects through the Northcube app. This Swedish app, which is available on iOS, lets users track their life cycle, including sleep cycles (patterns).

B. Variable information

From the app, the dataset includes the following sleep features:

Start = time to fall asleep
End = time to wake up
Sleep quality = how well people slept, as a percentage
Time in bed (TIB) = time spent in bed (originally recorded in hours and converted to minutes for further analyses; mins)
Mood.at.awake = the mood of subjects when they wake up. The original column name, “Wake.up”, did not represent this feature well, so I changed it to “Mood.at.awake”
Sleep notes = side notes for each participant (e.g. Drank coffee)
Heart rate = the average heart rate during sleep

The information above is from Ameen MS, Cheung LM, Hauser T, Hahn MA, Schabus M. About the Accuracy and Problems of Consumer Devices in the Assessment of Sleep. Sensors (Basel). 2019;19(19):4160. Published 2019 Sep 25. doi:10.3390/s19194160

coffee_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the coffee state (Yes/No) is extracted for this variable.
tea_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the tea state (Yes/No) is extracted for this variable.
working_out_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the working-out state (Yes/No) is extracted for this variable.
stress_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the stress state (Yes/No) is extracted for this variable.
TST_mins (a variable I created and added to the columns; TST) = time between Start and End (mins)

-coffee_state, tea_state, working_out_state, stress_state and TST_mins are the variables I created for the further analyses.

C. Goals

First, it wasn’t clear whether “Time.in.bed” differs from the total sleep time (TST), which is my name for the time from “Start” to “End”, so I will check this. In sleep studies, time in bed (TIB) usually means the whole time spent in bed, including time awake as well as TST. TST is the time actually spent asleep, from falling asleep to waking up. If the two differ, I will compute sleep efficiency with the following equation: TST/TIB*100.
Second, I will see whether there are relationships between sleep features and conditions.

2.1 There is a paper reporting “worse mood” in subjects who woke up early compared with those who fell asleep late, assuming they had the same amount of sleep. So, I will look at the relationship between time to fall asleep or time to wake up and sleep quality.

2.2 I will see the effects of coffee consumed during the day on sleep features -> Drank coffee vs. time to fall asleep, time to wake up, sleep quality or TST

2.3 I will see the effects of tea consumed during the day on sleep features -> Drank tea vs. time to fall asleep, time to wake up, sleep quality or TST

2.4 I will see the effects of stress during the day on sleep features -> Stressful day vs. time to fall asleep, time to wake up, sleep quality or TST

2.5 I will see the effects of exercise during the day on sleep features -> Worked out vs. time to fall asleep, time to wake up, sleep quality or TST
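
The comparisons in 2.2 to 2.5 all have the same shape: a Yes/No condition against a numeric sleep feature. A minimal base-R sketch on made-up toy values (not the Kaggle data) could look like the following. The Wilcoxon rank-sum test is only one reasonable first check and an assumption here, not a method fixed by the plan above.

```r
# Toy data standing in for sleep_data; the values are made up for illustration.
toy <- data.frame(
  Sleep.quality = c(80, 65, 90, 70, 85, 60),
  coffee_state  = c("Yes", "No", "Yes", "No", "Yes", "No")
)

# Mean sleep quality per coffee group.
group_means <- aggregate(Sleep.quality ~ coffee_state, data = toy, FUN = mean)
print(group_means)

# Nonparametric two-group comparison (Wilcoxon rank-sum test).
# Assumption: nights are treated as independent observations, which
# repeated nights from the same subject would violate.
test_result <- wilcox.test(Sleep.quality ~ coffee_state, data = toy)
print(test_result$p.value)
```

Swapping coffee_state for tea_state, stress_state or working_out_state, and Sleep.quality for TST_mins, would cover the other comparisons in the same way.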

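library(dplyr) # rename(), select(), mutate(), case_when(), arrange()
library(tidyr) # drop_na()

sleep <- read.csv(file = "sleepdatacsv.csv", sep = ";")
sleep_data <- rename(sleep, Mood.at.awake = Wake.up) # renamed the Wake.up column to Mood.at.awake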

sleep_data <- select(sleep_data, 1:7) # dropped "activity steps" (the 8th and last column), as many subjects didn't include this information

sleep_data <- sleep_data %>%
  drop_na(Sleep.quality, Sleep.Notes, Mood.at.awake, Heart.rate) # dropped rows with NA in any of these columns (empty strings are not NA, so blank notes are kept)

#changing hours to minutes
times<-as.POSIXlt(sleep_data$Time.in.bed, format="%H:%M")
sleep_data$Time.in.bed<-times$hour*60+times$min

#changing the percentage character to numeric
sleep_data$Sleep.quality<-as.numeric(sub("%","",sleep_data$Sleep.quality))

# mutate: split Sleep.Notes into one Yes/No column per note ("Drank coffee", "Drank tea", "Worked out", "Stressful day")
sleep_data <- sleep_data %>%
  mutate(
    coffee_state      = if_else(grepl("Drank coffee", Sleep.Notes), "Yes", "No"),
    tea_state         = if_else(grepl("Drank tea", Sleep.Notes), "Yes", "No"),
    working_out_state = if_else(grepl("Worked out", Sleep.Notes), "Yes", "No"),
    stress_state      = if_else(grepl("Stressful day", Sleep.Notes), "Yes", "No")
  ) %>%
  
  mutate(TST_mins = as.integer(difftime(End, Start, units = "mins"))) # units set explicitly; difftime() otherwise picks them automatically

head(arrange(sleep_data, Sleep.quality))

                Start                 End Sleep.quality Time.in.bed
1 2014-12-30 21:17:50 2014-12-30 21:33:54             3          16
2 2015-01-19 05:06:38 2015-01-19 06:20:29            16          73
3 2015-06-05 03:45:52 2015-06-05 05:41:01            23         115
4 2015-05-06 21:47:25 2015-05-07 05:21:38            50         454
5 2015-04-28 21:41:45 2015-04-29 05:00:17            53         438
6 2015-03-04 20:53:47 2015-03-05 06:13:31            54         559
  Mood.at.awake                                     Sleep.Notes
1            :|                                   Stressful day
2            :)                                                
3            :)                                                
4            :)      Ate late:Drank coffee:Drank tea:Worked out
5            :)                         Drank coffee:Worked out
6            :) Drank coffee:Drank tea:Stressful day:Worked out
  Heart.rate coffee_state tea_state working_out_state stress_state
1         72           No        No                No          Yes
2         58           No        No                No           No
3         57           No        No                No           No
4         59          Yes       Yes               Yes           No
5         59          Yes        No               Yes           No
6         68          Yes       Yes               Yes          Yes
  TST_mins
1       16
2       73
3      115
4      454
5      438
6      559
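
With both TST_mins and Time.in.bed in minutes, the sleep efficiency from goal 1 (TST/TIB*100) can be sketched as below. The helper name sleep_efficiency is hypothetical, and the input values are copied from the six rows printed above rather than from the full data set.

```r
# Hypothetical helper: sleep efficiency (%) = TST / TIB * 100, both in minutes.
# Returns NA when time in bed is zero to avoid dividing by zero.
sleep_efficiency <- function(tst_mins, tib_mins) {
  ifelse(tib_mins > 0, tst_mins / tib_mins * 100, NA_real_)
}

# TST_mins and Time.in.bed for the six rows shown in the head() output.
tst <- c(16, 73, 115, 454, 438, 559)
tib <- c(16, 73, 115, 454, 438, 559)

eff <- sleep_efficiency(tst, tib)
print(eff)
```

In these six rows the two columns coincide, so the efficiency is 100 for each of them. That hints that Time.in.bed may simply be End minus Start in this data set, but only a check over all rows would confirm it.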

# save(sleep_data, file = "sleep_data.csv")
write.table(sleep_data, file = "sleep_data.csv",
            sep = "\t", row.names = FALSE)
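
Because the file is written with sep = "\t", it is tab-separated even though its name ends in .csv. Assuming it will be read back into R later, a small round trip on toy values (written to a temporary file, not to sleep_data.csv) shows one way to do that with read.delim(), whose default separator is a tab.

```r
# Toy stand-in for sleep_data; the values are for illustration only.
toy <- data.frame(
  Sleep.quality = c(50, 53),
  Sleep.Notes   = c("Ate late:Drank coffee", "Drank coffee:Worked out")
)

# Write tab-separated text the same way as above, but to a temporary file.
path <- tempfile(fileext = ".csv")
write.table(toy, file = path, sep = "\t", row.names = FALSE)

# read.delim() expects a header row and tab-separated fields by default.
back <- read.delim(path, stringsAsFactors = FALSE)
print(back)
```

Using read.csv() here would need sep = "\t" as well, since its default separator is a comma.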
