Homework3

HW3 using sleep data

Eunsol Noh
3/11/2022

FOR THE FINAL PROJECT

A. Data information

I picked sleep-related data to examine different features of sleep and how they relate to different conditions. As a first-year PhD student in a sleep research lab, my interests are admittedly biased toward sleep :). Because I wasn’t sure whether I could use data from the lab I belong to, I imported data from Kaggle (https://www.kaggle.com/danagerous/sleep-data).

The data on sleep cycles and other features were recorded between 2014 and 2018 from approximately 180 subjects through the Northcube app. This Swedish app, which is available on iOS, lets users track their life cycle, including their sleep cycle (patterns).

B. Variable information

From the app, the dataset includes the following sleep features:

  1. Start = time to fall asleep
  2. End = time to wake up
  3. Sleep quality = how well people slept, as a percentage
  4. Time in bed (TIB) = time spent in bed (originally hour-based; converted to minutes for further analyses; mins)
  5. Mood.at.awake = the mood of subjects when they wake up. The original column name, “Wake.up”, did not represent this feature well enough, so I changed it to “Mood.at.awake”
  6. Sleep notes = side notes for each participant (e.g. Drank coffee)
  7. Heart rate = the average heart rate during sleep

The information above is from Ameen MS, Cheung LM, Hauser T, Hahn MA, Schabus M. About the Accuracy and Problems of Consumer Devices in the Assessment of Sleep. Sensors (Basel). 2019;19(19):4160. Published 2019 Sep 25. doi:10.3390/s19194160

  1. coffee_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the coffee state is extracted for this variable.

  2. tea_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the tea state is extracted for this variable.

  3. working_out_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the working-out state is extracted for this variable.

  4. stress_state = created from “Sleep.Notes”, which holds the subjects’ side notes. Only the stress state is extracted for this variable.

  5. **_TST_mins** (a variable I created and added to the columns; TST) = time between Start and End (mins)

- Names in italics are variables I created for further analyses.

C. Goals

  1. First, because it wasn’t clear whether “Time in bed” (TIB) differs from the total sleep time (TST, the name I gave to the time from “Start” to “End”), I will check this. In sleep research, TIB usually means all the time spent in bed, including time awake as well as TST, whereas TST is the time actually spent asleep, from falling asleep to waking up. If the two differ, I will check sleep efficiency with the following equation: TST/TIB*100.

  2. I will see if there are relationships between features and conditions.

    2.1 One paper reported “worse mood” in subjects who woke up early compared with those who fell asleep late, assuming they had the same amount of sleep. So, I will look at the relationship between time to fall asleep or time to wake up vs. sleep quality.

    2.2 I will see the effects of coffee consumed during the day on sleep features -> Drank coffee vs. time to fall asleep, time to wake up, sleep quality or TST

    2.3 I will see the effects of tea consumed during the day on sleep features -> Drank tea vs. time to fall asleep, time to wake up, sleep quality or TST

    2.4 I will see the effects of stress during the day on sleep features -> Stressful day vs. time to fall asleep, time to wake up, sleep quality or TST

    2.5 I will see the effects of exercise during the day on sleep features -> Worked out vs. time to fall asleep, time to wake up, sleep quality or TST
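
As a minimal sketch of the check described in goal 1, the following R snippet computes sleep efficiency with the equation TST/TIB*100. The data frame `toy` and its values are made up for illustration and are not taken from the dataset; only the column names TST_mins and Time.in.bed follow the variables described above.

```r
# Hypothetical toy rows: values are illustrative only, not from the real data
toy <- data.frame(
  TST_mins    = c(420, 390, 455),
  Time.in.bed = c(450, 390, 480)
)

# Sleep efficiency (%) = TST / TIB * 100
toy$sleep_efficiency <- toy$TST_mins / toy$Time.in.bed * 100

# If TST and TIB were the same measure, every efficiency would be exactly 100
all_identical <- all(toy$sleep_efficiency == 100)

print(toy)
print(all_identical)
```

If `all_identical` turned out to be TRUE on the real data, TST and TIB would be the same measure there, and sleep efficiency would carry no extra information.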

library(dplyr) #rename(), select(), mutate(), case_when(), arrange()
library(tidyr) #drop_na()

sleep<-read.csv(file="sleepdatacsv.csv",sep=";") 
sleep_data<-rename(sleep,Mood.at.awake=Wake.up) #changed the col name of Wake.up to Mood.at.awake
sleep_data = select(sleep_data, 1:7) #excluded the variable of "activity steps", which was the last variable (8th), as many subjects didn't include this information

sleep_data<-sleep_data %>%
  drop_na(Sleep.quality,Sleep.Notes,Mood.at.awake,Heart.rate) #excluded rows that have NA in these variables (empty strings, e.g. blank Sleep.Notes, are kept)

#changing hours to minutes
times<-as.POSIXlt(sleep_data$Time.in.bed, format="%H:%M")
sleep_data$Time.in.bed<-times$hour*60+times$min

#changing the percentage character to numeric
sleep_data$Sleep.quality<-as.numeric(sub("%","",sleep_data$Sleep.quality))

#mutate: Sleep.Notes is split into one Yes/No state column each for "Drank coffee", "Drank tea", "Worked out" and "Stressful day"
sleep_data<-sleep_data %>%
  mutate(coffee_state = case_when(
    grepl("Drank coffee",Sleep.Notes) == TRUE ~ 'Yes', #yes
    grepl("Drank coffee",Sleep.Notes) == FALSE ~'No' #no
  )) %>%
    
  mutate(tea_state = case_when(
    grepl("Drank tea",Sleep.Notes) == TRUE ~ 'Yes', #yes
    grepl("Drank tea",Sleep.Notes) == FALSE ~ 'No' #no
  )) %>%

  mutate(working_out_state = case_when(
    grepl("Worked out",Sleep.Notes) == TRUE ~ 'Yes', #yes
    grepl("Worked out",Sleep.Notes) == FALSE ~ 'No' #no
  )) %>%
  
  mutate(stress_state = case_when(
    grepl("Stressful day",Sleep.Notes) == TRUE ~ 'Yes', #yes
    grepl("Stressful day",Sleep.Notes) == FALSE ~ 'No' #no
  )) %>%
  
  mutate(TST_mins = as.integer(difftime(End,Start,units="mins"))) #explicit units so the result is always in minutes

head(arrange(sleep_data, Sleep.quality))
                Start                 End Sleep.quality Time.in.bed
1 2014-12-30 21:17:50 2014-12-30 21:33:54             3          16
2 2015-01-19 05:06:38 2015-01-19 06:20:29            16          73
3 2015-06-05 03:45:52 2015-06-05 05:41:01            23         115
4 2015-05-06 21:47:25 2015-05-07 05:21:38            50         454
5 2015-04-28 21:41:45 2015-04-29 05:00:17            53         438
6 2015-03-04 20:53:47 2015-03-05 06:13:31            54         559
  Mood.at.awake                                     Sleep.Notes
1            :|                                   Stressful day
2            :)                                                
3            :)                                                
4            :)      Ate late:Drank coffee:Drank tea:Worked out
5            :)                         Drank coffee:Worked out
6            :) Drank coffee:Drank tea:Stressful day:Worked out
  Heart.rate coffee_state tea_state working_out_state stress_state
1         72           No        No                No          Yes
2         58           No        No                No           No
3         57           No        No                No           No
4         59          Yes       Yes               Yes           No
5         59          Yes        No               Yes           No
6         68          Yes       Yes               Yes          Yes
  TST_mins
1       16
2       73
3      115
4      454
5      438
6      559
#save(sleep_data, file = "sleep_data.csv")
#note: the file is tab-delimited even though it has a .csv extension
write.table(sleep_data, file = "sleep_data.csv",
            sep = "\t", row.names = FALSE)
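
As a first look at goals 2.2 to 2.5, a Yes/No state column such as coffee_state could be used to compare group means of the sleep features. The snippet below is only a sketch under assumptions: the data frame `toy` and all of its values are made up, and base R's `aggregate()` is just one possible way to summarise the groups, not the analysis that will necessarily be used.

```r
# Hypothetical toy rows: values are illustrative only, not from the real data
toy <- data.frame(
  coffee_state  = c("Yes", "No", "Yes", "No", "Yes"),
  Sleep.quality = c(70, 80, 65, 90, 75),
  TST_mins      = c(400, 450, 380, 470, 420)
)

# Mean sleep quality and mean TST for each coffee group
group_means <- aggregate(
  cbind(Sleep.quality, TST_mins) ~ coffee_state,
  data = toy,
  FUN  = mean
)

print(group_means)
```

The same pattern would apply to tea_state, working_out_state and stress_state by swapping the grouping column.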

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Noh (2022, March 23). Data Analytics and Computational Social Science: Homework3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh876604/

BibTeX citation

@misc{noh2022homework3,
  author = {Noh, Eunsol},
  title = {Data Analytics and Computational Social Science: Homework3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh876604/},
  year = {2022}
}