Data Analytics and Computational Social Science: Final_Project (2nd), UPDATED VERSION

Eunsol Noh

INTRODUCTION

I am a PhD student in a sleep lab in the Neuroscience and Behavior program. My current lab work focuses on one specific aspect of sleep, so in this project I wanted to look more broadly at how daily features relate to sleep features. I found a dataset on sleep and daily features on Kaggle. My ultimate goal in this project is to show how daily features such as working out, stress, coffee intake and tea intake are related to sleep onset/offset time, sleep quality, heart rate during sleep and sleep duration, using data from 162 participants. Further, I want to see whether sleep features are related to mood at awake as well.

DATA

The data are from Kaggle (https://www.kaggle.com/danagerous/sleep-data). Sleep was recorded with a Swedish iOS application from 180 subjects. 18 subjects were excluded because information was missing for some categories (i.e., total participants = 162).
Variables:
1. Start: time of going to bed
2. End: time of waking up
3. Sleep quality: quality of sleep (%)
4. Total.sleep.time: duration of sleep (min)
5. Mood at awake: good or bad mood at awake
6. Sleep notes: notes on participants’ conditions (e.g., stress). Based on this variable, new columns indicating each condition (e.g., coffee_state) were created.
7. Heart rate: average heart rate during sleep
8. coffee state: whether they had coffee on the day of the sleep (yes or no)
9. tea state: whether they had tea on the day of the sleep (yes or no)
10. working out state: whether they worked out on the day of the sleep (yes or no)
11. stress state: whether they felt stressed on the day of the sleep (yes or no)
12. TST_mins: total sleep time, calculated by subtracting start time from end time. It did not differ from Total.sleep.time, so this 12th column was not used in further analyses.

#Libraries
library(readr)
library(tidyverse)
library(tidyr)
library(dplyr)
library(lubridate)
library(ggplot2)
library(GGally)
library(hrbrthemes)
library(viridis)
library(ggridges)
library(forcats)
library(patchwork)
library(ggExtra)
library(dygraphs)
library(chron)
library(hexbin)
library(RColorBrewer)
library(hms)
knitr::opts_chunk$set(echo = TRUE)

IMPORT DATA AND CHANGE ONE COLUMN NAME

#setup
setwd('/Users/eunsolnoh/Desktop/dacss601/R') #set path

#import data
sleep<-read.csv(file="sleepdatacsv.csv",sep=";")  
sleep_data<-rename(sleep,Mood.at.awake=Wake.up) 
#change one col name of Wake.up to Mood.at.awake for better understanding
knitr::opts_chunk$set(echo = TRUE)

CLEAN DATA WITH MUTATE()

sleep_data <- select(sleep_data, 1:7) #keep columns 1-7; this drops the 8th (last)
#variable, "activity steps", as many subjects didn't include this information

sleep_data<-sleep_data %>%
  drop_na(Sleep.quality,Sleep.Notes,Mood.at.awake,Heart.rate) #exclude rows that have
#NA in these variables (empty strings, e.g. a blank Sleep.Notes, are not NA and are kept)

#convert hours to minutes for the column of "time in bed" in order to compare it 
#with total sleep time that was the column created by subtracting sleep start time 
#from sleep end time. 
times<-as.POSIXlt(sleep_data$Time.in.bed, format="%H:%M")
sleep_data$Time.in.bed<-times$hour*60+times$min

#changing the percentage character of the column of "sleep quality" to numeric
sleep_data$Sleep.quality<-as.numeric(sub("%","",sleep_data$Sleep.quality))

#mutate: Sleep.Notes is split into one Yes/No column each for "Drank coffee",
#"Drank tea", "Worked out" and "Stressful day" for further analyses.
sleep_data<-sleep_data %>%
  mutate(
    coffee_state      = if_else(grepl("Drank coffee", Sleep.Notes, fixed = TRUE), "Yes", "No"),
    tea_state         = if_else(grepl("Drank tea", Sleep.Notes, fixed = TRUE), "Yes", "No"),
    working_out_state = if_else(grepl("Worked out", Sleep.Notes, fixed = TRUE), "Yes", "No"),
    stress_state      = if_else(grepl("Stressful day", Sleep.Notes, fixed = TRUE), "Yes", "No")
  ) %>%
  #time between sleep start and sleep end in minutes, in order to check if this
  #calculated time is the same as time in bed (the 4th column). units = "mins"
  #fixes the unit; the default units = "auto" chooses it from the data.
  mutate(TST_mins = as.integer(difftime(End, Start, units = "mins")))
head(arrange(sleep_data, Sleep.quality))

                Start                 End Sleep.quality Time.in.bed
1 2014-12-30 21:17:50 2014-12-30 21:33:54             3          16
2 2015-01-19 05:06:38 2015-01-19 06:20:29            16          73
3 2015-06-05 03:45:52 2015-06-05 05:41:01            23         115
4 2015-05-06 21:47:25 2015-05-07 05:21:38            50         454
5 2015-04-28 21:41:45 2015-04-29 05:00:17            53         438
6 2015-03-04 20:53:47 2015-03-05 06:13:31            54         559
  Mood.at.awake                                     Sleep.Notes
1            :|                                   Stressful day
2            :)                                                
3            :)                                                
4            :)      Ate late:Drank coffee:Drank tea:Worked out
5            :)                         Drank coffee:Worked out
6            :) Drank coffee:Drank tea:Stressful day:Worked out
  Heart.rate coffee_state tea_state working_out_state stress_state
1         72           No        No                No          Yes
2         58           No        No                No           No
3         57           No        No                No           No
4         59          Yes       Yes               Yes           No
5         59          Yes        No               Yes           No
6         68          Yes       Yes               Yes          Yes
  TST_mins
1       16
2       73
3      115
4      454
5      438
6      559

#Save the cleaned data in case it is needed later (tab-separated, despite the .csv name).
#save(sleep_data, file = "sleep_data.csv")
write.table(sleep_data, file = "sleep_data.csv",
            sep = "\t", row.names = FALSE)

My initial plan included an analysis of the relationship between actual total sleep time and time in bed. In sleep research these terms mean different things: time in bed is the time spent in bed, including time awake before and after the actual sleep, whereas total sleep time usually refers to the actual sleep only. Because it was not clear whether the dataset used the two terms differently, I computed TST_mins by subtracting the time to fall asleep (Start) from the time to wake up (End). TST_mins turned out to be the same as time in bed, so the dataset does not contain enough information to separate actual sleep time from time in bed. Therefore, TST_mins was ignored in further analyses.
Further analyses were conducted with the cleaned dataset.
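The comparison described above can be sketched as follows. This is a minimal base-R example, not the project's own code: `toy` holds two rows copied from the head() output shown earlier, and the names `toy`, `tst_mins` and `same` are introduced here only for illustration.

```r
# Two rows copied from the head() output above (not the full dataset).
toy <- data.frame(
  Start       = c("2014-12-30 21:17:50", "2015-05-06 21:47:25"),
  End         = c("2014-12-30 21:33:54", "2015-05-07 05:21:38"),
  Time.in.bed = c(16, 454)  # minutes, after the hours-to-minutes conversion
)

# Parse both columns with an explicit format; UTC is an assumption made here
# to avoid daylight-saving effects in the toy example.
start <- as.POSIXct(toy$Start, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
end   <- as.POSIXct(toy$End,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Ask for minutes explicitly; as.integer() drops the leftover seconds.
tst_mins <- as.integer(difftime(end, start, units = "mins"))

# TRUE where the computed duration equals the recorded time in bed.
same <- tst_mins == toy$Time.in.bed
print(data.frame(tst_mins, Time.in.bed = toy$Time.in.bed, same))
```

If every element of `same` is TRUE, the two columns carry the same information, which is the situation described for this dataset.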

sleep_data<-rename(sleep_data,Total.sleep.time=Time.in.bed) #change the name of 
#the column of time.in.bed to total.sleep.time since it was checked that they have the same 
#meaning in this dataset in the previous step. 

#convert date-time to time of day, as only the time information is used in further analyses.
sleep_data$Start<-as.POSIXct(sleep_data$Start, format = "%Y-%m-%d %H:%M:%S", tz = "EST5EDT")
sleep_data$Start <- format(sleep_data$Start,  format = "%H:%M:%S")

sleep_data$End<-as.POSIXct(sleep_data$End, format = "%Y-%m-%d %H:%M:%S", tz = "EST5EDT")
sleep_data$End <- format(sleep_data$End,  format = "%H:%M:%S")

knitr::opts_chunk$set(echo = TRUE)

GOALS OF THIS PROJECT

My ultimate goal in this project is to see the relationships between sleep features and daily features. Specifically, I will 1) see which daily features (e.g., drinking coffee) affect sleep features (quality, duration and heart rate during sleep) and sleep onset/offset time, 2) see how the sleep features are related to each other, and 3) see whether sleep features and sleep onset/offset time affect mood at awake.

VISUALIZATION: METHODS, RESULTS and SUMMARY IN EACH CATEGORY

1. What daily features (e.g., drinking coffee) affect sleep features (quality, duration and heart rate during sleep) and sleep onset/offset time.

1.1 daily features ->  sleep features?

#Bring needed information columns
sleep_data1_1 <- sleep_data[ , c(3,4,7:11)]

###############################Sleep quality#################################
# Create new_sleep for sleep quality
new_sleep<- matrix(0,4,4)
colnames(new_sleep) <- c("90~100","80~90","70~80","<=70")
# row order follows columns 4:7 of sleep_data1_1: coffee, tea, working out, stress
rownames(new_sleep) <- c("drinking coffee","drinking tea","working out","stress")

for (y in 4:7) {
  y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
    if(sleep_data1_1[x,1]> 90 & sleep_data1_1[x,1] <= 100 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep[y1,1]=new_sleep[y1,1]+1; #adding 1 if sleep quality is between 90 and 100
          }
        }
    
      else if(sleep_data1_1[x,1]> 80 & sleep_data1_1[x,1] <= 90 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep[y1,2]=new_sleep[y1,2]+1; #adding 1 if sleep quality is between 80 and 90
          }
        
      }
  
   else if(sleep_data1_1[x,1]> 70 & sleep_data1_1[x,1] <= 80 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep[y1,3]=new_sleep[y1,3]+1; #adding 1 if sleep quality is between 70 and 80
          }
        
   }
  
     else if(sleep_data1_1[x,1] <= 70 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep[y1,4]=new_sleep[y1,4]+1; #adding 1 if sleep quality is less than or equal to 70
          }
        
      }
}
}

###############################Total sleep amount###############################
# Create new_sleep1 for total sleep amount
new_sleep1<- matrix(0,4,4)
colnames(new_sleep1) <- c(">8","7~8(typical)","5~7","<5")
# row order follows columns 4:7 of sleep_data1_1: coffee, tea, working out, stress
rownames(new_sleep1) <- c("drinking coffee","drinking tea","working out","stress")

for (y in 4:7) {
  y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
    if(sleep_data1_1[x,2]> 480) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep1[y1,1]=new_sleep1[y1,1]+1; #adding 1 if sleep amount (hours) is more than 8 hours
          }
        }
    
      else if(sleep_data1_1[x,2]>=420 & sleep_data1_1[x,2] <= 480 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep1[y1,2]=new_sleep1[y1,2]+1; #adding 1 if sleep amount (hours) is between 7 and 8 hours
          }
        
      }
  
   else if(sleep_data1_1[x,2]>= 300 & sleep_data1_1[x,2] < 420 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep1[y1,3]=new_sleep1[y1,3]+1;  #adding 1 if sleep amount (hours) is between 5 and 7 hours
          }
        
   }
  
     else if(sleep_data1_1[x,2] < 300 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep1[y1,4]=new_sleep1[y1,4]+1; #adding 1 if sleep amount (hours) is less than 5 hours
          }
        
      }
}
}
###############################Heart rate######################################
# Create new_sleep2 for heart rate
new_sleep2<- matrix(0,4,4)
colnames(new_sleep2) <- c(">80","50~80","40~50(typical)","<=40")
# row order follows columns 4:7 of sleep_data1_1: coffee, tea, working out, stress
rownames(new_sleep2) <- c("drinking coffee","drinking tea","working out","stress")

for (y in 4:7) {
  y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
    if(sleep_data1_1[x,3]> 80 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep2[y1,1]=new_sleep2[y1,1]+1; #adding 1 if heart rate (bpm) is more than 80
          }
        }
    
      else if(sleep_data1_1[x,3]> 50 & sleep_data1_1[x,3] <= 80 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep2[y1,2]=new_sleep2[y1,2]+1; #adding 1 if heart rate (bpm) is between 50 and 80
          }
        
      }
  
   else if(sleep_data1_1[x,3]> 40 & sleep_data1_1[x,3] <= 50 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep2[y1,3]=new_sleep2[y1,3]+1; #adding 1 if heart rate (bpm) is between 40 and 50
          }
        
   }
  
     else if(sleep_data1_1[x,3] <=40 ) {
          if(sleep_data1_1[x,y]=="Yes") {
            new_sleep2[y1,4]=new_sleep2[y1,4]+1; #adding 1 if heart rate (bpm) is less than or equal to 40
          }
        
      }
}
}

#######################################plot#####################################
# Grouped barplot
barplot(new_sleep, 
        border="white", 
        font.axis=2, 
        beside=T, 
        col = 1:nrow(new_sleep),
        legend.text = TRUE, 
        args.legend = list(x = "topleft",
                           inset = c(- 0.001, 0)), 
        xlab="Sleep quality(%)", 
        ylab="Numbers", 
        font.lab=2)

# Grouped barplot
barplot(new_sleep1, 
        border="white", 
        font.axis=2, 
        beside=T, 
        col = 1:nrow(new_sleep1),
        legend.text = TRUE, 
        args.legend = list(x = "topright",
                           inset = c(- 0.001, 0)), 
        xlab="Total sleep time(Hrs)", 
        ylab="Numbers", 
        font.lab=2)

# Grouped barplot
barplot(new_sleep2, 
        border="white", 
        font.axis=2, 
        beside=T, 
        col = 1:nrow(new_sleep2),
        legend.text = TRUE, 
        args.legend = list(x = "topright",
                           inset = c(- 0.001, 0)), 
        xlab="Heart rate(BPM)", 
        ylab="Numbers", 
        font.lab=2)

1.1

-Method: I used grouped bar plots to show the effects of each daily feature on each sleep feature. First, I divided each sleep feature into categories (e.g., good, normal). In this step, I generated new_sleep, new_sleep1 and new_sleep2, one for each sleep feature, which hold the number of participants who met each condition. For example, for sleep quality, if a participant had sleep quality above 90 percent and had coffee, 1 is added to the corresponding element. I binned the values because the goal of this analysis is to show approximate ranges rather than exact values. For instance, sleep quality is considered good if it falls within 90~100%; a quality somewhat below 100% is not usually regarded as bad sleep. Likewise, for each sleep feature I defined a normal (i.e., typical or good) range and ranges outside it, in order to show how many people fall in the normal range and which daily features are related to each range.
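The binning-and-counting step described above can also be written without explicit loops. The following is a minimal base-R sketch using cut() and table(), under stated assumptions: `toy` is hypothetical data invented for illustration, and `quality_band` and `coffee_counts` are names introduced here, not objects from the analysis.

```r
# Hypothetical toy data; the values are illustrative only.
toy <- data.frame(
  Sleep.quality = c(95, 85, 72, 60, 91, 70),
  coffee_state  = c("Yes", "No", "Yes", "Yes", "Yes", "No")
)

# right = TRUE makes each interval (lower, upper], which matches
# conditions of the form "> 90 & <= 100" used in the loops.
quality_band <- cut(
  toy$Sleep.quality,
  breaks = c(-Inf, 70, 80, 90, 100),
  labels = c("<=70", "70~80", "80~90", "90~100"),
  right  = TRUE
)

# Count the "Yes" rows per band; empty bands are kept with a count of 0.
coffee_counts <- table(quality_band[toy$coffee_state == "Yes"])
print(coffee_counts)
```

Because cut() assigns every value to exactly one band, this approach makes it harder to leave a gap between two ranges by accident.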

-Result:

1) Sleep quality: Less stress appears to go with better sleep quality. Drinking coffee and tea appears to be inversely related to sleep quality, i.e., associated with poorer sleep.

2) Total sleep time: The effect looks small, but working out appears to play a slight role in longer sleep time. The other variables don’t show meaningful effects.

3) Heart rate: Having coffee, tea, stress, or working out on the day of sleep is associated with a heart rate during sleep above the known normal range during sleep (40~50 bpm).

4) Overall, the number of people who worked out is low compared to the other daily features in all three sleep features, indicating that few people worked out on the day of the sleep in this dataset, so working out did not produce meaningful results for any of the three sleep features.

-Summary:

1) less stress, coffee and tea -> better-quality sleep

2) working out -> more time to sleep

3) more coffee, tea, stress and working out -> higher heart rate

1.2 daily features -> sleep onset/offset time?

sleep_data1_2<-sleep_data[ , c(1,2,8:11)]

#change the string to time format for the columns of sleep start time and sleep end time
#(strptime() parses the time of day and attaches today's date)
sleep_data1_2$Start<-as.POSIXct(strptime(sleep_data1_2$Start, format ="%H:%M:%S"))
sleep_data1_2$End<-as.POSIXct(strptime(sleep_data1_2$End, format ="%H:%M:%S"))

# relationship between sleep onset and daily features
p1 <- ggplot(sleep_data1_2, aes(x=coffee_state, y=Start)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    theme(legend.position="none")+
    xlab("coffee_state") + 
    ylab("sleep onset")
  
p2 <- ggplot(sleep_data1_2, aes(x=tea_state, y=Start)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("tea_state") + 
    ylab("sleep onset")

p3 <- ggplot(sleep_data1_2, aes(x=working_out_state, y=Start)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("workingout_state") + 
    ylab("sleep onset")
  
p4 <- ggplot(sleep_data1_2, aes(x=stress_state, y=Start)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("stress_state") + 
    ylab("sleep onset")

# Display the four charts together with the patchwork package
p1 + p2 + p3 +p4

# relationship between sleep offset and daily features
p1 <- ggplot(sleep_data1_2, aes(x=coffee_state, y=End)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("coffee_state") + 
    ylab("sleep offset")
  
p2 <- ggplot(sleep_data1_2, aes(x=tea_state, y=End)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("tea_state") + 
    ylab("sleep offset")

p3 <- ggplot(sleep_data1_2, aes(x=working_out_state, y=End)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("workingout_state") + 
    ylab("sleep offset")
  
p4 <- ggplot(sleep_data1_2, aes(x=stress_state, y=End)) +
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    xlab("stress_state") + 
    ylab("sleep offset")

# Display the four charts together with the patchwork package
p1 + p2 + p3 +p4

1.2

-Method: First, I converted the Start and End strings (sleep onset and offset time) to a time format using strptime() and as.POSIXct(). I then used geom_boxplot() to show the effects of daily features on sleep onset and offset time.
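As a complement to the conversion described above, a clock-time string can also be turned into a plain number of hours, which is easy to put on a plot axis. This is a minimal base-R sketch, not the method used in this project; `clock` holds two example times taken from the head() output, and `parts` and `decimal_hour` are names introduced here.

```r
# Two clock times in the "%H:%M:%S" form used for Start and End.
clock <- c("21:17:50", "05:06:38")

# as.POSIXlt() exposes the hour, minute and second fields directly.
# UTC is an assumption made here so that no daylight-saving shift applies.
parts <- as.POSIXlt(clock, format = "%H:%M:%S", tz = "UTC")

# Hours since midnight as a decimal number (0 up to, but not including, 24).
decimal_hour <- parts$hour + parts$min / 60 + parts$sec / 3600
print(round(decimal_hour, 2))
```

A numeric axis of this kind keeps the minute information, at the cost of losing the formatted time labels that a POSIXct axis provides.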

-Result:

1) Sleep onset time from daily features: No clear difference in sleep onset time was observed for any daily feature.

2) Sleep offset time from daily features: Those who drank coffee woke up slightly later than those who did not have coffee on the day of sleep. Those who had stress on the day of sleep woke up considerably later than those who did not, indicating that stress may play a role in delaying the time to wake up.

-Summary:

1) Coffee -> slightly later time to wake up

2) Stress -> later time to wake up

1.3 The relationship between sleep start time and sleep end time

sleep_data1_3 <-sleep_data[ , c(1,2,5,8:11)]

#use a different function, chron(), to show time in the relationship between sleep
#onset and offset; chron stores time of day as a fraction of a day, so 24 hours maps to 1.
sleep_data1_3$chrons<-chron(times=sleep_data1_3$Start)
sleep_data1_3$chrone<-chron(times=sleep_data1_3$End)

#taking hour of time only for the relationship between sleep onset and offset
sleep_data1_3$start_h<-as_hms(sleep_data1_3$Start) %>% hour
sleep_data1_3$end_h<-as_hms(sleep_data1_3$End) %>% hour

# scatter plot from time (including minutes)
ggplot(data = sleep_data1_3, mapping = aes(x = chrons, y = chrone)) +
  geom_point() +
  xlab("sleep start time") + 
  ylab("wake-up time")+
  ggtitle("sleep onset vs. sleep offset (0~24=>0~1)")+
  geom_smooth()

# scatter plot from hour only of the time
ggplot(data = sleep_data1_3, mapping = aes(x = start_h, y = end_h)) +
  geom_point() +
  xlab("sleep start time") + 
  ylab("wake-up time")+
  ggtitle("sleep onset vs. sleep offset (0~24)")+
  geom_smooth()

1.3

-Method: I extracted the hour of sleep onset and offset using as_hms() and hour(), since the detailed information (i.e., minutes) did not show a meaningful difference from the data that have hour information only.

-Result: In the hour-only scatter plot, people woke up before 10 am regardless of the time they fell asleep (except for one person who woke up at night), indicating that the time to fall asleep did not affect the time to wake up.

-Summary:

1) No effects of sleep onset time on sleep offset time

2) Most people woke up before 10 am regardless of sleep onset time (except for 1 person).

1.4 coffee and tea -> sleep features & stress -> sleep offset time?

############Sleep features with coffee and/or tea#############################
ggplot(sleep_data, aes(fill=coffee_state, y=Sleep.quality, x=coffee_state)) + 
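    #plot the group mean (needs ggplot2 >= 3.3.0); stat="identity" overplots one bar per row
    geom_bar(position="dodge", stat="summary", fun="mean") +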
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("           Sleep quality depending on coffee and tea intake") +
    facet_wrap(~tea_state) +
    #facet_wrap(~coffee_state)+
    theme_ipsum() +
    theme(legend.position="none") +
    xlab("Coffee state") +
    ylab("Sleep quality")

ggplot(sleep_data, aes(fill=coffee_state, y=Total.sleep.time, x=coffee_state)) + 
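    #plot the group mean (needs ggplot2 >= 3.3.0); stat="identity" overplots one bar per row
    geom_bar(position="dodge", stat="summary", fun="mean") +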
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("      Total sleep time depending on coffee and tea intake") +
    facet_wrap(~tea_state) +
    #facet_wrap(~coffee_state)+
    theme_ipsum() +
    theme(legend.position="none") +
    xlab("Coffee state") +
    ylab("Total sleep time")

ggplot(sleep_data, aes(fill=coffee_state, y=Heart.rate, x=coffee_state)) + 
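    #plot the group mean (needs ggplot2 >= 3.3.0); stat="identity" overplots one bar per row
    geom_bar(position="dodge", stat="summary", fun="mean") +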
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("           Heart rate depending on coffee and tea intake") +
    facet_wrap(~tea_state) +
    #facet_wrap(~coffee_state)+
    theme_ipsum() +
    theme(legend.position="none") +
    xlab("Coffee state") +
    ylab("Heart rate")

#####Sleep offset time in terms of stress level in different daily features#####

ggplot(sleep_data1_3, aes(fill=stress_state, y=end_h, x=stress_state)) + 
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("Sleep offset time depending on stress and coffee intake") +
    facet_wrap(~coffee_state) +
    theme_ipsum() +
    theme(legend.position="none") +
    xlab("Stress state") +
    ylab("Sleep offset time")

ggplot(sleep_data1_3, aes(fill=stress_state, y=end_h, x=stress_state)) + 
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("Sleep offset time depending on stress and tea intake") +
    facet_wrap(~tea_state) +
    theme_ipsum() +
    theme(legend.position="none") +
    xlab("Stress state") +
    ylab("Sleep offset time")

ggplot(sleep_data1_3, aes(fill=stress_state, y=end_h, x=stress_state)) + 
    geom_boxplot(fill="slateblue", alpha=0.2) + 
    scale_fill_viridis(discrete = T, option = "E") +
    ggtitle("Sleep offset time depending on stress and working-out state") +
    facet_wrap(~working_out_state) +
    theme_ipsum() +
    theme(legend.position="none") +
    xlab("Stress state") +
    ylab("Sleep offset time")

1.4

-Method: Further analyses of coffee and tea were conducted based on the previous results (1.1~1.3) using the facet_wrap() function. Since the numbers of those who had coffee and those who had tea were almost the same in each category of each sleep feature, each sleep feature was plotted with geom_bar() by coffee intake, faceted by tea intake. In addition, in 1.2 stress seemed to delay the time to wake up compared to those who had no stress on the day of sleep, so I looked at the effect of stress on sleep offset time separately for yes and no in each of the other daily features using geom_boxplot().
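The coffee-by-tea comparison described above amounts to comparing one summary value per group. A minimal base-R sketch of the per-group mean is shown below, under stated assumptions: `toy` is hypothetical data invented for illustration, the choice of the mean as the summary is an assumption, and `group_means` is a name introduced here.

```r
# Hypothetical toy data; the values are illustrative only.
toy <- data.frame(
  coffee_state     = c("Yes", "Yes", "No", "No", "Yes", "No"),
  tea_state        = c("Yes", "No", "Yes", "No", "Yes", "No"),
  Total.sleep.time = c(420, 450, 480, 400, 440, 380)  # minutes
)

# One row per coffee/tea combination, holding the mean sleep time.
group_means <- aggregate(
  Total.sleep.time ~ coffee_state + tea_state,
  data = toy,
  FUN  = mean
)
print(group_means)
```

Printing such a table next to the faceted bar chart makes it possible to check that the bar heights correspond to the intended summary.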

-Result:

1) Sleep quality: This doesn’t show any meaningful result. Sleep quality seems not to be affected by tea and/or coffee.

2) Total sleep time: Participants slept the most when they had tea but no coffee. In contrast, total sleep time was the lowest when they had neither coffee nor tea.

3) Heart rate: Heart rate during sleep appears to be increased, and far outside the normal range, on the days the participants drank coffee, regardless of tea intake.

4) Sleep offset time with stress and coffee intake: Stress went with a slightly later sleep offset time regardless of coffee intake.

5) Sleep offset time with stress and tea intake: Those who had stress without tea woke up much later than those who had stress with tea.

6) Sleep offset time with stress and working-out state: Stress went with a slightly later time to wake up regardless of working out. The delay was slightly larger for stress without working out.

-Summary:

1) Longest total sleep time with tea but no coffee; shortest with neither coffee nor tea

2) Lower heart rate without coffee, regardless of tea

3) Delayed sleep offset time with stress across all daily features, with the largest delay in time to wake up for stress without tea

2. Relationship among sleep features

2.1 Any relationship among sleep features?

sleep_data2 <- sleep_data[ , c(3,4,7)]
ggpairs(sleep_data2, title="2. Correlations among sleep features")

#relationship between total sleep time (time in bed) and sleep quality
ggplot(data = sleep_data, mapping = aes(x = Total.sleep.time, y = Sleep.quality)) +
  geom_point() +
  geom_smooth()

2.1

-Method: ggpairs() was used to see the relationship between each pair of sleep features. Because it showed a strong correlation between total sleep time and sleep quality, that pair was also plotted as a scatter plot with a smoothed trend line (geom_smooth()).

-Result & Summary: There is a strong positive correlation between total sleep time and sleep quality (corr: 0.722).
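The correlation coefficient reported by ggpairs() can also be computed and tested directly. The following is a minimal base-R sketch, not the analysis itself: `toy` contains only the six lowest-quality rows copied from the head() output earlier in this report, so its coefficient is not the 0.722 obtained on the full dataset, and `r` and `test` are names introduced here.

```r
# Six rows copied from the head() output above (the lowest-quality nights),
# so this subset is not representative of the full dataset.
toy <- data.frame(
  Total.sleep.time = c(16, 73, 115, 454, 438, 559),  # minutes
  Sleep.quality    = c(3, 16, 23, 50, 53, 54)        # percent
)

# Pearson correlation coefficient between the two columns.
r <- cor(toy$Total.sleep.time, toy$Sleep.quality, method = "pearson")

# cor.test() adds a p-value and a confidence interval for the same coefficient.
test <- cor.test(toy$Total.sleep.time, toy$Sleep.quality)
print(r)
print(test$p.value)
```

Running the same two calls on sleep_data would give the coefficient for all 162 rows together with a test of whether it differs from zero.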

3. sleep features and sleep onset/offset time affect mood at awake

3.1 sleep features affect mood at awake?

sleep_data3_1 <-sleep_data[ , c(1,2,3,4,5,7)]

# relationship between mood at awake and sleep features
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Sleep.quality, fill=Mood.at.awake)) + 
  # fill=name allow to automatically dedicate a color for each group
  geom_violin() +
  ggtitle("Quality") +
  theme_ipsum()

ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Total.sleep.time, fill=Mood.at.awake)) + 
  # fill=name allow to automatically dedicate a color for each group
  geom_violin() +
  ggtitle("Total sleep time") +
  theme_ipsum()

ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Heart.rate, fill=Mood.at.awake)) + 
  # fill=name allow to automatically dedicate a color for each group
  geom_violin() +
  ggtitle("Heart rate") +
  theme_ipsum()

# Display both charts side by side thanks to the patchwork package
#p1 + p2 +p3

3.1

-Method: The distribution of each sleep feature was shown for each mood at awake with the geom_violin() function.

-Result & Summary: No clear differences between the moods at awake were seen in any of the sleep features.

3.2 Sleep onset/offset affects mood at awake?

##############################Mood at awake######################################

###############################Sleep Start######################################

#Plot
sleep_data1_3 %>%
  mutate(text = fct_reorder(Mood.at.awake, start_h)) %>%
  ggplot(aes(y=Mood.at.awake, x=start_h,  fill=text)) +
    geom_density_ridges(alpha=0.6, stat="binline", bins=20) +
    theme_ridges() +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      strip.text.x = element_text(size = 8)
    ) +
    xlab("Sleep Start Time (0~24)") +
    ylab("Mood at awake")

###############################Sleep End######################################

#Plot
sleep_data1_3 %>%
  mutate(text = fct_reorder(Mood.at.awake, end_h)) %>%
  ggplot(aes(y=Mood.at.awake, x=end_h,  fill=Mood.at.awake)) +
    geom_density_ridges(alpha=0.6, stat="binline", bins=20) +
    theme_ridges() +
    theme(
      legend.position="none",
      panel.spacing = unit(0.1, "lines"),
      strip.text.x = element_text(size = 8)
    ) +
    xlab("Sleep End Time(0~24)") +
    ylab("Mood at awake")

3.2

-Method: I used ggplot() with geom_density_ridges() to plot mood at awake against sleep onset/offset time; the x axis indicates hours from 0 to 24.

-Result & Summary:

1) Both good and bad moods at awake were observed after a late time to fall asleep, which suggests that a later time to fall asleep affects mood at awake differently from person to person.

2) Similarly, most participants who woke up early reported either a good or a bad mood at awake.

3.3 total sleep time and sleep quality -> mood at awake?

#Mood at awake with time in bed(TST) and sleep quality
ggplot(data = sleep_data, mapping = aes(x = Total.sleep.time, y = Sleep.quality)) +
  geom_point(mapping = aes(color = Mood.at.awake)) +
  geom_smooth()

mm<-table(sleep_data$Mood.at.awake)
as.data.frame(table(sleep_data$Mood.at.awake))

  Var1 Freq
1   :)  147
2   :|   15

-Method: I used ggplot() with a scatter plot because, from my own experience, either sleep amount or sleep quality sometimes seems to affect mood at awake. So, I wanted to see whether both or either one affects mood at awake. As the next step, the number of people in each mood at awake was counted with table().

-Result & Summary: Overall, more people felt good at awake (n=147) than felt bad at awake (n=15), and no clear relationship between sleep features (such as sleep quality or total sleep time) and mood at awake was observed.
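The two counts above can be turned into shares to make the imbalance between the groups explicit. This is a minimal base-R sketch using the counts reported by table(); `mood_counts` and `mood_share` are names introduced here.

```r
# Counts copied from the table() output above.
mood_counts <- c(":)" = 147, ":|" = 15)

# Share of each mood among all 162 observations.
mood_share <- prop.table(mood_counts)
print(round(100 * mood_share, 1))
```

With only 15 observations in the ":|" group, comparisons between the two moods rest on a small sample for one side.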

REFLECTION

Using R is necessary for data analysis in the lab that I belong to, but I had not used it before, so I wanted to learn it. At first, everything was unfamiliar to me. However, as I took the lectures and saw my classmates’ work, I got motivated and learned a lot from them. I am sure that knowing various functions and applying them in the homework will be very helpful for me. In particular, through this final project I learned that the packages loaded with library() need to be cited, which is important for me to remember.
I had a hard time figuring out how to plot time (hours) on the axes of some graphs. Because the times were stored as character strings, plotting them did not work in some graphs. Luckily, after spending a few weeks trying various functions, I was able to figure it out. I wish I had known about the “as_hms” and “chron” functions earlier, as they were exactly what I was looking for. A few days ago I noticed that the website from which I imported the dataset had been updated with a new file of sleep data containing additional information. The new dataset has the information that I wanted at first, namely the actual sleep time. Because it was uploaded a few weeks ago and I only noticed it a few days ago, after I had already done much of the processing with my current dataset, I was not able to use the new data for the final project. Therefore, I would like to use the updated sleep data to extend my current analyses. Also, I saw a lot of great projects from other classmates, which had different topics and used different functions in R. I want to study them and try some functions that I have not used in this project, so that I learn more functions and algorithms.

CONCLUSION

All in all, this project shows how daily features can affect sleep features. I was able to observe

Having less stress, coffee and tea showed higher sleep quality from those who had sleep quality above 70%.
Total sleep amount is likely to be affected by working-out state on the day of sleep: The higher number of people who worked out showed more time to sleep.
Having coffee, tea and stress is shown to affect increased heart rate during sleep (beyond normal range: 40-50bpm). Working out may affect this as well, but its effect looks trivial.
Stressed people on the day of sleep showed sleep offset time that was more delayed compared to those who had no stress. Other meaningful difference was not observed in any daily features with sleep onset/offset time.
Sleep offset time (time to wake up) is consistent regardless of sleep onset time (time to fall asleep).

   a.   Because similar numbers of participants had coffee and tea in each category of
   sleep features, a further analysis with facet_wrap() held one factor fixed
   (e.g., coffee) to examine the other (e.g., tea). Total sleep time was highest
   without coffee but with tea. However, average sleep time was lowest in
   participants who had neither tea nor coffee. Heart rate was higher with coffee
   regardless of whether participants had tea.

   b.   Stress showed an effect on sleep offset time in 4). Thus, the effect of stress
   on sleep offset time was analyzed further while controlling for the other daily habits.
   Stress slightly delayed sleep offset time regardless of the other daily habits, except
   for tea. Those who were stressed and had no tea on the day of sleep woke up much
   later than those who had tea on a stressful day.

Total sleep time and sleep quality were correlated (r = 0.722), which means that longer sleep is strongly associated with higher sleep quality.
Mood at wake: a bad mood at wake seems to be related to going to sleep or waking up unusually early or late, whereas the other sleep features did not appear to affect mood at wake at all.
More people were in a good mood than in a bad mood when they woke up. No significant difference in sleep quality or total sleep time was found between those in a good mood and those in a bad mood.
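As a rough illustration of how a correlation like the one reported above between total sleep time and sleep quality can be computed, here is a minimal sketch in base R. The values in `demo` are made up for illustration and are not the project's data, so the resulting coefficient will not match the reported r.

```r
# Hypothetical values for illustration only; not the project's data.
demo <- data.frame(
  total_sleep_min = c(310, 365, 402, 428, 455, 480, 510, 540),
  sleep_quality   = c(48, 61, 58, 72, 70, 83, 79, 91)
)

# Pearson correlation, with a significance test and a 95% confidence interval.
res <- cor.test(demo$total_sleep_min, demo$sleep_quality, method = "pearson")

r <- unname(res$estimate)  # the correlation coefficient
print(round(r, 3))
print(res$conf.int)        # confidence interval for r
```

Using cor.test() rather than cor() alone also gives a confidence interval, which helps when judging how strong the relationship is likely to be beyond a single sample.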

Conclusion: Based on these results, daily features such as coffee and stress appear to affect sleep, indicating that different patterns before sleep lead to different sleep by either weakening or strengthening sleep quality, sleep time, and sleep condition (i.e., heart rate in this project). Mood at wake also appeared to be related to sleep onset/offset time. Genes and other factors may affect sleep patterns as well, and it would be interesting to consider them together in future work.

BIBLIOGRAPHY

Dana Diotte. sleepdata.csv. Retrieved on 03/05/2022 from https://www.kaggle.com/datasets/danagerous/sleep-data.

Ameen, Mohamed S et al. “About the Accuracy and Problems of Consumer Devices in the Assessment of Sleep.” Sensors (Basel, Switzerland) vol. 19,19 4160. 25 Sep. 2019, doi:10.3390/s19194160

RStudio Team (2022). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA, http://www.rstudio.com/.

Hadley Wickham, Jim Hester and Jennifer Bryan (2022). readr: Read Rectangular Text Data. R package version 2.1.2. https://CRAN.R-project.org/package=readr

Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Hadley Wickham, Romain François, Lionel Henry and Kirill Müller (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.8. https://CRAN.R-project.org/package=dplyr

Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Bob Rudis (2020). hrbrthemes: Additional Themes, Theme Components and Utilities for ‘ggplot2’. R package version 0.8.0. https://CRAN.R-project.org/package=hrbrthemes

Simon Garnier, Noam Ross, Robert Rudis, Antônio P. Camargo, Marco Sciaini, and Cédric Scherer (2021). viridis - Colorblind-Friendly Color Maps for R. R package version 0.6.2.

Claus O. Wilke (2021). ggridges: Ridgeline Plots in ‘ggplot2’. R package version 0.5.3. https://CRAN.R-project.org/package=ggridges

Wickham, H. (2021). forcats: Tools for Working with Categorical Variables (Factors). R package version 0.5.1. https://CRAN.R-project.org/package=forcats

Thomas Lin Pedersen (2020). patchwork: The Composer of Plots. R package version 1.1.1. https://CRAN.R-project.org/package=patchwork

Dean Attali and Christopher Baker (2022). ggExtra: Add Marginal Histograms to ‘ggplot2’, and More ‘ggplot2’ Enhancements. R package version 0.10.0. https://CRAN.R-project.org/package=ggExtra

David James and Kurt Hornik (2020). chron: Chronological Objects which Can Handle Dates and Times. R package version 2.3-56.

Erich Neuwirth (2022). RColorBrewer: ColorBrewer Palettes. R package version 1.1-3. https://CRAN.R-project.org/package=RColorBrewer

Kirill Müller (2021). hms: Pretty Time of Day. R package version 1.1.1. https://CRAN.R-project.org/package=hms

Wickham, H. & Grolemund, G. (n.d.). R for data science [eBook edition]. O’Reilly. https://r4ds.had.co.nz/index.html
