(draft for the final project) This is HW6 using sleep data to show the effects of daily features on sleep duration, sleep quality and heart rate during sleep
<Sleep project: how are daily features related to our sleep?>
The data are from Kaggle (https://www.kaggle.com/danagerous/sleep-data). Sleep was recorded with a Swedish iOS application from 180 subjects. 18 people were excluded because of missing information in some categories, leaving 162 participants.
variables:
library(dplyr) # select(), mutate(), case_when(), %>%
library(tidyr) # drop_na()
# Exclude "activity steps", the last (8th) variable, because many subjects did not record it.
sleep_data <- select(sleep_data, 1:7)
# Exclude rows with missing values in any of these variables.
sleep_data <- sleep_data %>%
  drop_na(Sleep.quality, Sleep.Notes, Mood.at.awake, Heart.rate)
#convert hours to minutes for the column of "time in bed" in order to compare it
#with total sleep time that was the column created by subtracting sleep start time from sleep end time.
times<-as.POSIXlt(sleep_data$Time.in.bed, format="%H:%M")
sleep_data$Time.in.bed<-times$hour*60+times$min
#changing the percentage character of the column of "sleep quality" to numeric
sleep_data$Sleep.quality<-as.numeric(sub("%","",sleep_data$Sleep.quality))
#mutate: Sleep.Notes is split into four Yes/No columns (coffee_state, tea_state,
#working_out_state and stress_state), one each for "Drank coffee", "Drank tea",
#"Worked out" and "Stressful day", for further analyses.
sleep_data<-sleep_data %>%
mutate(coffee_state = case_when(
grepl("Drank coffee",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Drank coffee",Sleep.Notes) == FALSE ~'No' #no
)) %>%
mutate(tea_state = case_when(
grepl("Drank tea",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Drank tea",Sleep.Notes) == FALSE ~ 'No' #no
)) %>%
mutate(working_out_state = case_when(
grepl("Worked out",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Worked out",Sleep.Notes) == FALSE ~ 'No' #no
)) %>%
mutate(stress_state = case_when(
grepl("Stressful day",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Stressful day",Sleep.Notes) == FALSE ~ 'No' #no
)) %>%
# Fix the unit explicitly; by default difftime() picks the unit from the data.
mutate(TST_mins = as.integer(difftime(End, Start, units = "mins")))
head(arrange(sleep_data, Sleep.quality))
                Start                 End Sleep.quality Time.in.bed
1 2014-12-30 21:17:50 2014-12-30 21:33:54             3          16
2 2015-01-19 05:06:38 2015-01-19 06:20:29            16          73
3 2015-06-05 03:45:52 2015-06-05 05:41:01            23         115
4 2015-05-06 21:47:25 2015-05-07 05:21:38            50         454
5 2015-04-28 21:41:45 2015-04-29 05:00:17            53         438
6 2015-03-04 20:53:47 2015-03-05 06:13:31            54         559
  Mood.at.awake Sleep.Notes
1            :| Stressful day
2            :)
3            :)
4            :) Ate late:Drank coffee:Drank tea:Worked out
5            :) Drank coffee:Worked out
6            :) Drank coffee:Drank tea:Stressful day:Worked out
  Heart.rate coffee_state tea_state working_out_state stress_state
1         72           No        No                No          Yes
2         58           No        No                No           No
3         57           No        No                No           No
4         59          Yes       Yes               Yes           No
5         59          Yes        No               Yes           No
6         68          Yes       Yes               Yes          Yes
  TST_mins
1       16
2       73
3      115
4      454
5      438
6      559
# Save the cleaned data in case it is needed later.
# write.csv() keeps the delimiter consistent with the .csv extension.
write.csv(sleep_data, file = "sleep_data.csv", row.names = FALSE)
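The four Yes/No columns built with case_when() above can also be produced with one small helper. The following is only a sketch on made-up notes: `has_note()` is a hypothetical helper name, not something from the original analysis, and `fixed = TRUE` is an assumption that the labels should be matched literally rather than as regular expressions.

```r
# Toy notes standing in for the Sleep.Notes column (values are made up).
notes <- c("Stressful day", "", "Drank coffee:Worked out",
           "Drank coffee:Drank tea:Stressful day:Worked out")

# Hypothetical helper: "Yes" if the note mentions the label, otherwise "No".
has_note <- function(x, label) {
  ifelse(grepl(label, x, fixed = TRUE), "Yes", "No")
}

flags <- data.frame(
  coffee_state      = has_note(notes, "Drank coffee"),
  tea_state         = has_note(notes, "Drank tea"),
  working_out_state = has_note(notes, "Worked out"),
  stress_state      = has_note(notes, "Stressful day")
)
print(flags)
```

Because grepl() returns FALSE for an empty note, blank notes end up as "No" in every column, which matches how the case_when() version treats them.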
1. Which daily features (e.g., drinking coffee) affect sleep features (quality, duration and heart rate during sleep) and sleep onset/offset time?
1.1 daily features -> sleep features?
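# Keep the columns needed here: Sleep.quality, Time.in.bed, Heart.rate,
# coffee_state, tea_state, working_out_state and stress_state
sleep_data1_1 <- sleep_data[ , c(3,4,7:11)]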
###############################Sleep quality##################################
# Count matrix for sleep quality: rows are daily features, columns are quality bins
set.seed(112)
new_sleep<- matrix(0,4,4)
colnames(new_sleep) <- c("90~100","80~90","70~80","<=70")
# Row order must follow columns 4:7 of sleep_data1_1: coffee, tea, working out, stress
rownames(new_sleep) <- c("drinking coffee","drinking tea","working out","stress")
for (y in 4:7) {
y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
if(sleep_data1_1[x,1]> 90 & sleep_data1_1[x,1] <= 100 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,1]=new_sleep[y1,1]+1;
}
}
else if(sleep_data1_1[x,1]> 80 & sleep_data1_1[x,1] <= 90 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,2]=new_sleep[y1,2]+1;
}
}
else if(sleep_data1_1[x,1]> 70 & sleep_data1_1[x,1] <= 80 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,3]=new_sleep[y1,3]+1;
}
}
else if(sleep_data1_1[x,1] <= 70 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,4]=new_sleep[y1,4]+1;
}
}
}
}
###############################Total sleep amount###############################
# Count matrix for total sleep time (Time.in.bed, in minutes)
set.seed(112)
new_sleep1<- matrix(0,4,4)
colnames(new_sleep1) <- c(">8","7~8(typical)","5~7","<5")
# Row order must follow columns 4:7 of sleep_data1_1: coffee, tea, working out, stress
rownames(new_sleep1) <- c("drinking coffee","drinking tea","working out","stress")
for (y in 4:7) {
y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
if(sleep_data1_1[x,2]> 480) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,1]=new_sleep1[y1,1]+1;
}
}
else if(sleep_data1_1[x,2]>=420 & sleep_data1_1[x,2] <= 480 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,2]=new_sleep1[y1,2]+1;
}
}
else if(sleep_data1_1[x,2]>= 300 & sleep_data1_1[x,2] < 420 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,3]=new_sleep1[y1,3]+1;
}
}
else if(sleep_data1_1[x,2] < 300 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,4]=new_sleep1[y1,4]+1;
}
}
}
}
###############################Heart rate######################################
# Count matrix for heart rate during sleep (BPM)
set.seed(112)
new_sleep2<- matrix(0,4,4)
colnames(new_sleep2) <- c(">80","50~80","40~50(typical)","<=40")
# Row order must follow columns 4:7 of sleep_data1_1: coffee, tea, working out, stress
rownames(new_sleep2) <- c("drinking coffee","drinking tea","working out","stress")
for (y in 4:7) {
y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
if(sleep_data1_1[x,3]> 80 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,1]=new_sleep2[y1,1]+1;
}
}
else if(sleep_data1_1[x,3]> 50 & sleep_data1_1[x,3] <= 80 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,2]=new_sleep2[y1,2]+1;
}
}
else if(sleep_data1_1[x,3]> 40 & sleep_data1_1[x,3] <= 50 ) { # lower bound 40 so that 41~45 bpm is not dropped
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,3]=new_sleep2[y1,3]+1;
}
}
else if(sleep_data1_1[x,3] <=40 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,4]=new_sleep2[y1,4]+1;
}
}
}
}
#######################################plot#####################################
# Grouped barplot
barplot(new_sleep,
border="white",
font.axis=2,
beside=T,
col = 1:nrow(new_sleep),
legend.text = TRUE,
args.legend = list(x = "topleft",
inset = c(- 0.001, 0)),
xlab="Sleep quality(%)",
ylab="Numbers",
font.lab=2)
# Grouped barplot
barplot(new_sleep1,
border="white",
font.axis=2,
beside=T,
col = 1:nrow(new_sleep1),
legend.text = TRUE,
args.legend = list(x = "topright",
inset = c(- 0.001, 0)),
xlab="Total sleep time(Hrs)",
ylab="Numbers",
font.lab=2)
# Grouped barplot
barplot(new_sleep2,
border="white",
font.axis=2,
beside=T,
col = 1:nrow(new_sleep2),
legend.text = TRUE,
args.legend = list(x = "topright",
inset = c(- 0.001, 0)),
xlab="Heart rate(BPM)",
ylab="Numbers",
font.lab=2)
1.1
-Method: I used grouped bar plots to show how each daily feature relates to each sleep feature. First, I divided each sleep feature into ranges (for example good, typical, and so on), because the goal of this analysis is to show the approximate range each night falls into rather than exact values. For example, sleep quality is considered good if it falls in 90~100%, although a quality of 90% itself would not usually be called poor sleep. Likewise, I defined a normal (i.e., typical) range and ranges outside it for the other sleep features, so that the plots show how many people fall into a similar range; the specific values of sleep quality or of the other sleep features are not meaningful here.
-Result: Overall, the counts for working out are low compared with the other daily features in all three sleep features, indicating that few people worked out on the day of the sleep, so working out did not contribute meaningful results for any of the three sleep features.
1) Sleep quality: Less stress seems to go with better sleep quality. Coffee and tea intake appear to be inversely related to sleep quality, that is, drinking more coffee or tea goes with poorer sleep.
2) Total sleep time: The effect looks small, but working out appears to go with slightly longer sleep. The other variables do not show meaningful effects.
3) Heart rate: Notably, nights after coffee, tea, stress or working out show a heart rate above the known normal range during sleep (40~50 bpm).
-Summary:
1) less stress, coffee and tea -> better-quality sleep
2) working out -> more time to sleep
3) more coffee, tea, stress and working out -> higher heart rate
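The binning and counting described in the method can be sketched more compactly with cut() and table() instead of nested loops. This is a minimal illustration on made-up values, not the code that produced the plots; the vectors `quality` and `coffee` are hypothetical, and the bins follow the conditions used in the loop (at most 70, then 70 to 80, 80 to 90, and 90 to 100, each closed on the right).

```r
# Toy data (made up): sleep quality in percent and a Yes/No coffee flag.
quality <- c(95, 85, 72, 60, 91, 88, 70, 100)
coffee  <- c("Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes")

# cut() uses right-closed intervals by default: (-Inf,70], (70,80], (80,90], (90,100].
quality_bin <- cut(quality,
                   breaks = c(-Inf, 70, 80, 90, 100),
                   labels = c("<=70", "70~80", "80~90", "90~100"))

# Count the "Yes" days in each bin.
coffee_counts <- table(quality_bin[coffee == "Yes"])
print(coffee_counts)
```

Repeating the last step for each Yes/No column and stacking the results with rbind() would give a matrix of the same shape as the one passed to barplot().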
1.2 daily features -> sleep onset/offset time?
sleep_data1_2 <- sleep_data[ , c(1,2,8:11)]
# Convert Start and End (sleep onset and offset) to date-time objects for plotting
sleep_data1_2$Start <- strptime(sleep_data1_2$Start, format = "%H:%M:%S")
sleep_data1_2$Start <- as.POSIXct(sleep_data1_2$Start, format = "%H:%M:%S")
sleep_data1_2$End <- strptime(sleep_data1_2$End, format = "%H:%M:%S")
sleep_data1_2$End <- as.POSIXct(sleep_data1_2$End, format = "%H:%M:%S")
# relationship between sleep onset and features
p1 <- ggplot(sleep_data1_2, aes(x=coffee_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
theme(legend.position="none")+
xlab("coffee_state") +
ylab("sleep onset")
p2 <- ggplot(sleep_data1_2, aes(x=tea_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("tea_state") +
ylab("sleep onset")
p3 <- ggplot(sleep_data1_2, aes(x=working_out_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("working_out_state") +
ylab("sleep onset")
p4 <- ggplot(sleep_data1_2, aes(x=stress_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("stress_state") +
ylab("sleep onset")
# Display the four charts in a grid thanks to the patchwork package
p1 + p2 + p3 +p4
# relationship between sleep offset and features
p1 <- ggplot(sleep_data1_2, aes(x=coffee_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("coffee_state") +
ylab("sleep offset")
p2 <- ggplot(sleep_data1_2, aes(x=tea_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("tea_state") +
ylab("sleep offset")
p3 <- ggplot(sleep_data1_2, aes(x=working_out_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("working_out_state") +
ylab("sleep offset")
p4 <- ggplot(sleep_data1_2, aes(x=stress_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("stress_state") +
ylab("sleep offset")
# Display the four charts in a grid thanks to the patchwork package
p1 + p2 + p3 +p4
1.2
-Method: First, I converted the Start and End strings (sleep onset and offset time) to a time format using strptime() and as.POSIXct(). I then used geom_boxplot() to show the effects of daily features on sleep onset and offset time.
-Result:
1) Sleep onset time by daily feature: No clear difference in sleep onset time was observed for any daily feature.
2) Sleep offset time by daily feature: Those who drank coffee woke up slightly later than those who had no coffee on the day of the sleep. Those who had a stressful day woke up much later than those who had no stress on that day, suggesting that stress may play a big role in wake-up time.
-Summary:
1) Coffee -> slightly later wake-up time
2) Stress -> later wake-up time
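To compare onset and offset times across nights, only the time of day matters, not the date. The sketch below shows one way to get the time of day as decimal hours in base R. It uses two timestamps copied from the head() output earlier in this post; the explicit format string and `tz = "UTC"` are assumptions made here so that the result does not depend on the local time zone.

```r
# Two Start timestamps copied from the head() output earlier in this post.
stamps <- c("2014-12-30 21:17:50", "2015-01-19 05:06:38")

# Parse the full date-time; UTC is an assumption to keep the result reproducible.
parsed <- as.POSIXlt(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Time of day as decimal hours (0 to 24), ignoring the date.
hour_of_day <- parsed$hour + parsed$min / 60 + parsed$sec / 3600
print(round(hour_of_day, 2))
```

A numeric column like `hour_of_day` can be passed directly as the y aesthetic of a boxplot, which avoids having every time attached to an arbitrary date.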
1.3 The relationship between sleep start time and sleep wake-up time
sleep_data1_3 <-sleep_data[ , c(1,2,5,8:11)]
#using chron function to show time (including minutes) in the relationship between sleep onset and offset
sleep_data1_3$chrons<-chron(times=sleep_data1_3$Start)
sleep_data1_3$chrone<-chron(times=sleep_data1_3$End)
#taking hour of time only for the relationship between sleep onset and offset
sleep_data1_3$start_h<-as_hms(sleep_data1_3$Start) %>% hour
sleep_data1_3$end_h<-as_hms(sleep_data1_3$End) %>% hour
# scatter plot from time (including minutes)
ggplot(data = sleep_data1_3, mapping = aes(x = chrons, y = chrone)) +
geom_point() +
xlab("sleep start time") +
ylab("wake-up time")+
ggtitle("sleep onset vs. sleep offset (0~24=>0~1)")+
geom_smooth()
# scatter plot from hour only of the time
ggplot(data = sleep_data1_3, mapping = aes(x = start_h, y = end_h)) +
geom_point() +
xlab("sleep start time") +
ylab("wake-up time")+
ggtitle("sleep onset vs. sleep offset (0~24)")+
geom_smooth()
1.3
-Method: I first converted the times to hours only, since the detailed information (i.e., minutes) did not change the picture in a meaningful way compared with using the hour alone.
-Result: In the scatter plot based on hours, all but one person woke up before 10 am regardless of the time they fell asleep, suggesting that sleep onset time had little effect on wake-up time.
-Summary:
1) No effects of sleep onset time on sleep offset time
2) Most people woke up before 10 am regardless of sleep onset time (except for 1 person).
1.4 coffee and tea -> sleep features simultaneously & stress -> sleep offset time?
############Sleep features with coffee and/or tea#############################
ggplot(sleep_data, aes(fill=coffee_state, y=Sleep.quality, x=coffee_state)) +
geom_bar(position="dodge", stat="identity") +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle(" Sleep quality depending on coffee and tea intake") +
facet_wrap(~tea_state) +
#facet_wrap(~coffee_state)+
theme_ipsum() +
theme(legend.position="none") +
xlab("Coffee state") +
ylab("Sleep quality")
# TST_mins is the total sleep time column created during data cleaning
ggplot(sleep_data, aes(fill=coffee_state, y=TST_mins, x=coffee_state)) +
geom_bar(position="dodge", stat="identity") +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle(" Total sleep time depending on coffee and tea intake") +
facet_wrap(~tea_state) +
#facet_wrap(~coffee_state)+
theme_ipsum() +
theme(legend.position="none") +
xlab("Coffee state") +
ylab("Total sleep time")
ggplot(sleep_data, aes(fill=coffee_state, y=Heart.rate, x=coffee_state)) +
geom_bar(position="dodge", stat="identity") +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle(" Heart rate depending on coffee and tea intake") +
facet_wrap(~tea_state) +
#facet_wrap(~coffee_state)+
theme_ipsum() +
theme(legend.position="none") +
xlab("Coffee state") +
ylab("Heart rate")
#####Sleep offset time in terms of stress level in different daily features#####
ggplot(sleep_data1_3, aes(fill=stress_state, y=end_h, x=stress_state)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle("Sleep offset time depending on stress and coffee intake") +
facet_wrap(~coffee_state) +
theme_ipsum() +
theme(legend.position="none") +
xlab("Stress state") +
ylab("Sleep offset time")
ggplot(sleep_data1_3, aes(fill=stress_state, y=end_h, x=stress_state)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle("Sleep offset time depending on stress and tea intake") +
facet_wrap(~tea_state) +
theme_ipsum() +
theme(legend.position="none") +
xlab("Stress state") +
ylab("Sleep offset time")
ggplot(sleep_data1_3, aes(fill=stress_state, y=end_h, x=stress_state)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle("Sleep offset time depending on stress and working-out state") +
facet_wrap(~working_out_state) +
theme_ipsum() +
theme(legend.position="none") +
xlab("Stress state") +
ylab("Sleep offset time")
1.4
-Method: Further analyses were built on the previous results (1.1~1.3) using facet_wrap(). Since the counts for coffee and tea were almost the same in each category of each sleep feature, coffee and tea were analysed together using geom_bar(). In addition, stress seemed to delay wake-up time compared with days without stress (1.2), so I looked at the effect of stress on sleep offset time split by yes or no in the other daily features using geom_boxplot().
-Result:
1) Sleep quality: This doesn’t show any meaningful result. Sleep quality does not seem to be affected by tea or coffee.
2) Total sleep time: Participants slept the most when they had tea but no coffee. In contrast, total sleep time was lowest when they had neither coffee nor tea.
3) Heart rate: Heart rate during sleep appears to be far outside the normal range on days when participants drank coffee, regardless of tea intake.
4) Sleep offset time with stress and coffee intake: Stress was associated with a slightly later sleep offset time regardless of coffee intake.
5) Sleep offset time with stress and tea intake: Stress without tea showed a much later wake-up time than stress with tea.
6) Sleep offset time with stress and working-out state: Stress was associated with a slightly later wake-up time than no stress. Stress without working out delayed sleep offset time slightly more.
-Summary:
1) Longest total sleep time with tea but no coffee
2) Lower heart rate without coffee, regardless of tea
3) Delayed sleep offset time with stress across all daily features, with the largest effect of stress on wake-up time when no tea was taken
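One caveat on the bar charts in this section: as far as I understand ggplot2, with stat = "identity" the bars that share the same x position and fill are drawn on top of each other, so the visible height reflects the largest value in a group rather than an average. A numeric summary per group can be read alongside the plots. The sketch below uses made-up data with the same column names as this post and is not the analysis itself.

```r
# Toy data (made up): heart rate with Yes/No coffee and tea flags.
toy <- data.frame(
  Heart.rate   = c(60, 70, 58, 62, 55, 65),
  coffee_state = c("Yes", "Yes", "No", "No", "Yes", "No"),
  tea_state    = c("Yes", "Yes", "Yes", "No", "No", "No")
)

# Mean heart rate for every coffee x tea combination.
group_means <- aggregate(Heart.rate ~ coffee_state + tea_state,
                         data = toy, FUN = mean)
print(group_means)
```

Swapping `FUN = mean` for `FUN = median` or `FUN = max` shows how much the choice of summary changes the comparison between groups.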
2.1 Any relationship among sleep features?
sleep_data2 <- sleep_data[ , c(3,4,7)]
ggpairs(sleep_data2, title="Correlation among sleep features")
#relationship between total sleep time (TST_mins) and sleep quality
ggplot(data = sleep_data, mapping = aes(x = TST_mins, y = Sleep.quality)) +
geom_point() +
geom_smooth()
2.1
-Method: ggpairs() was used to look at the relationship between each pair of sleep features. It showed a strong correlation (corr: 0.722) between total sleep time and sleep quality, which was then plotted with geom_point() and geom_smooth().
-Result & Summary: There is a strong correlation between total sleep time and sleep quality (corr: 0.722).
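The coefficient printed in the pairs plot can also be computed directly. The sketch below uses made-up numbers, so it does not reproduce the 0.722 reported above; it only shows how cor() and cor.test() from base R could be used to get a Pearson coefficient together with a confidence interval. The vector names are hypothetical.

```r
# Toy data (made up): total sleep time in minutes and sleep quality in percent.
tst_mins <- c(300, 360, 420, 450, 480, 540)
quality  <- c(50, 62, 70, 78, 85, 90)

# Pearson correlation coefficient (the default method of cor()).
r <- cor(tst_mins, quality)

# cor.test() reports the same coefficient plus a confidence interval and p-value.
result <- cor.test(tst_mins, quality)
print(r)
print(result$conf.int)
```

With the real data, the two vectors would be replaced by the total sleep time and sleep quality columns of the cleaned data frame.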
3. Do sleep features and sleep onset/offset time affect mood at awake?
3.1 sleep features affect mood at awake?
sleep_data3_1 <-sleep_data[ , c(1,2,3,4,5,7)]
# relationship between mood at awake and each sleep feature
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Sleep.quality, fill=Mood.at.awake)) +
# fill=name allow to automatically dedicate a color for each group
geom_violin() +
ggtitle("Quality") +
theme_ipsum()
# Time.in.bed is used for total sleep time here (it matches TST_mins in the cleaned data)
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Time.in.bed, fill=Mood.at.awake)) +
# fill=name allow to automatically dedicate a color for each group
geom_violin() +
ggtitle("Total sleep time") +
theme_ipsum()
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Heart.rate, fill=Mood.at.awake)) +
# fill=name allow to automatically dedicate a color for each group
geom_violin() +
ggtitle("Heart rate") +
theme_ipsum()
# Display both charts side by side thanks to the patchwork package
#p1 + p2 +p3
3.1
-Method: The distribution of each sleep feature within each mood-at-awake category was shown with geom_violin().
-Result & Summary: No apparent relationship between sleep features and mood at awake was observed.
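Since violin plots are hard to compare by eye when groups differ in size, a small table of group sizes and medians can help. The following is a sketch on made-up values with the same column names as this post, not a result from the real data.

```r
# Toy data (made up): mood at awake and sleep quality in percent.
toy <- data.frame(
  Mood.at.awake = c(":)", ":)", ":|", ":|", ":("),
  Sleep.quality = c(80, 90, 70, 60, 50)
)

# Number of nights per mood category.
n_per_mood <- table(toy$Mood.at.awake)

# Median sleep quality per mood category.
median_quality <- tapply(toy$Sleep.quality, toy$Mood.at.awake, median)
print(n_per_mood)
print(median_quality)
```

A category with only a handful of nights will produce an unstable violin, so the counts are worth checking before reading much into the shapes.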
3.2 Sleep onset/offset affects mood at awake?
###############################Sleep Start######################################
#Plot
sleep_data1_3 %>%
mutate(text = fct_reorder(Mood.at.awake, start_h)) %>%
ggplot( aes(y=Mood.at.awake, x=start_h, fill=text)) +
geom_density_ridges(alpha=0.6, stat="binline", bins=20) +
theme_ridges() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
xlab("Sleep Start Time (0~24)") +
ylab("Mood at awake")
###############################Sleep End######################################
#Plot
sleep_data1_3 %>%
mutate(text = fct_reorder(Mood.at.awake, end_h)) %>%
ggplot( aes(y=Mood.at.awake, x=end_h, fill=Mood.at.awake)) +
geom_density_ridges(alpha=0.6, stat="binline", bins=20) +
theme_ridges() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
xlab("Sleep End Time(0~24)") +
ylab("Mood at awake")
3.2
-Method: I used ggplot() with geom_density_ridges() to plot the effects of sleep onset/offset time on mood at awake.
-Result & Summary:
1) People who fell asleep later tended to report either a bad or a good mood, which suggests that a later sleep onset affects mood differently from person to person.
2) On the other hand, with an early wake-up time, most participants reported either a good or a bad mood.
3.3 total sleep time and sleep quality -> mood at awake?
#Mood at awake with total sleep time (TST_mins) and sleep quality
ggplot(data = sleep_data, mapping = aes(x = TST_mins, y = Sleep.quality)) +
geom_point(mapping = aes(color = Mood.at.awake)) +
geom_smooth()
3.3
-Method: I used ggplot() with a scatter plot, colouring the points by mood at awake.
-Result & Summary: Overall, more people reported a good mood at awake, but the sleep features differ between those who felt good and those who felt bad at awake.
Conclusion: Daily features such as coffee and stress appear likely to affect human sleep, given that all of the sleep features were affected by different daily features. Also, mood at awake appeared to be affected by sleep onset/offset time, suggesting that different patterns before sleep lead to different types of sleep by weakening or strengthening sleep quality, time and condition (i.e., heart rate in this project). Different genes and other factors may affect sleep patterns as well, which would be interesting to consider.
What is missing from your final project? More analyses of the relationship between stress and the other daily features with respect to sleep offset time would be interesting. - This was added and updated in HW6, as it had not been posted yet!
What do you hope to accomplish between now and submission time? I want to add one or two more analyses to better understand the role of stress in sleep offset time. - This was added and updated in HW6, as it had not been posted yet!
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Noh (2022, May 11). Data Analytics and Computational Social Science: HW 6. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh900228/
BibTeX citation
@misc{noh2022hw, author = {Noh, Eunsol}, title = {Data Analytics and Computational Social Science: HW 6}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh900228/}, year = {2022} }