This is HW5, which uses sleep data to show the effects of daily features on sleep duration, sleep quality and heart rate during sleep (updated with a few corrections).
<Sleep project-how daily features are related to our sleep?>
The data are from Kaggle (https://www.kaggle.com/danagerous/sleep-data). Sleep was recorded with a Swedish iOS application from 180 subjects. 18 subjects were excluded because information was missing for some categories, leaving 162 participants in total.
variables:
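# Packages used in this analysis
library(tidyverse) #dplyr, tidyr, ggplot2, forcats
library(patchwork) #combining plots with p1 + p2
library(viridis) #scale_fill_viridis()
library(hrbrthemes) #theme_ipsum()
library(GGally) #ggpairs()
library(ggridges) #geom_density_ridges(), theme_ridges()
setwd('/Users/eunsolnoh/Desktop/dacss601/R') #set path
# sleep_data is assumed to have been read from the Kaggle file (and its columns renamed) before this point; that step is not shown here
sleep_data = select(sleep_data, 1:7) #excluded "activity steps", the last (8th) variable, as many subjects didn't include this information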
sleep_data<-sleep_data %>%
drop_na(Sleep.quality,Sleep.Notes,Mood.at.awake,Heart.rate) #excluded rows that have blank for some variables
#changing hours to minutes
times<-as.POSIXlt(sleep_data$Time.in.bed, format="%H:%M")
sleep_data$Time.in.bed<-times$hour*60+times$min
#changing the percentage character to numeric
sleep_data$Sleep.quality<-as.numeric(sub("%","",sleep_data$Sleep.quality))
#mutate: Sleep.Notes is split into one Yes/No column ("..._state") each for "Drank coffee", "Drank tea", "Worked out" and "Stressful day"
sleep_data<-sleep_data %>%
mutate(coffee_state = case_when(
grepl("Drank coffee",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Drank coffee",Sleep.Notes) == FALSE ~'No' #no
)) %>%
mutate(tea_state = case_when(
grepl("Drank tea",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Drank tea",Sleep.Notes) == FALSE ~ 'No' #no
)) %>%
mutate(working_out_state = case_when(
grepl("Worked out",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Worked out",Sleep.Notes) == FALSE ~ 'No' #no
)) %>%
mutate(stress_state = case_when(
grepl("Stressful day",Sleep.Notes) == TRUE ~ 'Yes', #yes
grepl("Stressful day",Sleep.Notes) == FALSE ~ 'No' #no
)) %>%
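mutate(TST_mins = as.integer(difftime(End,Start, units = "mins"))) #total sleep time in minutes; units are set explicitly so they do not depend on the data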
head(arrange(sleep_data, Sleep.quality))
Start End Sleep.quality Time.in.bed
1 2014-12-30 21:17:50 2014-12-30 21:33:54 3 16
2 2015-01-19 05:06:38 2015-01-19 06:20:29 16 73
3 2015-06-05 03:45:52 2015-06-05 05:41:01 23 115
4 2015-05-06 21:47:25 2015-05-07 05:21:38 50 454
5 2015-04-28 21:41:45 2015-04-29 05:00:17 53 438
6 2015-03-04 20:53:47 2015-03-05 06:13:31 54 559
Mood.at.awake Sleep.Notes
1 :| Stressful day
2 :)
3 :)
4 :) Ate late:Drank coffee:Drank tea:Worked out
5 :) Drank coffee:Worked out
6 :) Drank coffee:Drank tea:Stressful day:Worked out
Heart.rate coffee_state tea_state working_out_state stress_state
1 72 No No No Yes
2 58 No No No No
3 57 No No No No
4 59 Yes Yes Yes No
5 59 Yes No Yes No
6 68 Yes Yes Yes Yes
TST_mins
1 16
2 73
3 115
4 454
5 438
6 559
#save the cleaned data as a comma-separated file
write.csv(sleep_data, file = "sleep_data.csv", row.names = FALSE)
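The preparation steps above (time string to minutes, percentage to number, note keywords to Yes/No flags) can be tried out on a small made-up data frame. This is only a sketch in base R: the `toy` values and the `flag_note()` helper are illustrative assumptions, not part of the original analysis.

```r
# Toy data mimicking the layout of the sleep file (values are made up)
toy <- data.frame(
  Time.in.bed   = c("7:34", "0:16", "9:19"),
  Sleep.quality = c("50%", "3%", "54%"),
  Sleep.Notes   = c("Drank coffee:Worked out", "Stressful day", "Drank tea"),
  stringsAsFactors = FALSE
)

# "H:MM" -> minutes
parts <- strsplit(toy$Time.in.bed, ":", fixed = TRUE)
toy$Time.in.bed <- sapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]))

# "50%" -> 50
toy$Sleep.quality <- as.numeric(sub("%", "", toy$Sleep.quality, fixed = TRUE))

# Hypothetical helper: "Yes"/"No" flag for one keyword in the notes
flag_note <- function(notes, keyword) {
  ifelse(grepl(keyword, notes, fixed = TRUE), "Yes", "No")
}

toy$coffee_state      <- flag_note(toy$Sleep.Notes, "Drank coffee")
toy$tea_state         <- flag_note(toy$Sleep.Notes, "Drank tea")
toy$working_out_state <- flag_note(toy$Sleep.Notes, "Worked out")
toy$stress_state      <- flag_note(toy$Sleep.Notes, "Stressful day")

print(toy)
```

Because `grepl()` already returns TRUE or FALSE for every row, a single `ifelse()` per keyword does the same job as the two-branch `case_when()` calls.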
1. Which daily features (e.g., drinking coffee) affect sleep features (quality, duration and heart rate during sleep) and sleep onset/offset time?
1.1 daily features -> sleep features?
#Bring needed information columns
sleep_data1_1 <- sleep_data[ , c(3,4,7:11)]
###############################Sleep quality#################################
# Count the "Yes" nights of each daily feature per sleep quality range
set.seed(112)
new_sleep<- matrix(0,4,4)
colnames(new_sleep) <- c("90~100","80~90","70~80","<=70")
rownames(new_sleep) <- c("drinking coffee","drinking tea","working out","stress") #same order as columns 4:7 of sleep_data1_1
for (y in 4:7) {
y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
if(sleep_data1_1[x,1]> 90 & sleep_data1_1[x,1] <= 100 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,1]=new_sleep[y1,1]+1;
}
}
else if(sleep_data1_1[x,1]> 80 & sleep_data1_1[x,1] <= 90 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,2]=new_sleep[y1,2]+1;
}
}
else if(sleep_data1_1[x,1]> 70 & sleep_data1_1[x,1] <= 80 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,3]=new_sleep[y1,3]+1;
}
}
else if(sleep_data1_1[x,1] <= 70 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep[y1,4]=new_sleep[y1,4]+1;
}
}
}
}
###############################Total sleep amount###############################
# Count the "Yes" nights of each daily feature per total sleep time range (in minutes)
set.seed(112)
new_sleep1<- matrix(0,4,4)
colnames(new_sleep1) <- c(">8","7~8(typical)","5~7","<5")
rownames(new_sleep1) <- c("drinking coffee","drinking tea","working out","stress") #same order as columns 4:7 of sleep_data1_1
for (y in 4:7) {
y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
if(sleep_data1_1[x,2]> 480) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,1]=new_sleep1[y1,1]+1;
}
}
else if(sleep_data1_1[x,2]>=420 & sleep_data1_1[x,2] <= 480 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,2]=new_sleep1[y1,2]+1;
}
}
else if(sleep_data1_1[x,2]>= 300 & sleep_data1_1[x,2] < 420 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,3]=new_sleep1[y1,3]+1;
}
}
else if(sleep_data1_1[x,2] < 300 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep1[y1,4]=new_sleep1[y1,4]+1;
}
}
}
}
###############################Heart rate######################################
# Count the "Yes" nights of each daily feature per heart rate range (BPM)
set.seed(112)
new_sleep2<- matrix(0,4,4)
colnames(new_sleep2) <- c(">80","50~80","40~50(typical)","<=40")
rownames(new_sleep2) <- c("drinking coffee","drinking tea","working out","stress") #same order as columns 4:7 of sleep_data1_1
for (y in 4:7) {
y1<-y-3
for (x in seq_len(nrow(sleep_data1_1))) {
if(sleep_data1_1[x,3]> 80 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,1]=new_sleep2[y1,1]+1;
}
}
else if(sleep_data1_1[x,3]> 50 & sleep_data1_1[x,3] <= 80 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,2]=new_sleep2[y1,2]+1;
}
}
else if(sleep_data1_1[x,3]> 40 & sleep_data1_1[x,3] <= 50 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,3]=new_sleep2[y1,3]+1;
}
}
else if(sleep_data1_1[x,3] <=40 ) {
if(sleep_data1_1[x,y]=="Yes") {
new_sleep2[y1,4]=new_sleep2[y1,4]+1;
}
}
}
}
#######################################plot#####################################
# Grouped barplot
barplot(new_sleep,
border="white",
font.axis=2,
beside=T,
col = 1:nrow(new_sleep),
legend.text = TRUE,
args.legend = list(x = "topright",
inset = c(- 0.001, 0)),
xlab="Sleep quality(%)",
ylab="Numbers",
font.lab=2)
# Grouped barplot
barplot(new_sleep1,
border="white",
font.axis=2,
beside=T,
col = 1:nrow(new_sleep1),
legend.text = TRUE,
args.legend = list(x = "topright",
inset = c(- 0.001, 0)),
xlab="Total sleep time(Hrs)",
ylab="Numbers",
font.lab=2)
# Grouped barplot
barplot(new_sleep2,
border="white",
font.axis=2,
beside=T,
col = 1:nrow(new_sleep2),
legend.text = TRUE,
args.legend = list(x = "topright",
inset = c(- 0.001, 0)),
xlab="Heart rate(BPM)",
ylab="Numbers",
font.lab=2)
1.1 Overall, the number of participants who worked out is low compared with the other daily features in all three categories. 1) Sleep quality: less stress seems to go with better sleep quality. 2) Total sleep time: the effect looks small, but working out appears to go with slightly longer sleep. 3) Heart rate: on days with coffee, tea, stress or a workout, heart rate during sleep is mostly above the range regarded as normal during sleep (40~50 BPM). These observations are based on counts only; no statistical test was run.
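The counting loops used for these bar plots can also be expressed with `cut()` and `table()`. The sketch below uses made-up values rather than the real data, and the bin labels are chosen here for illustration; it shows the idea for one feature and the sleep quality ranges only.

```r
# Toy data (made up): sleep quality in percent and one Yes/No flag
quality      <- c(95, 85, 72, 60, 91, 70, 100)
coffee_state <- c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes")

# Right-closed bins: (-Inf,70], (70,80], (80,90], (90,100]
quality_bin <- cut(quality,
                   breaks = c(-Inf, 70, 80, 90, 100),
                   labels = c("<=70", "70~80", "80~90", "90~100"))

# Count the "Yes" nights per bin; empty bins are kept because quality_bin is a factor
coffee_counts <- table(quality_bin[coffee_state == "Yes"])
print(coffee_counts)
```

Binning with `cut()` keeps the bin edges in one place, so the labels and the conditions cannot drift apart the way they can in a chain of `else if` branches.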
1.2 daily features -> sleep onset/offset time?
sleep_data1_2<-sleep_data[ , c(1,2,8:11)]
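#time variable resetting: keep only the clock time, as decimal hours since midnight
start_lt<-as.POSIXlt(sleep_data1_2$Start, format="%Y-%m-%d %H:%M:%S", tz = "UTC") #the time zone does not matter when only the clock time is used
sleep_data1_2$Start<-start_lt$hour+start_lt$min/60
end_lt<-as.POSIXlt(sleep_data1_2$End, format="%Y-%m-%d %H:%M:%S", tz = "UTC")
sleep_data1_2$End<-end_lt$hour+end_lt$min/60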
# relationship between sleep onset and features
p1 <- ggplot(sleep_data1_2, aes(x=coffee_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("coffee_state") +
ylab("sleep onset")
p2 <- ggplot(sleep_data1_2, aes(x=tea_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("tea_state") +
ylab("sleep onset")
p3 <- ggplot(sleep_data1_2, aes(x=working_out_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("workingout_state") +
ylab("sleep onset")
p4 <- ggplot(sleep_data1_2, aes(x=stress_state, y=Start)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("stress_state") +
ylab("sleep onset")
# Display the four charts together thanks to the patchwork package
p1 + p2 + p3 +p4
# relationship between sleep offset and features
p1 <- ggplot(sleep_data1_2, aes(x=coffee_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("coffee_state") +
ylab("sleep offset")
p2 <- ggplot(sleep_data1_2, aes(x=tea_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("tea_state") +
ylab("sleep offset")
p3 <- ggplot(sleep_data1_2, aes(x=working_out_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("workingout_state") +
ylab("sleep offset")
p4 <- ggplot(sleep_data1_2, aes(x=stress_state, y=End)) +
geom_boxplot(fill="slateblue", alpha=0.2) +
xlab("stress_state") +
ylab("sleep offset")
# Display the four charts together thanks to the patchwork package
p1 + p2 + p3 +p4
# Sleep onset vs. sleep offset (numeric time columns from sleep_data1_2)
ggplot(data = sleep_data1_2, mapping = aes(x = Start, y = End)) +
geom_point() +
geom_smooth()
1.2
1) Sleep onset and offset times did not show any clear relationship here.
2) Except for stress, which slightly delays wake-up time, the daily features do not appear to affect sleep onset/offset time. Considering the spread of the boxplots, even the effect of stress on sleep offset time is not convincing.
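To compare onset and offset times across nights, only the clock time matters, not the calendar date. The sketch below shows one way to turn full timestamps into decimal hours with base R. The three timestamps are examples, and the noon cut-off used to place bedtimes before and after midnight next to each other is an assumption made here, not something taken from the original analysis.

```r
# Example timestamps in the same format as the Start/End columns
stamps <- c("2014-12-30 21:17:50", "2015-01-19 05:06:38", "2015-06-05 00:30:00")

# Parse the full timestamp, then keep only the clock time as decimal hours
lt <- as.POSIXlt(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
hour_of_day <- lt$hour + lt$min / 60 + lt$sec / 3600

# Hours relative to midnight: times from noon onward become negative,
# so 23:30 (-0.5) and 00:30 (0.5) end up next to each other on an axis
rel_midnight <- ifelse(hour_of_day >= 12, hour_of_day - 24, hour_of_day)

print(round(hour_of_day, 2))
print(round(rel_midnight, 2))
```

Plotting `rel_midnight` instead of a raw epoch number gives axis values that can be read directly as hours before or after midnight.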
1.3 sleep features depending on coffee and/or tea?
ggplot(sleep_data, aes(fill=coffee_state, y=Sleep.quality, x=coffee_state)) +
geom_bar(position="dodge", stat="identity") +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle(" Sleep quality depending on coffee and tea intake") +
facet_wrap(~tea_state) +
#facet_wrap(~coffee_state)+
theme_ipsum() +
theme(legend.position="none") +
xlab("Coffee state") +
ylab("Sleep quality")
ggplot(sleep_data, aes(fill=coffee_state, y=Time.in.bed, x=coffee_state)) + #Time.in.bed (minutes) is used as total sleep time
geom_bar(position="dodge", stat="identity") +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle(" Total sleep time depending on coffee and tea intake") +
facet_wrap(~tea_state) +
#facet_wrap(~coffee_state)+
theme_ipsum() +
theme(legend.position="none") +
xlab("Coffee state") +
ylab("Total sleep time")
ggplot(sleep_data, aes(fill=coffee_state, y=Heart.rate, x=coffee_state)) +
geom_bar(position="dodge", stat="identity") +
scale_fill_viridis(discrete = T, option = "E") +
ggtitle(" Heart rate depending on coffee and tea intake") +
facet_wrap(~tea_state) +
#facet_wrap(~coffee_state)+
theme_ipsum() +
theme(legend.position="none") +
xlab("Coffee state") +
ylab("Heart rate")
1.3 Since coffee and tea showed similar effects in each category of each sleep feature, coffee and tea are analyzed further with the facet_wrap() function.
Sleep quality: This doesn’t show any meaningful result. Sleep quality does not seem to be affected by tea or coffee.
Total sleep time: Participants slept the most when they had tea but no coffee. Conversely, total sleep time was shortest when they had neither coffee nor tea.
Heart rate: Heart rate during sleep appears to be far outside the normal range on days when participants drank coffee, regardless of tea intake.
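Since the bar charts above draw one bar per row, a table of group means can make the coffee and tea comparison easier to read. The following is a minimal sketch with made-up heart rate values; `aggregate()` from base R is used here as one possible way to summarise, not as the method behind the plots above.

```r
# Toy data (made up): one row per night
toy <- data.frame(
  coffee_state = c("Yes", "Yes", "No", "No", "Yes", "No"),
  tea_state    = c("No",  "Yes", "No", "Yes", "No", "No"),
  Heart.rate   = c(60, 64, 56, 58, 62, 54)
)

# Mean heart rate for every coffee x tea combination
group_means <- aggregate(Heart.rate ~ coffee_state + tea_state, data = toy, FUN = mean)
print(group_means)
```

The same call with `Sleep.quality` or the total sleep time column on the left-hand side would give the corresponding tables for the other two sleep features.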
# Check correlations
sleep_data2 <- sleep_data[ , c(3,4,7)] #Sleep.quality, Time.in.bed, Heart.rate
ggpairs(sleep_data2, title="Correlations among sleep features")
#relationship between time in bed (used as total sleep time) and sleep quality
ggplot(data = sleep_data, mapping = aes(x = Time.in.bed, y = Sleep.quality)) +
geom_point() +
geom_smooth()
2.1 There is a strong positive correlation between total sleep time and sleep quality (r = 0.722).
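A correlation coefficient like this can be checked together with a confidence interval and p-value using `cor.test()` from base R. The sketch below uses only the six rows printed by `head()` earlier in this post, so its result is an illustration of the call and not a reproduction of the value reported for the full data.

```r
# The six lowest-quality nights shown in the head() output above
time_in_bed   <- c(16, 73, 115, 454, 438, 559)  # minutes
sleep_quality <- c(3, 16, 23, 50, 53, 54)       # percent

# Pearson correlation with test statistic and 95% confidence interval
result <- cor.test(time_in_bed, sleep_quality, method = "pearson")

print(result$estimate)  # correlation coefficient
print(result$conf.int)  # 95% confidence interval
print(result$p.value)
```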
3. Do sleep features and sleep onset/offset time affect mood at awake?
3.1 sleep features affect mood at awake?
sleep_data3_1 <-sleep_data[ , c(1,2,3,4,5,7)]
#time variable resetting
sleep_data3_1$Start<-strptime(sleep_data3_1$Start, "%Y-%m-%d %H:%M:%S", tz = "EST5EDT")
sleep_data3_1$Start<-as.numeric(sleep_data3_1$Start)
sleep_data3_1$End<-strptime(sleep_data3_1$End, "%Y-%m-%d %H:%M:%S", tz = "EST5EDT")
sleep_data3_1$End<-as.numeric(sleep_data3_1$End)
# relationship between sleep features and mood at awake
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Sleep.quality, fill=Mood.at.awake)) + # fill=name allow to automatically dedicate a color for each group
geom_violin() +
ggtitle("Quality") +
theme_ipsum()
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Time.in.bed, fill=Mood.at.awake)) + # Time.in.bed (minutes) is used as total sleep time
geom_violin() +
ggtitle("Total sleep time") +
theme_ipsum()
ggplot(sleep_data3_1, aes(x=Mood.at.awake, y=Heart.rate, fill=Mood.at.awake)) + # fill=name allow to automatically dedicate a color for each group
geom_violin() +
ggtitle("Heart rate") +
theme_ipsum()
# Display both charts side by side thanks to the patchwork package
#p1 + p2 +p3
3.1 No apparent relationship was found between the sleep features and mood at awake.
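A visual impression like this could be backed up with a rank-based test across the mood groups. The sketch below applies `kruskal.test()` from base R to made-up values; the three mood labels (including ":(") and all numbers are assumptions for illustration, so the output says nothing about the real data.

```r
# Toy data (made up): sleep quality for three mood groups
toy <- data.frame(
  Mood.at.awake = c(":)", ":)", ":)", ":|", ":|", ":|", ":(", ":(", ":("),
  Sleep.quality = c(80, 75, 90, 70, 85, 78, 65, 88, 72)
)

# Kruskal-Wallis rank sum test: do the groups differ in sleep quality?
kw <- kruskal.test(Sleep.quality ~ Mood.at.awake, data = toy)
print(kw)
```

A large p-value would be in line with the violin plots showing no clear difference between mood groups, while a small one would suggest at least one group differs.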
3.2 Sleep onset/offset affects mood at awake?
sleep_data3_2 <- sleep_data[ , c(1,2,5)]
###############################Sleep Start######################################
#mood at awake depending on sleep onset (sleep starts)
start_lt<-as.POSIXlt(sleep_data3_2$Start, format="%Y-%m-%d %H:%M:%S", tz = "UTC") #changing char to time
sleep_data3_2$Start<-start_lt$hour+start_lt$min/60 #converting clock time to decimal hours
#Plot
sleep_data3_2 %>%
mutate(text = fct_reorder(Mood.at.awake, Start)) %>%
ggplot( aes(y=Mood.at.awake, x=Start, fill=Mood.at.awake)) +
geom_density_ridges(alpha=0.6, stat="binline", bins=20) +
theme_ridges() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
xlab("Sleep Start Time") +
ylab("Mood at awake")
###############################Sleep End######################################
#mood at awake depending on sleep offset (sleep ends)
end_lt<-as.POSIXlt(sleep_data3_2$End, format="%Y-%m-%d %H:%M:%S", tz = "UTC") #changing char to time
sleep_data3_2$End<-end_lt$hour+end_lt$min/60 #converting clock time to decimal hours
#Plot
sleep_data3_2 %>%
mutate(text = fct_reorder(Mood.at.awake, End)) %>%
ggplot( aes(y=Mood.at.awake, x=End, fill=Mood.at.awake)) +
geom_density_ridges(alpha=0.6, stat="binline", bins=20) +
theme_ridges() +
theme(
legend.position="none",
panel.spacing = unit(0.1, "lines"),
strip.text.x = element_text(size = 8)
) +
xlab("Sleep End Time") +
ylab("Mood at awake")
3.2 The graphs show whether the times of falling asleep and waking up affect mood at awake. 1) Participants who fell asleep later tended to report either a bad or a good mood, which suggests that a late sleep onset affects mood differently across people. 2) Similarly, participants who woke up early mostly reported either a good or a bad mood.
All in all, this project shows how daily features can affect sleep features. I was able to observe the following:
Total sleep amount is likely to be affected by working out on the day of sleep.
Coffee, tea and stress were associated with increased heart rate during sleep (beyond the normal range of 40-50 BPM). Working out may affect this as well, but its effect looks trivial.
Total sleep time and sleep quality were positively correlated (r=0.722). Their relationship was roughly proportional, indicating that shorter sleep tends to come with lower sleep quality.
The longest total sleep was observed with tea but without coffee. Increased heart rate was observed with coffee intake regardless of tea intake.
Mood at awake: a bad mood at awake seems to be related to very early or very late times of falling asleep or waking up.
– What is missing (if anything) in your analysis process so far? The time axes (X and Y) of the onset/offset plots need to be revised so that they are easier to read.
– What conclusions can you make about your research questions at this point? Daily features affect sleep features (for example, stress or coffee affects sleep quality, sleep duration and heart rate during sleep).
– What do you think a naive reader would need to fully understand your graphs? I think everything was clearly explained above.
– Is there anything you want to answer with your data set, but can’t? The questions were all answered, but the results would be shown better with revised time axes.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Noh (2022, May 11). Data Analytics and Computational Social Science: HW 5 - second trial. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh898808/
BibTeX citation
@misc{noh2022hw, author = {Noh, Eunsol}, title = {Data Analytics and Computational Social Science: HW 5 - second trial}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomenoh898808/}, year = {2022} }