Reading in & tidying data concerning US adults who reported depressive symptoms from 2020 to 2022
I first set up R so that .xlsx files can be read in and the tidyverse packages are loaded.
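The setup code itself is not shown here. A minimal sketch of what it might look like, assuming the readxl and tidyverse packages are already installed (the original setup chunk may have differed):

```r
# Assumed setup, not taken from the original chunk.
# readxl provides read_excel(); it is installed alongside the tidyverse
# but is not attached by library(tidyverse), so it is loaded separately.
library(readxl)
# tidyverse attaches dplyr (rename, select, filter, summarize) and the %>% pipe.
library(tidyverse)
```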
For this assignment I used data from a survey that reports the percentage of male and female respondents who reported depressive disorder symptoms within the last 7 days, beginning in April 2020 and ending in March 2022.
The data set contains approximately 43 observations (date ranges) and 2 variables:
Female - Percentage of female respondents who reported depressive disorder symptoms.
Male - Percentage of male respondents who reported depressive disorder symptoms.
US_Adults_Reporting_Depression_2020_2022 <- read_excel("C:/Users/Leshiii/Desktop/DACSS Master's/DACSS 601/HW3/statistic_id1132653_us-adults-who-reported-depressive-symptoms-from-apr-2020-mar-2022-by-gender.xlsx",
sheet = "Data", skip = 4)
#Storing the data set under a shorter name
Depression <- US_Adults_Reporting_Depression_2020_2022
#Confirming that the data set was renamed correctly by listing all column names
colnames(Depression)
[1] "...1" "Female" "Male" "...4"
Depression <- rename(Depression, Date = ...1, Units = ...4)
#Retaining a copy of the data set that contains all dates for future reference
depression_new <- Depression %>%
  select(Date, Female, Male)
#Creating a separate copy of the data set including only numeric values to ease function use
depression_values <- Depression %>%
  select(Female, Male)
#Separating values pertaining to each year (i.e. 2020, 2021, 2022) by row position and averaging them
#Rows 1-21 cover 2020
depression_2020 <- depression_values %>%
  filter(row_number() <= 21)
values_2020 <- summarize(depression_2020, Female_2020 = mean(Female), Male_2020 = mean(Male))
#Rows 22-40 cover 2021
depression_2021 <- depression_values %>%
  filter(row_number() >= 22 & row_number() <= 40)
values_2021 <- summarize(depression_2021, Female_2021 = mean(Female), Male_2021 = mean(Male))
#Rows 41 onward cover 2022
depression_2022 <- depression_values %>%
  filter(row_number() >= 41)
values_2022 <- summarize(depression_2022, Female_2022 = mean(Female), Male_2022 = mean(Male))
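The three filter-and-summarize steps could also be collapsed into one grouped summary. The following is only a sketch under stated assumptions: toy_values is placeholder data with made-up percentages (not the survey values), and the Year column and the yearly_means name are hypothetical. The row boundaries (21 and 40) are the same ones used for the yearly split.

```r
library(dplyr)

# Placeholder data: 43 rows of made-up percentages standing in for the survey values.
set.seed(1)
toy_values <- data.frame(
  Female = runif(43, min = 25, max = 35),
  Male   = runif(43, min = 20, max = 30)
)

# Label each row with its year by row position
# (rows 1-21 = 2020, rows 22-40 = 2021, rows 41 onward = 2022),
# then compute the mean for each gender within each year.
yearly_means <- toy_values %>%
  mutate(Year = case_when(
    row_number() <= 21 ~ 2020,
    row_number() <= 40 ~ 2021,
    TRUE               ~ 2022
  )) %>%
  group_by(Year) %>%
  summarize(Female = mean(Female), Male = mean(Male))

yearly_means
```

Grouping keeps the year as an explicit column, so the boundaries only need to be written once and the result is already a single data frame.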
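#Gathering the yearly data into a single data frame with one row per gender and one column per year
#unlist() turns each one-row summary into a vector of two values (Female, Male) so that each year fills one column
dep_year <- data.frame(`2020` = unlist(values_2020), `2021` = unlist(values_2021), `2022` = unlist(values_2022))
rownames(dep_year) <- c("Female", "Male")
colnames(dep_year) <- c("2020", "2021", "2022")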
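A table with years as columns and genders as row names is compact, but in the tidyverse sense of "tidy" each variable would get its own column. One possible way to get there is sketched below with tidyr's pivot_longer(). The numbers are placeholders (not the survey results), and the names yearly_means, dep_long, Gender, and Percent are hypothetical.

```r
library(dplyr)
library(tidyr)

# Placeholder yearly means (made-up numbers, not the survey results).
yearly_means <- data.frame(
  Year   = c(2020, 2021, 2022),
  Female = c(30, 29, 28),
  Male   = c(25, 24, 23)
)

# Reshape so that there is one row per Year/Gender pair,
# with the percentage stored in its own column.
dep_long <- yearly_means %>%
  pivot_longer(cols = c(Female, Male), names_to = "Gender", values_to = "Percent")

dep_long
```

In this long form the year, gender, and percentage are each a single column, which tends to be easier to pass to ggplot2 or to further group_by() summaries.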
I wanted a bird's-eye view of how the pandemic, a turbulent time for many, affected the share of individuals reporting depressive disorder symptoms. Breaking the responses up by year helped achieve that, but by the end I had more questions about whether introducing additional variables would help further define correlations within the data.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gamez (2022, April 15). Data Analytics and Computational Social Science: HW 3. Retrieved from
BibTeX citation
@misc{gamez2022hw, author = {Gamez, Alexis}, title = {Data Analytics and Computational Social Science: HW 3}, url = {}, year = {2022} }