HW 3

Reading in & tidying data concerning US adults who reported depressive symptoms from 2020-2022

Alexis Gamez
4/9/2022

Setup

Setting up R so that .xlsx files/data can be read into the program and tidyverse packages are active.
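
The setup chunk itself is not reproduced in this post. A minimal sketch, assuming the readxl and tidyverse packages are already installed, might look like this:

```r
# Assumed setup (the original chunk is not shown in this post)
library(readxl)     # provides read_excel() for .xlsx files
library(tidyverse)  # loads dplyr, tidyr, and the other core tidyverse packages
```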

Data Set

For this assignment I used survey data that reports the percentage of male and female respondents who reported depressive disorder symptoms within the previous 7 days, with survey periods running from April 2020 through March 2022.

Data Set Variables

The data set contains approximately 43 observations (one per survey date range) and 2 variables of interest.

Female - Percentage of female respondents who reported depressive disorder symptoms.

Male - Percentage of male respondents who reported depressive disorder symptoms.

Reading in the Data

US_Adults_Reporting_Depression_2020_2022 <- read_excel("C:/Users/Leshiii/Desktop/DACSS Master's/DACSS 601/HW3/statistic_id1132653_us-adults-who-reported-depressive-symptoms-from-apr-2020-mar-2022-by-gender.xlsx", 
     sheet = "Data", skip = 4)

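#Copying the data set to a shorter name (rename() with no arguments returns the data unchanged, so plain assignment is clearer)
Depression <- US_Adults_Reporting_Depression_2020_2022

#Confirming that the copy was created correctly
View(Depression)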

Tidying the Data

#Inspecting the current column names
colnames(Depression)
#> [1] "...1"   "Female" "Male"   "...4"

#Renaming the two unnamed columns
Depression <- rename(Depression, Date = ...1, Units = ...4)

#Retaining a copy of the data set that contains all dates for future reference
depression_new <- Depression %>%
    select(Date, Female, Male)

#Creating a separate copy of the data set including only numeric values to ease function use
depression_values <- Depression %>%
    select(Female, Male)

#Separating values pertaining to each year (i.e. 2020, 2021, 2022) by row position
#Rows 1-21 fall in 2020
depression_2020 <- depression_values %>%
    filter(row_number() <= 21)
#Averaging each column, then transposing so the means sit in a single column
values_2020 <- summarize(depression_2020, Female_2020 = mean(Female), Male_2020 = mean(Male))
values_2020 <- as.data.frame(t(values_2020))

#Rows 22-40 fall in 2021
depression_2021 <- depression_values %>%
    filter(row_number() >= 22 & row_number() <= 40)
values_2021 <- summarize(depression_2021, Female_2021 = mean(Female), Male_2021 = mean(Male))
values_2021 <- as.data.frame(t(values_2021))

#Rows 41 onward fall in 2022
depression_2022 <- depression_values %>%
    filter(row_number() >= 41)
values_2022 <- summarize(depression_2022, Female_2022 = mean(Female), Male_2022 = mean(Male))
values_2022 <- as.data.frame(t(values_2022))

#Gathering the yearly data into a single data frame and putting the data into tidy form
dep_year <- data.frame(`2020` = values_2020, `2021` = values_2021, `2022` = values_2022)
rownames(dep_year) <- c("Female", "Male")
colnames(dep_year) <- c("2020", "2021", "2022")
dep_year <- as.data.frame(t(dep_year))
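
The same yearly means could also be computed in one pipeline by labeling each row with its year and grouping. The sketch below is only an illustration: it assumes the row cut points used above (rows 1-21 for 2020, 22-40 for 2021, 41 onward for 2022), and `toy_values`, `dep_year_alt`, and `dep_year_long` are hypothetical names. The percentages are synthetic placeholders, not the survey figures.

```r
library(dplyr)
library(tidyr)

# Synthetic stand-in for depression_values: 43 rows of placeholder
# percentages (NOT the survey data), so the sketch runs on its own.
toy_values <- tibble(
  Female = rep(c(30, 28, 26), times = c(21, 19, 3)),
  Male   = rep(c(24, 22, 20), times = c(21, 19, 3))
)

# Label each row with its year by position, then average within each year
dep_year_alt <- toy_values %>%
  mutate(Year = case_when(
    row_number() <= 21 ~ 2020,
    row_number() <= 40 ~ 2021,
    TRUE               ~ 2022
  )) %>%
  group_by(Year) %>%
  summarize(Female = mean(Female), Male = mean(Male))

# Long (tidy) form: one row per year and gender
dep_year_long <- dep_year_alt %>%
  pivot_longer(c(Female, Male), names_to = "Gender", values_to = "Percent")
```

Keeping `Year` as a regular column rather than as row names makes the result easier to pass to plotting or modeling functions later.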

Potential Research Questions

I wanted a bird's-eye view of how the pandemic, a turbulent time for many, affected the share of individuals reporting depressive disorder symptoms. Breaking responses up by year helped achieve that, but by the end I had more questions about whether introducing different variables would help further define correlations within the data.

  1. Is the year-to-year variation in respondent percentages large enough to determine whether peak pandemic years correlate with the share of respondents reporting symptoms?
  2. Do the percentages vary by season (i.e. spring, summer, fall, winter) across the date ranges?
  3. Would introducing additional variables, such as political affiliation, race, or age, help further define any correlation within the data?
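
As a starting point for the seasonal question, each date range would first need a season label. The sketch below is only an illustration. `month_to_season` is a hypothetical helper that is not part of the original analysis, and it assumes meteorological Northern Hemisphere seasons. Extracting a month number from the Date column is left out because the format of that column is not shown here.

```r
library(dplyr)

# Hypothetical helper (not from the original analysis): maps a month
# number (1-12) to a meteorological Northern Hemisphere season.
month_to_season <- function(month) {
  case_when(
    month %in% c(12, 1, 2) ~ "Winter",
    month %in% 3:5         ~ "Spring",
    month %in% 6:8         ~ "Summer",
    month %in% 9:11        ~ "Fall"
  )
}

# Example: April, July, October, January
month_to_season(c(4, 7, 10, 1))
#> [1] "Spring" "Summer" "Fall"   "Winter"
```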

Source

Source: [https://www.statista.com/statistics/1132653/depressive-symptoms-us-adults-by-gender-past-week/#professional]

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Gamez (2022, April 15). Data Analytics and Computational Social Science: HW 3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomgamez654888616/

BibTeX citation

@misc{gamez2022hw,
  author = {Gamez, Alexis},
  title = {Data Analytics and Computational Social Science: HW 3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomgamez654888616/},
  year = {2022}
}