Challenge 4 Solution - Debt Dataset

More data wrangling: pivoting

Categories: challenge_4, debt

Author: Linus Jen

Published: June 9, 2023

Code
library(tidyverse)
library(here)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

For this challenge, I will be working with the debt_in_trillions.xlsx dataset.

Code
data <- readxl::read_excel(here("posts", "_data", "debt_in_trillions.xlsx"))

# Display data
print(paste0("There are ", dim(data)[1], " rows and ", dim(data)[2], " columns in this dataset."))
[1] "There are 74 rows and 8 columns in this dataset."
Code
head(data)
Code
glimpse(data)
Rows: 74
Columns: 8
$ `Year and Quarter` <chr> "03:Q1", "03:Q2", "03:Q3", "03:Q4", "04:Q1", "04:Q2…
$ Mortgage           <dbl> 4.942, 5.080, 5.183, 5.660, 5.840, 5.967, 6.210, 6.…
$ `HE Revolving`     <dbl> 0.242, 0.260, 0.269, 0.302, 0.328, 0.367, 0.426, 0.…
$ `Auto Loan`        <dbl> 0.641, 0.622, 0.684, 0.704, 0.720, 0.743, 0.751, 0.…
$ `Credit Card`      <dbl> 0.688, 0.693, 0.693, 0.698, 0.695, 0.697, 0.706, 0.…
$ `Student Loan`     <dbl> 0.2407000, 0.2429000, 0.2488000, 0.2529000, 0.25980…
$ Other              <dbl> 0.4776, 0.4860, 0.4773, 0.4486, 0.4465, 0.4231, 0.4…
$ Total              <dbl> 7.2313, 7.3839, 7.5551, 8.0655, 8.2893, 8.4600, 8.8…
Code
# Check NAs
data %>%
  summarise(across(everything(), ~sum(is.na(.))))

Briefly describe the data

Based on the title of the .xlsx file and what is shown within the document (I opened the Excel file to see what the data looked like before importing it), this file shows household debt for various sectors, in trillions of dollars. I found information about this dataset from this site. Per this report’s data dictionary, the data includes the year and quarter, mortgage debt, home equity revolving debt, auto loan debt, credit card debt, student loan debt, other debt, and the total debt, all in trillions of dollars. The data was collected by the FRBNY Consumer Credit Panel and comes from a 5% random sample of all individuals with a social security number and credit report.

From the Year and Quarter column, we see that we get a 2-digit year and a quarter, separated by a colon (:). All other columns are doubles (floating-point numbers) and should be dollar amounts in the trillions.
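
As a quick illustration of that format, here is a minimal sketch (not part of the original pipeline) of how a single YY:Q# string can be parsed. yq() and quarter() come from lubridate, which is loaded through the tidyverse here; the full tidying pipeline below relies on the same two calls.

Code
# Sketch: parse one "YY:Qn" string with lubridate (loaded via tidyverse)
yq("03:Q1")                          # Date "2003-01-01", the first day of that quarter
quarter(yq("03:Q1"), with_year=TRUE) # 2003.1, the YYYY.Q numeric value used below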

Tidy Data (as needed)

This data is clearly not tidy. All the numerical columns hold the same kind of value (debt, in trillions of dollars), while the column headers represent categories of debt. Thus, we need to pivot_longer() this data so that we get a column for the type of debt. Additionally, we need to convert the Year and Quarter column into a lubridate quarter value.

Code
data <- data %>%
  # First, convert the year and quarter into a single numeric YYYY.Q value
  mutate(`year_quarter` = quarter(yq(`Year and Quarter`), with_year=TRUE)) %>%
  # Make the data tidy with pivot_longer()
  pivot_longer(!c(`Year and Quarter`, year_quarter), names_to="debt_type", values_to="debt_amount") %>%
  mutate(debt_type = factor(debt_type, levels=unique(debt_type))) %>%
  select(-`Year and Quarter`)

Results

Code
# View data
head(data)
Code
glimpse(data)
Rows: 518
Columns: 3
$ year_quarter <dbl> 2003.1, 2003.1, 2003.1, 2003.1, 2003.1, 2003.1, 2003.1, 2…
$ debt_type    <fct> Mortgage, HE Revolving, Auto Loan, Credit Card, Student L…
$ debt_amount  <dbl> 4.9420, 0.2420, 0.6410, 0.6880, 0.2407, 0.4776, 7.2313, 5…
Code
head(data$debt_type)
[1] Mortgage     HE Revolving Auto Loan    Credit Card  Student Loan
[6] Other       
7 Levels: Mortgage HE Revolving Auto Loan Credit Card Student Loan ... Total

From the output above, we now see our data in a tidy format. Firstly, our year_quarter column now stores each period as a single numeric value, using lubridate to convert the YY:Q# strings into YYYY.Q format. The debt_type column contains the category of household debt - note that we kept the Total column in case someone wants to compare totals (though this category can be seen as redundant). We also converted the column to a factor, ordered by the order of the columns in the original dataset (specifically keeping the “Other” and “Total” categories at the end). Lastly, the debt_amount column contains the debt (in trillions of US dollars).
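
Since Total should simply be the sum of the other debt categories, a quick sanity check on the tidied data can confirm that the category really is redundant. This is a sketch added for illustration, not part of the original solution; it assumes Total equals the sum of the six component categories, up to rounding in the source file.

Code
# Sketch: compare Total with the sum of the component debt types, per quarter
data %>%
  group_by(year_quarter) %>%
  summarise(component_sum = sum(debt_amount[debt_type != "Total"]),
            total = debt_amount[debt_type == "Total"],
            difference = abs(component_sum - total)) %>%
  arrange(desc(difference)) %>%
  head()

If the differences are all near zero, the Total rows can safely be dropped (or recomputed) in later analyses.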