Challenge 6: Visualizing Debt Over Time

challenge_6
Dirichi Umunna
debt
Visualizing Time and Relationships
Author

Dirichi Umunna

Published

May 8, 2022

library(tidyverse)
library(ggplot2)
library(lubridate)
library(dplyr)
library(summarytools)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Introduction

The dataset I am currently working with spans from the first quarter of 2003 to the second quarter of 2021. It provides quarterly measurements of household debt associated with various loan types. Specifically, it includes information on mortgage debt, HE revolving debt, auto loans, credit card debt, student loans, and other types of loans. Additionally, there is a variable representing the total household debt, which encompasses all six loan types.

Read in Data

newdebt <- read_excel("_data/debt_in_trillions.xlsx")

##condensed summary of the data

print(summarytools::dfSummary(newdebt,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

newdebt

Dimensions: 74 x 8
Duplicates: 0
Variable Stats / Values Freqs (% of Valid) Graph Missing
Year and Quarter [character]
1. 03:Q1
2. 03:Q2
3. 03:Q3
4. 03:Q4
5. 04:Q1
6. 04:Q2
7. 04:Q3
8. 04:Q4
9. 05:Q1
10. 05:Q2
[ 64 others ]
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
1 ( 1.4% )
64 ( 86.5% )
0 (0.0%)
Mortgage [numeric]
Mean (sd) : 8.3 (1.2)
min ≤ med ≤ max:
4.9 ≤ 8.4 ≤ 10.4
IQR (CV) : 1 (0.1)
74 distinct values 0 (0.0%)
HE Revolving [numeric]
Mean (sd) : 0.5 (0.1)
min ≤ med ≤ max:
0.2 ≤ 0.5 ≤ 0.7
IQR (CV) : 0.2 (0.2)
73 distinct values 0 (0.0%)
Auto Loan [numeric]
Mean (sd) : 0.9 (0.2)
min ≤ med ≤ max:
0.6 ≤ 0.8 ≤ 1.4
IQR (CV) : 0.4 (0.3)
71 distinct values 0 (0.0%)
Credit Card [numeric]
Mean (sd) : 0.8 (0.1)
min ≤ med ≤ max:
0.7 ≤ 0.7 ≤ 0.9
IQR (CV) : 0.1 (0.1)
69 distinct values 0 (0.0%)
Student Loan [numeric]
Mean (sd) : 0.9 (0.4)
min ≤ med ≤ max:
0.2 ≤ 0.9 ≤ 1.6
IQR (CV) : 0.8 (0.5)
73 distinct values 0 (0.0%)
Other [numeric]
Mean (sd) : 0.4 (0)
min ≤ med ≤ max:
0.3 ≤ 0.4 ≤ 0.5
IQR (CV) : 0.1 (0.1)
70 distinct values 0 (0.0%)
Total [numeric]
Mean (sd) : 11.8 (1.7)
min ≤ med ≤ max:
7.2 ≤ 11.9 ≤ 15
IQR (CV) : 1.4 (0.1)
74 distinct values 0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-05-11

We observed that the date and quarter information is combined in a single column. To address this, we will utilize the mutate function to split them into separate columns. Additionally, we will leverage the lubridate package to convert the year into a date format, which will enhance our analysis.

#separate date and quarter
newdebt <- newdebt %>%
  separate(`Year and Quarter`, into = c("Year", "Quarter"), sep = ":Q") %>%
  mutate(Year = paste0("20", Year))

Bivariate Analysis

#pivot longer to clarify debt type

longerdata <- newdebt %>%
  pivot_longer(cols = -c(Year, Total, Quarter), names_to = "Loan Type", values_to = "Debt")
 
##next, try to stack the data

ggplot(longerdata, aes(x = Year, y = `Total`, fill = `Loan Type`)) +
  geom_bar(stat = "identity") +
  labs(x = "Year", y = " Total Debt Amount", fill = "Debt Type") +
  ggtitle("Stacked Bar Graph: Debt Types by Year") +
  scale_x_discrete(breaks = unique(longerdata$Year))

Here, I attempted to use the pivot_longer function to reshape the data before creating a stacked bar graph. The purpose of using a stacked bar graph in this instance is to visually depict the composition of different debt types by their respective debt amounts. I also don’t know what went wrong in the year column.

Time

Now, we utilize a scatter plot to examine the total debt over time. A scatter plot is an effective choice in this scenario as it allows us to visualize the relationship between two continuous variables: time (year) and total debt.

ggplot(data = longerdata, aes(x = as.numeric(Year), y = Total)) +
  geom_point(size = 3) +
  labs(x = "Year", y = "Total Debt") +
  ggtitle("Total Debt Over Time") +
  scale_x_continuous(breaks = seq(2000, 2025, by = 5)) +
  theme_minimal()