The dataset I am currently working with spans from the first quarter of 2003 to the second quarter of 2021. It provides quarterly measurements of household debt associated with various loan types. Specifically, it includes information on mortgage debt, HE revolving debt, auto loans, credit card debt, student loans, and other types of loans. Additionally, there is a variable representing the total household debt, which encompasses all six loan types.
Read in Data
newdebt <-read_excel("_data/debt_in_trillions.xlsx")##condensed summary of the dataprint(summarytools::dfSummary(newdebt,varnumbers =FALSE,plain.ascii =FALSE, style ="grid", graph.magnif =0.70, valid.col =FALSE),method ='render',table.classes ='table-condensed')
Data Frame Summary
newdebt
Dimensions: 74 x 8
Duplicates: 0
Variable
Stats / Values
Freqs (% of Valid)
Graph
Missing
Year and Quarter [character]
1. 03:Q1
2. 03:Q2
3. 03:Q3
4. 03:Q4
5. 04:Q1
6. 04:Q2
7. 04:Q3
8. 04:Q4
9. 05:Q1
10. 05:Q2
[ 64 others ]
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
1
(
1.4%
)
64
(
86.5%
)
0 (0.0%)
Mortgage [numeric]
Mean (sd) : 8.3 (1.2)
min ≤ med ≤ max:
4.9 ≤ 8.4 ≤ 10.4
IQR (CV) : 1 (0.1)
74 distinct values
0 (0.0%)
HE Revolving [numeric]
Mean (sd) : 0.5 (0.1)
min ≤ med ≤ max:
0.2 ≤ 0.5 ≤ 0.7
IQR (CV) : 0.2 (0.2)
73 distinct values
0 (0.0%)
Auto Loan [numeric]
Mean (sd) : 0.9 (0.2)
min ≤ med ≤ max:
0.6 ≤ 0.8 ≤ 1.4
IQR (CV) : 0.4 (0.3)
71 distinct values
0 (0.0%)
Credit Card [numeric]
Mean (sd) : 0.8 (0.1)
min ≤ med ≤ max:
0.7 ≤ 0.7 ≤ 0.9
IQR (CV) : 0.1 (0.1)
69 distinct values
0 (0.0%)
Student Loan [numeric]
Mean (sd) : 0.9 (0.4)
min ≤ med ≤ max:
0.2 ≤ 0.9 ≤ 1.6
IQR (CV) : 0.8 (0.5)
73 distinct values
0 (0.0%)
Other [numeric]
Mean (sd) : 0.4 (0)
min ≤ med ≤ max:
0.3 ≤ 0.4 ≤ 0.5
IQR (CV) : 0.1 (0.1)
70 distinct values
0 (0.0%)
Total [numeric]
Mean (sd) : 11.8 (1.7)
min ≤ med ≤ max:
7.2 ≤ 11.9 ≤ 15
IQR (CV) : 1.4 (0.1)
74 distinct values
0 (0.0%)
Generated by summarytools 1.0.1 (R version 4.2.2) 2023-05-11
We observed that the date and quarter information is combined in a single column. To address this, we will utilize the mutate function to split them into separate columns. Additionally, we will leverage the lubridate package to convert the year into a date format, which will enhance our analysis.
#separate date and quarternewdebt <- newdebt %>%separate(`Year and Quarter`, into =c("Year", "Quarter"), sep =":Q") %>%mutate(Year =paste0("20", Year))
Bivariate Analysis
#pivot longer to clarify debt typelongerdata <- newdebt %>%pivot_longer(cols =-c(Year, Total, Quarter), names_to ="Loan Type", values_to ="Debt")##next, try to stack the dataggplot(longerdata, aes(x = Year, y =`Total`, fill =`Loan Type`)) +geom_bar(stat ="identity") +labs(x ="Year", y =" Total Debt Amount", fill ="Debt Type") +ggtitle("Stacked Bar Graph: Debt Types by Year") +scale_x_discrete(breaks =unique(longerdata$Year))
Here, I attempted to use the pivot_longer function to reshape the data before creating a stacked bar graph. The purpose of using a stacked bar graph in this instance is to visually depict the composition of different debt types by their respective debt amounts. I also don’t know what went wrong in the year column.
Time
Now, we utilize a scatter plot to examine the total debt over time. A scatter plot is an effective choice in this scenario as it allows us to visualize the relationship between two continuous variables: time (year) and total debt.
ggplot(data = longerdata, aes(x =as.numeric(Year), y = Total)) +geom_point(size =3) +labs(x ="Year", y ="Total Debt") +ggtitle("Total Debt Over Time") +scale_x_continuous(breaks =seq(2000, 2025, by =5)) +theme_minimal()
Source Code
---title: "Challenge 6: Visualizing Debt Over Time"author: "Dirichi Umunna"description: "Visualizing Time and Relationships"date: "05/08/2022"format: html: toc: true code-copy: true code-tools: truecategories: - challenge_6 - Dirichi Umunna - debt---```{r}#| label: setup#| warning: false#| message: falselibrary(tidyverse)library(ggplot2)library(lubridate)library(dplyr)library(summarytools)library(readxl)knitr::opts_chunk$set(echo =TRUE, warning=FALSE, message=FALSE)```## IntroductionThe dataset I am currently working with spans from the first quarter of 2003 to the second quarter of 2021. It provides quarterly measurements of household debt associated with various loan types. Specifically, it includes information on mortgage debt, HE revolving debt, auto loans, credit card debt, student loans, and other types of loans. Additionally, there is a variable representing the total household debt, which encompasses all six loan types.## Read in Data```{r}newdebt <-read_excel("_data/debt_in_trillions.xlsx")##condensed summary of the dataprint(summarytools::dfSummary(newdebt,varnumbers =FALSE,plain.ascii =FALSE, style ="grid", graph.magnif =0.70, valid.col =FALSE),method ='render',table.classes ='table-condensed')```We observed that the date and quarter information is combined in a single column. To address this, we will utilize the mutate function to split them into separate columns. Additionally, we will leverage the lubridate package to convert the year into a date format, which will enhance our analysis.```{r}#separate date and quarternewdebt <- newdebt %>%separate(`Year and Quarter`, into =c("Year", "Quarter"), sep =":Q") %>%mutate(Year =paste0("20", Year))```## Bivariate Analysis```{r}#pivot longer to clarify debt typelongerdata <- newdebt %>%pivot_longer(cols =-c(Year, Total, Quarter), names_to ="Loan Type", values_to ="Debt")##next, try to stack the dataggplot(longerdata, aes(x = Year, y =`Total`, fill =`Loan Type`)) +geom_bar(stat ="identity") +labs(x ="Year", y =" Total Debt Amount", fill ="Debt Type") +ggtitle("Stacked Bar Graph: Debt Types by Year") +scale_x_discrete(breaks =unique(longerdata$Year))```Here, I attempted to use the pivot_longer function to reshape the data before creating a stacked bar graph. The purpose of using a stacked bar graph in this instance is to visually depict the composition of different debt types by their respective debt amounts. I also don't know what went wrong in the year column.## TimeNow, we utilize a scatter plot to examine the total debt over time. A scatter plot is an effective choice in this scenario as it allows us to visualize the relationship between two continuous variables: time (year) and total debt.```{r}ggplot(data = longerdata, aes(x =as.numeric(Year), y = Total)) +geom_point(size =3) +labs(x ="Year", y ="Total Debt") +ggtitle("Total Debt Over Time") +scale_x_continuous(breaks =seq(2000, 2025, by =5)) +theme_minimal()```