library(tidyverse)
library(ggplot2)
library(ggforce)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 6
Challenge Overview
Today’s challenge is to:
- read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
- tidy data (as needed, including sanity checks)
- mutate variables as needed (including sanity checks)
- create at least one graph including time (evolution)
  - try to make them “publication” ready (optional)
  - explain why you chose the specific graph type
- create at least one graph depicting part-whole or flow relationships
  - try to make them “publication” ready (optional)
  - explain why you chose the specific graph type
R Graph Gallery is a good starting point for thinking about what information is conveyed in standard graph types, and includes example R code.
(be sure to only include the category tags for the data you use!)
Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- debt ⭐
- fed_rate ⭐⭐
- abc_poll ⭐⭐⭐
- usa_hh ⭐⭐⭐
- hotel_bookings ⭐⭐⭐⭐
- AB_NYC ⭐⭐⭐⭐⭐
RawData <- read_excel("_data/debt_in_trillions.xlsx")
head(RawData)
# A tibble: 6 × 8
`Year and Quarter` Mortgage `HE Revolving` Auto …¹ Credi…² Stude…³ Other Total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03:Q1 4.94 0.242 0.641 0.688 0.241 0.478 7.23
2 03:Q2 5.08 0.26 0.622 0.693 0.243 0.486 7.38
3 03:Q3 5.18 0.269 0.684 0.693 0.249 0.477 7.56
4 03:Q4 5.66 0.302 0.704 0.698 0.253 0.449 8.07
5 04:Q1 5.84 0.328 0.72 0.695 0.260 0.446 8.29
6 04:Q2 5.97 0.367 0.743 0.697 0.263 0.423 8.46
# … with abbreviated variable names ¹`Auto Loan`, ²`Credit Card`,
# ³`Student Loan`
Briefly describe the data
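The data appear to record the total debt held by a country's residents, most likely those of the US, in trillions (judging by the file name). Each row is one quarter, starting with 03:Q1, and the columns break the debt down by type (Mortgage, HE Revolving, Auto Loan, Credit Card, Student Loan, Other) alongside a Total column.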
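One sanity check the description above suggests is whether the six debt types add up to Total. The sketch below is only an illustration: it rebuilds the first three rows by hand from the head(RawData) output (the name debt_head is made up here), and because the printed values are rounded, it compares with a tolerance rather than testing for exact equality.

```r
library(dplyr)

# First three rows copied from the printed head(RawData) output.
# The values are rounded as displayed, so the real file may differ slightly.
debt_head <- tibble::tribble(
  ~`Year and Quarter`, ~Mortgage, ~`HE Revolving`, ~`Auto Loan`, ~`Credit Card`, ~`Student Loan`, ~Other, ~Total,
  "03:Q1", 4.94, 0.242, 0.641, 0.688, 0.241, 0.478, 7.23,
  "03:Q2", 5.08, 0.26,  0.622, 0.693, 0.243, 0.486, 7.38,
  "03:Q3", 5.18, 0.269, 0.684, 0.693, 0.249, 0.477, 7.56
)

checked <- debt_head %>%
  mutate(
    # Sum of the six component columns
    ComponentSum = Mortgage + `HE Revolving` + `Auto Loan` +
      `Credit Card` + `Student Loan` + Other,
    # Absolute difference from the reported total
    Gap = abs(ComponentSum - Total)
  )

# TRUE if every row agrees within the (assumed) rounding tolerance of 0.02
all(checked$Gap < 0.02)
```

On the full data, the same mutate() call could be applied to RawData directly; any row with a large Gap would be worth a closer look.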
splitData <- RawData %>%
  separate(`Year and Quarter`, c('Year', 'Quarter'), sep = ":")
splitData
# A tibble: 74 × 9
Year Quarter Mortgage `HE Revolving` `Auto Loan` Credi…¹ Stude…² Other Total
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03 Q1 4.94 0.242 0.641 0.688 0.241 0.478 7.23
2 03 Q2 5.08 0.26 0.622 0.693 0.243 0.486 7.38
3 03 Q3 5.18 0.269 0.684 0.693 0.249 0.477 7.56
4 03 Q4 5.66 0.302 0.704 0.698 0.253 0.449 8.07
5 04 Q1 5.84 0.328 0.72 0.695 0.260 0.446 8.29
6 04 Q2 5.97 0.367 0.743 0.697 0.263 0.423 8.46
7 04 Q3 6.21 0.426 0.751 0.706 0.33 0.41 8.83
8 04 Q4 6.36 0.468 0.728 0.717 0.346 0.423 9.04
9 05 Q1 6.51 0.502 0.725 0.71 0.364 0.394 9.21
10 05 Q2 6.70 0.528 0.774 0.717 0.374 0.402 9.49
# … with 64 more rows, and abbreviated variable names ¹`Credit Card`,
# ²`Student Loan`
Time Dependent Visualization
Below is a time-dependent graphic of credit card debt; I later alter the data to compare it to other types of debt.
scatter <- splitData %>%
  ggplot(mapping = aes(x = Year, y = `Credit Card`)) +
  geom_point(aes(color = Quarter))
scatter
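Because Year is stored as a two-character string, the x-axis above is discrete and the four quarters of a year stack on the same position. A possible alternative, sketched here under the assumption that every two-digit year belongs to the 2000s, is to build a real date for the start of each quarter and plot against that. The names split_head, with_date, QuarterStart and date_plot are invented for this example, and the rows are copied from the printed splitData output.

```r
library(dplyr)
library(ggplot2)

# A few rows copied from the printed splitData output (credit card column only)
split_head <- tibble::tribble(
  ~Year, ~Quarter, ~`Credit Card`,
  "03", "Q1", 0.688,
  "03", "Q2", 0.693,
  "03", "Q3", 0.693,
  "03", "Q4", 0.698,
  "04", "Q1", 0.695,
  "04", "Q2", 0.697
)

with_date <- split_head %>%
  mutate(
    # Assumption: "03" means 2003, "21" means 2021, and so on
    FullYear = 2000L + as.integer(Year),
    # Q1 -> month 1, Q2 -> month 4, Q3 -> month 7, Q4 -> month 10
    StartMonth = (as.integer(sub("Q", "", Quarter)) - 1L) * 3L + 1L,
    QuarterStart = as.Date(sprintf("%d-%02d-01", FullYear, StartMonth))
  )

# A line on a continuous date axis, with points still colored by quarter
date_plot <- ggplot(with_date, aes(x = QuarterStart, y = `Credit Card`)) +
  geom_line() +
  geom_point(aes(color = Quarter)) +
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Quarter start", y = "Credit card debt (trillions)")
date_plot
```

A line chart on a date axis tends to suit evolution over time better than a scatter on a categorical axis, since the spacing between observations reflects actual elapsed time.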
Pivoting data again
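# Reshape to one row per year, quarter and debt type.
# Note: despite its name, DebtPercent holds the debt amounts, not percentages.
longerSplitData <- splitData %>%
  pivot_longer(!c(Year, Quarter), names_to = "DebtType", values_to = "DebtPercent")
longerSplitData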
# A tibble: 518 × 4
Year Quarter DebtType DebtPercent
<chr> <chr> <chr> <dbl>
1 03 Q1 Mortgage 4.94
2 03 Q1 HE Revolving 0.242
3 03 Q1 Auto Loan 0.641
4 03 Q1 Credit Card 0.688
5 03 Q1 Student Loan 0.241
6 03 Q1 Other 0.478
7 03 Q1 Total 7.23
8 03 Q2 Mortgage 5.08
9 03 Q2 HE Revolving 0.26
10 03 Q2 Auto Loan 0.622
# … with 508 more rows
Visualizing Part-Whole Relationships
longerSplitDataPlot <- longerSplitData %>%
  ggplot(mapping = aes(x = Year, y = DebtPercent))

longerSplitDataPlot +
  facet_wrap(~DebtType, scales = "free")

# visualize the data by debt type
longerSplitDataPlot +
  geom_point(aes(color = DebtType))
I wanted to see how each type of debt contributes to the total for a given year, so I separated out the debt types. As the plot shows, mortgage debt is the largest component of the total.
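To put a number on that part-whole relationship, one option is to compute each type's share of the quarter's debt. The sketch below is illustrative only: it uses the seven 03:Q1 rows copied from the printed longerSplitData output, and the names long_head, shares, Share and share_plot are invented here. Note that the Total rows have to be dropped first, otherwise the whole would be counted as one of its own parts.

```r
library(dplyr)
library(ggplot2)

# The 03:Q1 rows copied from the printed longerSplitData output
long_head <- tibble::tribble(
  ~Year, ~Quarter, ~DebtType, ~DebtPercent,
  "03", "Q1", "Mortgage",     4.94,
  "03", "Q1", "HE Revolving", 0.242,
  "03", "Q1", "Auto Loan",    0.641,
  "03", "Q1", "Credit Card",  0.688,
  "03", "Q1", "Student Loan", 0.241,
  "03", "Q1", "Other",        0.478,
  "03", "Q1", "Total",        7.23
)

shares <- long_head %>%
  filter(DebtType != "Total") %>%   # keep only the parts, not the whole
  group_by(Year, Quarter) %>%
  mutate(Share = DebtPercent / sum(DebtPercent)) %>%  # shares sum to 1 per quarter
  ungroup()

# Largest contributors first
shares %>% arrange(desc(Share))

# Stacked bar: one bar per quarter, split by debt type
share_plot <- ggplot(shares, aes(x = paste(Year, Quarter), y = Share, fill = DebtType)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Quarter", y = "Share of total debt", fill = "Debt type")
share_plot
```

With these rounded values, mortgages come to roughly 68% of the 03:Q1 debt. Run on the full longerSplitData, the same pipeline would give one stacked bar per quarter.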
longerSplitDataPlot +
  geom_point() +
  facet_wrap(~DebtType) +
  scale_x_discrete(breaks = c('03', '06', '09', '12', '15', '18', '21'))
The plot above shows how much mortgages contribute to the total debt, but what about the trends in the other types of debt? Are they similar in shape? To compare them, I had to free the y-axis scale so that each panel gets its own range.
longerSplitDataPlot +
  # alpha is a fixed value, so it is set outside aes() and needs no legend
  geom_point(aes(color = Quarter), alpha = 0.9) +
  facet_wrap(~DebtType, scales = "free_y") +
  labs(title = "Debt by type from '03 - '21") +
  scale_x_discrete(breaks = c('03', '06', '09', '12', '15', '18', '21'))