Code
library(tidyverse)
library(stringr)
library(readxl)
library(lubridate)
library(skimr)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Animesh Sengupta
August 18, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 10 × 8
Year.and.Quarter Mortgage HE.Revolving Auto.Loan Credit…¹ Stude…² Other Total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03:Q1 4.94 0.242 0.641 0.688 0.241 0.478 7.23
2 03:Q2 5.08 0.26 0.622 0.693 0.243 0.486 7.38
3 03:Q3 5.18 0.269 0.684 0.693 0.249 0.477 7.56
4 03:Q4 5.66 0.302 0.704 0.698 0.253 0.449 8.07
5 04:Q1 5.84 0.328 0.72 0.695 0.260 0.446 8.29
6 04:Q2 5.97 0.367 0.743 0.697 0.263 0.423 8.46
7 04:Q3 6.21 0.426 0.751 0.706 0.33 0.41 8.83
8 04:Q4 6.36 0.468 0.728 0.717 0.346 0.423 9.04
9 05:Q1 6.51 0.502 0.725 0.71 0.364 0.394 9.21
10 05:Q2 6.70 0.528 0.774 0.717 0.374 0.402 9.49
# … with abbreviated variable names ¹Credit.Card, ²Student.Loan
The data represents the debt statistics quarterly across different asset class.
The data is nearly tidy, we just need to mutate the Year and quarter column and make sure all the numerical values are uniform across the dataset.
[1] "Year.and.Quarter" "Mortgage" "HE.Revolving" "Auto.Loan"
[5] "Credit.Card" "Student.Loan" "Other" "Total"
# A tibble: 20 × 9
Year.and.Quarter Mortgage HE.Revolving Auto.Loan Credit…¹ Stude…² Other Total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03:Q1 4.94 0.242 0.641 0.688 0.241 0.478 7.23
2 03:Q2 5.08 0.26 0.622 0.693 0.243 0.486 7.38
3 03:Q3 5.18 0.269 0.684 0.693 0.249 0.477 7.56
4 03:Q4 5.66 0.302 0.704 0.698 0.253 0.449 8.07
5 04:Q1 5.84 0.328 0.72 0.695 0.260 0.446 8.29
6 04:Q2 5.97 0.367 0.743 0.697 0.263 0.423 8.46
7 04:Q3 6.21 0.426 0.751 0.706 0.33 0.41 8.83
8 04:Q4 6.36 0.468 0.728 0.717 0.346 0.423 9.04
9 05:Q1 6.51 0.502 0.725 0.71 0.364 0.394 9.21
10 05:Q2 6.70 0.528 0.774 0.717 0.374 0.402 9.49
11 05:Q3 6.91 0.541 0.83 0.732 0.378 0.405 9.79
12 05:Q4 7.10 0.565 0.792 0.736 0.392 0.416 10.0
13 06:Q1 7.44 0.582 0.788 0.723 0.434 0.418 10.4
14 06:Q2 7.76 0.59 0.796 0.739 0.439 0.423 10.7
15 06:Q3 8.04 0.603 0.821 0.754 0.447 0.442 11.1
16 06:Q4 8.23 0.604 0.821 0.767 0.482 0.406 11.3
17 07:Q1 8.42 0.605 0.794 0.764 0.506 0.404 11.5
18 07:Q2 8.71 0.619 0.807 0.796 0.514 0.408 11.8
19 07:Q3 8.93 0.631 0.818 0.817 0.528 0.413 12.1
20 07:Q4 9.10 0.647 0.815 0.839 0.548 0.422 12.4
# … with 1 more variable: Date <dttm>, and abbreviated variable names
# ¹Credit.Card, ²Student.Loan
# ℹ Use `colnames()` to see all variable names
Any additional comments? So here we converted the 03:Q1 format of date to human readable date using the parse_date_time. Also , we rounded of all the numeric data to 4 decimal places for uniformity.
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
# A tibble: 10 × 11
# Rowwise:
Year.and.Quarter Mortgage HE.Revolving Auto.Loan Credit…¹ Stude…² Other Total
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03:Q1 4.94 0.242 0.641 0.688 0.241 0.478 7.23
2 03:Q2 5.08 0.26 0.622 0.693 0.243 0.486 7.38
3 03:Q3 5.18 0.269 0.684 0.693 0.249 0.477 7.56
4 03:Q4 5.66 0.302 0.704 0.698 0.253 0.449 8.07
5 04:Q1 5.84 0.328 0.72 0.695 0.260 0.446 8.29
6 04:Q2 5.97 0.367 0.743 0.697 0.263 0.423 8.46
7 04:Q3 6.21 0.426 0.751 0.706 0.33 0.41 8.83
8 04:Q4 6.36 0.468 0.728 0.717 0.346 0.423 9.04
9 05:Q1 6.51 0.502 0.725 0.71 0.364 0.394 9.21
10 05:Q2 6.70 0.528 0.774 0.717 0.374 0.402 9.49
# … with 3 more variables: Date <dttm>, Expected_sum <dbl>, Difference <dbl>,
# and abbreviated variable names ¹Credit.Card, ²Student.Loan
# ℹ Use `colnames()` to see all variable names
Any additional comments? As part of Sanity checks, calculation of total debt across column needs to be verified. Across() function was used rowwise to calculate the total debt and compared the absolute value between each other. As per the data, the difference is near to zero hence the total computations are veritable.
---
title: "Challenge 4"
author: "Animesh Sengupta"
desription: "More data wrangling: pivoting"
date: "08/18/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- Animesh Sengupta
- Debt in trillions data
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(stringr)
library(readxl)
library(lubridate)
library(skimr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.csv⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions ⭐⭐⭐⭐⭐
```{r}
library(readxl)
debt_data <- read_excel("../posts/_data/debt_in_trillions.xlsx", .name_repair = "universal")
head(debt_data,10)
```
### Briefly describe the data
The data represents the debt statistics quarterly across different asset class.
## Tidy Data (as needed)
The data is nearly tidy, we just need to mutate the Year and quarter column and make sure all the numerical values are uniform across the dataset.
```{r}
colnames(debt_data)
debt_data1 <- debt_data%>%
mutate(
Date= parse_date_time(Year.and.Quarter,"yq"),
across(where(is.numeric), round, 4)
)
head(debt_data1,20)
```
Any additional comments?
So here we converted the 03:Q1 format of date to human readable date using the parse_date_time. Also , we rounded of all the numeric data to 4 decimal places for uniformity.
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
```{r}
debt_checks<-debt_data1%>%rowwise()%>%
mutate(Expected_sum=sum(across(.cols=c(Mortgage,HE.Revolving,Auto.Loan,Credit.Card,Student.Loan,Other))),
Difference=abs(Total-Expected_sum))
head(debt_checks,10)
```
Any additional comments?
As part of Sanity checks, calculation of total debt across column needs to be verified. Across() function was used rowwise to calculate the total debt and compared the absolute value between each other. As per the data, the difference is near to zero hence the total computations are veritable.