Challenge 4: Data Wrangling-Mutate

challenge_4
tidydata
mutate
lubridate
Author

Surya Praneeth Reddy Chirasani

Published

January 8, 2023

Code
library(tidyverse)
library("readxl")
library(lubridate)
knitr::opts_chunk$set(echo = TRUE)
Code
debt_data <-read_excel("_data/debt_in_trillions.xlsx")
debt_data
# A tibble: 74 × 8
   `Year and Quarter` Mortgage HE Revolvin…¹ Auto …² Credi…³ Stude…⁴ Other Total
   <chr>                 <dbl>         <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl>
 1 03:Q1                  4.94         0.242   0.641   0.688   0.241 0.478  7.23
 2 03:Q2                  5.08         0.26    0.622   0.693   0.243 0.486  7.38
 3 03:Q3                  5.18         0.269   0.684   0.693   0.249 0.477  7.56
 4 03:Q4                  5.66         0.302   0.704   0.698   0.253 0.449  8.07
 5 04:Q1                  5.84         0.328   0.72    0.695   0.260 0.446  8.29
 6 04:Q2                  5.97         0.367   0.743   0.697   0.263 0.423  8.46
 7 04:Q3                  6.21         0.426   0.751   0.706   0.33  0.41   8.83
 8 04:Q4                  6.36         0.468   0.728   0.717   0.346 0.423  9.04
 9 05:Q1                  6.51         0.502   0.725   0.71    0.364 0.394  9.21
10 05:Q2                  6.70         0.528   0.774   0.717   0.374 0.402  9.49
# … with 64 more rows, and abbreviated variable names ¹​`HE Revolving`,
#   ²​`Auto Loan`, ³​`Credit Card`, ⁴​`Student Loan`
Code
tail(debt_data)
# A tibble: 6 × 8
  `Year and Quarter` Mortgage `HE Revolving` Auto …¹ Credi…² Stude…³ Other Total
  <chr>                 <dbl>          <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl>
1 20:Q1                  9.71          0.386    1.35   0.893    1.54 0.427  14.3
2 20:Q2                  9.78          0.375    1.34   0.817    1.54 0.418  14.3
3 20:Q3                  9.86          0.362    1.36   0.807    1.55 0.417  14.4
4 20:Q4                 10.0           0.349    1.37   0.819    1.56 0.419  14.6
5 21:Q1                 10.2           0.335    1.38   0.77     1.58 0.413  14.6
6 21:Q2                 10.4           0.322    1.42   0.787    1.57 0.421  15.0
# … with abbreviated variable names ¹​`Auto Loan`, ²​`Credit Card`,
#   ³​`Student Loan`

This dataset has quarterly debt data of different categories in a household such as Mortgage, HE Revolving, Auto Loan, Credit Card, Student Loan and other kinds of debt from the first quarter of Year 2003 to second Quarter of Year 2021. It also has total debt data amount which is the sum of the above mentioned six different debt data

For particular data analysis, we can consider to pivot the data using pivot_longer so we can group the data using the Year and Quarter column. Also in order to that, the Year and Quarter column is not in a typical date format. This column can be converted to the typical data format using parse_data_time function from the lubridate library by passing the date order ‘YQ’ through the orders argument

Code
debt <- debt_data%>%
        mutate(date = parse_date_time(`Year and Quarter`, orders="yq"))
debt$date
 [1] "2003-01-01 UTC" "2003-04-01 UTC" "2003-07-01 UTC" "2003-10-01 UTC"
 [5] "2004-01-01 UTC" "2004-04-01 UTC" "2004-07-01 UTC" "2004-10-01 UTC"
 [9] "2005-01-01 UTC" "2005-04-01 UTC" "2005-07-01 UTC" "2005-10-01 UTC"
[13] "2006-01-01 UTC" "2006-04-01 UTC" "2006-07-01 UTC" "2006-10-01 UTC"
[17] "2007-01-01 UTC" "2007-04-01 UTC" "2007-07-01 UTC" "2007-10-01 UTC"
[21] "2008-01-01 UTC" "2008-04-01 UTC" "2008-07-01 UTC" "2008-10-01 UTC"
[25] "2009-01-01 UTC" "2009-04-01 UTC" "2009-07-01 UTC" "2009-10-01 UTC"
[29] "2010-01-01 UTC" "2010-04-01 UTC" "2010-07-01 UTC" "2010-10-01 UTC"
[33] "2011-01-01 UTC" "2011-04-01 UTC" "2011-07-01 UTC" "2011-10-01 UTC"
[37] "2012-01-01 UTC" "2012-04-01 UTC" "2012-07-01 UTC" "2012-10-01 UTC"
[41] "2013-01-01 UTC" "2013-04-01 UTC" "2013-07-01 UTC" "2013-10-01 UTC"
[45] "2014-01-01 UTC" "2014-04-01 UTC" "2014-07-01 UTC" "2014-10-01 UTC"
[49] "2015-01-01 UTC" "2015-04-01 UTC" "2015-07-01 UTC" "2015-10-01 UTC"
[53] "2016-01-01 UTC" "2016-04-01 UTC" "2016-07-01 UTC" "2016-10-01 UTC"
[57] "2017-01-01 UTC" "2017-04-01 UTC" "2017-07-01 UTC" "2017-10-01 UTC"
[61] "2018-01-01 UTC" "2018-04-01 UTC" "2018-07-01 UTC" "2018-10-01 UTC"
[65] "2019-01-01 UTC" "2019-04-01 UTC" "2019-07-01 UTC" "2019-10-01 UTC"
[69] "2020-01-01 UTC" "2020-04-01 UTC" "2020-07-01 UTC" "2020-10-01 UTC"
[73] "2021-01-01 UTC" "2021-04-01 UTC"