Challenge 4 Submission

challenge_4
More data wrangling: pivoting
Author

Nick Boonstra

Published

August 21, 2022

Introduction

Today’s challenge introduces the mutate() function as an additional tool in the data-wrangling process. I will be using the “FedFundsRate” data set.

Code
library(tidyverse)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Code
fed <- read_csv(
  file = "_data/FedFundsRate.csv",
  skip = 1,
  col_names = c("year","month","day","fedfunds_target","fedfunds_upper","fedfunds_lower",
                "fedfunds_eff","gdp_change","unemploy_rate","inflation_rate")
  )

fed
# A tibble: 904 × 10
    year month   day fedfunds_…¹ fedfu…² fedfu…³ fedfu…⁴ gdp_c…⁵ unemp…⁶ infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, and abbreviated variable names ¹​fedfunds_target,
#   ²​fedfunds_upper, ³​fedfunds_lower, ⁴​fedfunds_eff, ⁵​gdp_change,
#   ⁶​unemploy_rate, ⁷​inflation_rate
# ℹ Use `print(n = ...)` to see more rows

Briefly describe the data

This data set is not far from being tidy. Once the data set is tidy, there will be four columns: one for the date of the observation, one for the specific rate in question, one to describe what it is about that rate that is being observed, and one to provide the value of the given rate. When I read in the data, I renamed the columns with this tidying in mind.

Tidying and Wrangling

One inconvenience about this data set is that the year, month, and date of each observation each exist in their own columns, rather than there being one single column for the date. Fortunately, the lubridate package provides a simple work-around for this with the make_date() function.

When we pivot_longer() the data, we are going to take every column but the date (we will select() out the individual year/month/day columns once we’ve generated our date column) and pivot them into two columns: one for rate_type, and one for value. Then, we will take one further step and use the separate() function to split rate_type into two separate columns, one for each variable.

Code
fed <- fed %>% 
  mutate(date=make_date(year,month,day)) %>% 
  select(-c(year,month,day)) %>% 
  pivot_longer(
    cols=!date,
    names_to="rate_type",
    values_to="val"
  ) %>% 
  separate(col=rate_type,into=c("rate","type")) 

fed
# A tibble: 6,328 × 4
   date       rate      type     val
   <date>     <chr>     <chr>  <dbl>
 1 1954-07-01 fedfunds  target  NA  
 2 1954-07-01 fedfunds  upper   NA  
 3 1954-07-01 fedfunds  lower   NA  
 4 1954-07-01 fedfunds  eff      0.8
 5 1954-07-01 gdp       change   4.6
 6 1954-07-01 unemploy  rate     5.8
 7 1954-07-01 inflation rate    NA  
 8 1954-08-01 fedfunds  target  NA  
 9 1954-08-01 fedfunds  upper   NA  
10 1954-08-01 fedfunds  lower   NA  
# … with 6,318 more rows
# ℹ Use `print(n = ...)` to see more rows