Challenge 4 Instructions

challenge_4
mutate
More data wrangling: pivoting
Author

Nayan Jani

Published

August 21, 2022

Code
library(tidyverse)
library(summarytools)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Code
FED <-read_csv("_data/FedFundsRate.csv",
                        show_col_types = FALSE, col_names = c("Year", "Month",
                               "Day", "fed_funds_target_rate",
                              "fed_funds_upper_target","fed_funds_lower_target","effective_fed_funds_rate",
                              "real_gdp_percent_change", "unemployment_rate", "inflation_rate"), skip =1)
FED
Code
colnames(FED)
 [1] "Year"                     "Month"                   
 [3] "Day"                      "fed_funds_target_rate"   
 [5] "fed_funds_upper_target"   "fed_funds_lower_target"  
 [7] "effective_fed_funds_rate" "real_gdp_percent_change" 
 [9] "unemployment_rate"        "inflation_rate"          
Code
print(dfSummary(FED, varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

FED

Dimensions: 904 x 10
Duplicates: 0
Variable Stats / Values Freqs (% of Valid) Graph Missing
Year [numeric]
Mean (sd) : 1986.7 (17.2)
min ≤ med ≤ max:
1954 ≤ 1987.5 ≤ 2017
IQR (CV) : 28 (0)
64 distinct values 0 (0.0%)
Month [numeric]
Mean (sd) : 6.6 (3.5)
min ≤ med ≤ max:
1 ≤ 7 ≤ 12
IQR (CV) : 6 (0.5)
12 distinct values 0 (0.0%)
Day [numeric]
Mean (sd) : 3.6 (6.8)
min ≤ med ≤ max:
1 ≤ 1 ≤ 31
IQR (CV) : 0 (1.9)
29 distinct values 0 (0.0%)
fed_funds_target_rate [numeric]
Mean (sd) : 5.7 (2.6)
min ≤ med ≤ max:
1 ≤ 5.5 ≤ 11.5
IQR (CV) : 4 (0.5)
63 distinct values 442 (48.9%)
fed_funds_upper_target [numeric]
Mean (sd) : 0.3 (0.1)
min ≤ med ≤ max:
0.2 ≤ 0.2 ≤ 1
IQR (CV) : 0 (0.5)
4 distinct values 801 (88.6%)
fed_funds_lower_target [numeric]
Mean (sd) : 0.1 (0.1)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0.8
IQR (CV) : 0 (2.4)
4 distinct values 801 (88.6%)
effective_fed_funds_rate [numeric]
Mean (sd) : 4.9 (3.6)
min ≤ med ≤ max:
0.1 ≤ 4.7 ≤ 19.1
IQR (CV) : 4.2 (0.7)
466 distinct values 152 (16.8%)
real_gdp_percent_change [numeric]
Mean (sd) : 3.1 (3.6)
min ≤ med ≤ max:
-10 ≤ 3.1 ≤ 16.5
IQR (CV) : 3.5 (1.1)
113 distinct values 654 (72.3%)
unemployment_rate [numeric]
Mean (sd) : 6 (1.6)
min ≤ med ≤ max:
3.4 ≤ 5.7 ≤ 10.8
IQR (CV) : 2.1 (0.3)
71 distinct values 152 (16.8%)
inflation_rate [numeric]
Mean (sd) : 3.7 (2.6)
min ≤ med ≤ max:
0.6 ≤ 2.8 ≤ 13.6
IQR (CV) : 2.7 (0.7)
106 distinct values 194 (21.5%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-08-28

Briefly describe the data

After reading in the data, I can see that each case is YMD rate value. The rate represents the Fund Rate type that is being assigned. The Value is the actual rate with the unit being percent. I noticed that the names of the columns with multiple words in them were not formatted correctly so that I can pivot them. I renamed the columns using underscores so that they are in the correct form to pivot. Once I created the new names for the columns I realized that the new column names were inserted into the first row. To remove them, I used the skip parameter and set it to 1 so that my columns names do not repeat. Using summary tools, I can see that all columns that include a rate value have missing values. The columns with most missing values are fed_funds_upper_target and fed_funds_lower_target , which suggests that for those months there was no fund intervals for the fed_funds_target_rate. In fact, there was no intervals created until filter(FED,Year == 2008, Month ==12, Day==16). This means that there is a lot of unfilled data that I will have to handle accordingly when tidying.

Tidy Data

Here I create a new variable FED_long which is a long pivot of all the columns that include a rate value. I need to set up the data in the form YMD rate value, where each data instance has a Date, rate type and value. I use the parameter values_drop_na to remove any missing value from the dataset.

Code
FED_long <- FED%>%
  pivot_longer( col = c(fed_funds_target_rate,
                              fed_funds_upper_target,fed_funds_lower_target,effective_fed_funds_rate,
                              real_gdp_percent_change, unemployment_rate, inflation_rate),
                 names_to="type",
                 values_to = "value",
                 values_drop_na = TRUE)


FED_long 

One issue I see is that not all the data is being shown in the tibble because here the dataset ends at 1980 when it should continue to 2017. I believe it is something to do with the dimensions of the code chunk since I can still filter out the missing years.

Identify variables that need to be mutated

Now I have to mutate the dates so that they are in the correct form. I now want one column that includes Year, month, Day in form YYYY-MM-DD. To do this, I used mutate() to create the new column called Date, and within that I use the function make_datetime to format the date. I then select the three columns that we need to create our case: Date, type and value.

Code
nrow(FED_long)
[1] 3132
Code
FED_long%>%
  count(Year)
Code
FED_long%>%
  filter(Year > 1980)
Code
FED_long%>%
  filter(Year > 1998)
Code
FED_t <- FED_long %>% 
  mutate(Date = make_datetime(Year, Month, Day)) %>%
  select(Date,type,value)



FED_t