Challenge 4 Instructions

challenge_4

mutate

More data wrangling: pivoting

Author

Nayan Jani

Published

August 21, 2022

Code

library(tidyverse)
library(summarytools)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Code

FED <-read_csv("_data/FedFundsRate.csv",
                        show_col_types = FALSE, col_names = c("Year", "Month",
                               "Day", "fed_funds_target_rate",
                              "fed_funds_upper_target","fed_funds_lower_target","effective_fed_funds_rate",
                              "real_gdp_percent_change", "unemployment_rate", "inflation_rate"), skip =1)
FED

Code

colnames(FED)

 [1] "Year"                     "Month"                   
 [3] "Day"                      "fed_funds_target_rate"   
 [5] "fed_funds_upper_target"   "fed_funds_lower_target"  
 [7] "effective_fed_funds_rate" "real_gdp_percent_change" 
 [9] "unemployment_rate"        "inflation_rate"

Code

print(dfSummary(FED, varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

FED

Dimensions: 904 x 10
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

Missing

Year [numeric]

Mean (sd) : 1986.7 (17.2)

min ≤ med ≤ max:

1954 ≤ 1987.5 ≤ 2017

IQR (CV) : 28 (0)

64 distinct values

0 (0.0%)

Month [numeric]

Mean (sd) : 6.6 (3.5)

min ≤ med ≤ max:

1 ≤ 7 ≤ 12

IQR (CV) : 6 (0.5)

12 distinct values

0 (0.0%)

Day [numeric]

Mean (sd) : 3.6 (6.8)

min ≤ med ≤ max:

1 ≤ 1 ≤ 31

IQR (CV) : 0 (1.9)

29 distinct values

0 (0.0%)

fed_funds_target_rate [numeric]

Mean (sd) : 5.7 (2.6)

min ≤ med ≤ max:

1 ≤ 5.5 ≤ 11.5

IQR (CV) : 4 (0.5)

63 distinct values

442 (48.9%)

fed_funds_upper_target [numeric]

Mean (sd) : 0.3 (0.1)

min ≤ med ≤ max:

0.2 ≤ 0.2 ≤ 1

IQR (CV) : 0 (0.5)

4 distinct values

801 (88.6%)

fed_funds_lower_target [numeric]

Mean (sd) : 0.1 (0.1)

min ≤ med ≤ max:

0 ≤ 0 ≤ 0.8

IQR (CV) : 0 (2.4)

4 distinct values

801 (88.6%)

effective_fed_funds_rate [numeric]

Mean (sd) : 4.9 (3.6)

min ≤ med ≤ max:

0.1 ≤ 4.7 ≤ 19.1

IQR (CV) : 4.2 (0.7)

466 distinct values

152 (16.8%)

real_gdp_percent_change [numeric]

Mean (sd) : 3.1 (3.6)

min ≤ med ≤ max:

-10 ≤ 3.1 ≤ 16.5

IQR (CV) : 3.5 (1.1)

113 distinct values

654 (72.3%)

unemployment_rate [numeric]

Mean (sd) : 6 (1.6)

min ≤ med ≤ max:

3.4 ≤ 5.7 ≤ 10.8

IQR (CV) : 2.1 (0.3)

71 distinct values

152 (16.8%)

inflation_rate [numeric]

Mean (sd) : 3.7 (2.6)

min ≤ med ≤ max:

0.6 ≤ 2.8 ≤ 13.6

IQR (CV) : 2.7 (0.7)

106 distinct values

194 (21.5%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-08-28

Briefly describe the data

After reading in the data, I can see that each case is YMD rate value. The rate represents the Fund Rate type that is being assigned. The Value is the actual rate with the unit being percent. I noticed that the names of the columns with multiple words in them were not formatted correctly so that I can pivot them. I renamed the columns using underscores so that they are in the correct form to pivot. Once I created the new names for the columns I realized that the new column names were inserted into the first row. To remove them, I used the skip parameter and set it to 1 so that my columns names do not repeat. Using summary tools, I can see that all columns that include a rate value have missing values. The columns with most missing values are fed_funds_upper_target and fed_funds_lower_target , which suggests that for those months there was no fund intervals for the fed_funds_target_rate. In fact, there was no intervals created until filter(FED,Year == 2008, Month ==12, Day==16). This means that there is a lot of unfilled data that I will have to handle accordingly when tidying.

Tidy Data

Here I create a new variable FED_long which is a long pivot of all the columns that include a rate value. I need to set up the data in the form YMD rate value, where each data instance has a Date, rate type and value. I use the parameter values_drop_na to remove any missing value from the dataset.

Code

FED_long <- FED%>%
  pivot_longer( col = c(fed_funds_target_rate,
                              fed_funds_upper_target,fed_funds_lower_target,effective_fed_funds_rate,
                              real_gdp_percent_change, unemployment_rate, inflation_rate),
                 names_to="type",
                 values_to = "value",
                 values_drop_na = TRUE)


FED_long

One issue I see is that not all the data is being shown in the tibble because here the dataset ends at 1980 when it should continue to 2017. I believe it is something to do with the dimensions of the code chunk since I can still filter out the missing years.

Identify variables that need to be mutated

Now I have to mutate the dates so that they are in the correct form. I now want one column that includes Year, month, Day in form YYYY-MM-DD. To do this, I used mutate() to create the new column called Date, and within that I use the function make_datetime to format the date. I then select the three columns that we need to create our case: Date, type and value.

Code

nrow(FED_long)

[1] 3132

Code

FED_long%>%
  count(Year)

Code

FED_long%>%
  filter(Year > 1980)

Code

FED_long%>%
  filter(Year > 1998)

Code

FED_t <- FED_long %>% 
  mutate(Date = make_datetime(Year, Month, Day)) %>%
  select(Date,type,value)



FED_t