challenge_4
fed_rates
More data wrangling: pivoting
Author

Xinyang Mao

Published

April 3, 2023

Code
library(tidyverse)
library(lubridate)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

  • FedFundsRate.csv⭐⭐⭐
Code
fedRateData <- read.csv("_data/FedFundsRate.csv")
head(fedRateData)
  Year Month Day Federal.Funds.Target.Rate Federal.Funds.Upper.Target
1 1954     7   1                        NA                         NA
2 1954     8   1                        NA                         NA
3 1954     9   1                        NA                         NA
4 1954    10   1                        NA                         NA
5 1954    11   1                        NA                         NA
6 1954    12   1                        NA                         NA
  Federal.Funds.Lower.Target Effective.Federal.Funds.Rate
1                         NA                         0.80
2                         NA                         1.22
3                         NA                         1.06
4                         NA                         0.85
5                         NA                         0.83
6                         NA                         1.28
  Real.GDP..Percent.Change. Unemployment.Rate Inflation.Rate
1                       4.6               5.8             NA
2                        NA               6.0             NA
3                        NA               6.1             NA
4                       8.0               5.7             NA
5                        NA               5.3             NA
6                        NA               5.0             NA

Briefly describe the data

Code
print(summarytools::dfSummary(fedRateData,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

fedRateData

Dimensions: 904 x 10
Duplicates: 0
Variable Stats / Values Freqs (% of Valid) Graph Missing
Year [integer]
Mean (sd) : 1986.7 (17.2)
min ≤ med ≤ max:
1954 ≤ 1987.5 ≤ 2017
IQR (CV) : 28 (0)
64 distinct values 0 (0.0%)
Month [integer]
Mean (sd) : 6.6 (3.5)
min ≤ med ≤ max:
1 ≤ 7 ≤ 12
IQR (CV) : 6 (0.5)
12 distinct values 0 (0.0%)
Day [integer]
Mean (sd) : 3.6 (6.8)
min ≤ med ≤ max:
1 ≤ 1 ≤ 31
IQR (CV) : 0 (1.9)
29 distinct values 0 (0.0%)
Federal.Funds.Target.Rate [numeric]
Mean (sd) : 5.7 (2.6)
min ≤ med ≤ max:
1 ≤ 5.5 ≤ 11.5
IQR (CV) : 4 (0.5)
63 distinct values 442 (48.9%)
Federal.Funds.Upper.Target [numeric]
Mean (sd) : 0.3 (0.1)
min ≤ med ≤ max:
0.2 ≤ 0.2 ≤ 1
IQR (CV) : 0 (0.5)
4 distinct values 801 (88.6%)
Federal.Funds.Lower.Target [numeric]
Mean (sd) : 0.1 (0.1)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0.8
IQR (CV) : 0 (2.4)
4 distinct values 801 (88.6%)
Effective.Federal.Funds.Rate [numeric]
Mean (sd) : 4.9 (3.6)
min ≤ med ≤ max:
0.1 ≤ 4.7 ≤ 19.1
IQR (CV) : 4.2 (0.7)
466 distinct values 152 (16.8%)
Real.GDP..Percent.Change. [numeric]
Mean (sd) : 3.1 (3.6)
min ≤ med ≤ max:
-10 ≤ 3.1 ≤ 16.5
IQR (CV) : 3.5 (1.1)
113 distinct values 654 (72.3%)
Unemployment.Rate [numeric]
Mean (sd) : 6 (1.6)
min ≤ med ≤ max:
3.4 ≤ 5.7 ≤ 10.8
IQR (CV) : 2.1 (0.3)
71 distinct values 152 (16.8%)
Inflation.Rate [numeric]
Mean (sd) : 3.7 (2.6)
min ≤ med ≤ max:
0.6 ≤ 2.8 ≤ 13.6
IQR (CV) : 2.7 (0.7)
106 distinct values 194 (21.5%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-04-03

Modify date

As we can see,there are 3 separate date columns: Year,Month,Day.We can use add a extra column named Date to combine them as a new column then delete the original 3 separate columns.

Firstly,we use mutate() and make_date() functions to create a new column:

Code
fedRateRaw <- fedRateData%>%
  mutate(Date = make_date(year = `Year`,month = `Month`,day = `Day`))

summary(fedRateRaw$Date)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
"1954-07-01" "1973-04-23" "1987-12-16" "1987-02-25" "2001-06-07" "2017-03-16" 

Delete previous columns

Delete previous separate 3 date columns. subset() used to reserve the useful columns,and store the columns names which need to delete in a character type vector.

Code
fedRateRaw <- fedRateRaw %>%
  subset(select = -c(`Year`,`Month`,`Day`))
colnames(fedRateRaw)
[1] "Federal.Funds.Target.Rate"    "Federal.Funds.Upper.Target"  
[3] "Federal.Funds.Lower.Target"   "Effective.Federal.Funds.Rate"
[5] "Real.GDP..Percent.Change."    "Unemployment.Rate"           
[7] "Inflation.Rate"               "Date"                        

Sanity check

Use the colSums() function and the is.na() function: this method calculates the number of missing values in each column and returns a vector of numbers where each element represents the number of missing values in the corresponding column

Code
colHasNa <- colSums(is.na(fedRateRaw)) > 0

colHasNa
   Federal.Funds.Target.Rate   Federal.Funds.Upper.Target 
                        TRUE                         TRUE 
  Federal.Funds.Lower.Target Effective.Federal.Funds.Rate 
                        TRUE                         TRUE 
   Real.GDP..Percent.Change.            Unemployment.Rate 
                        TRUE                         TRUE 
              Inflation.Rate                         Date 
                        TRUE                        FALSE 

An element in the colHasNa vector is TRUE for a missing value in the corresponding column, and FALSE for no missing value in the corresponding column.

Use average value to fill in the missing values

According to the check result above,we can find there are lots of missing value and all them are numeric type value. So I use the average value of every column except Date column to fill the missing cell. Mainly use for loop and mean() function to calculate mean and fill,ifelse() and is.na() to determine.

Code
colsToFill <- c(1:7)
for (i in colsToFill) {
  avg <- mean(fedRateRaw[,i],na.rm = TRUE)
  fedRateRaw[,i] <- ifelse(is.na(fedRateRaw[,i]),avg,fedRateRaw[,i])
}

colCheckNa <- colSums(is.na(fedRateRaw)) > 0

colCheckNa
   Federal.Funds.Target.Rate   Federal.Funds.Upper.Target 
                       FALSE                        FALSE 
  Federal.Funds.Lower.Target Effective.Federal.Funds.Rate 
                       FALSE                        FALSE 
   Real.GDP..Percent.Change.            Unemployment.Rate 
                       FALSE                        FALSE 
              Inflation.Rate                         Date 
                       FALSE                        FALSE 

Now we check again,all of the missing values disappered!