Challenge_4

challenge_4

fed_rates

More data wrangling: pivoting

Author

Xinyang Mao

Published

April 3, 2023

Code

library(tidyverse)
library(lubridate)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
tidy data (as needed, including sanity checks)
identify variables that need to be mutated
mutate variables and sanity check all mutations

Read in data

FedFundsRate.csv⭐⭐⭐

Code

fedRateData <- read.csv("_data/FedFundsRate.csv")
head(fedRateData)

  Year Month Day Federal.Funds.Target.Rate Federal.Funds.Upper.Target
1 1954     7   1                        NA                         NA
2 1954     8   1                        NA                         NA
3 1954     9   1                        NA                         NA
4 1954    10   1                        NA                         NA
5 1954    11   1                        NA                         NA
6 1954    12   1                        NA                         NA
  Federal.Funds.Lower.Target Effective.Federal.Funds.Rate
1                         NA                         0.80
2                         NA                         1.22
3                         NA                         1.06
4                         NA                         0.85
5                         NA                         0.83
6                         NA                         1.28
  Real.GDP..Percent.Change. Unemployment.Rate Inflation.Rate
1                       4.6               5.8             NA
2                        NA               6.0             NA
3                        NA               6.1             NA
4                       8.0               5.7             NA
5                        NA               5.3             NA
6                        NA               5.0             NA

Briefly describe the data

Code

print(summarytools::dfSummary(fedRateData,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

fedRateData

Dimensions: 904 x 10
Duplicates: 0

Year [integer]

Mean (sd) : 1986.7 (17.2)
min ≤ med ≤ max:
1954 ≤ 1987.5 ≤ 2017
IQR (CV) : 28 (0)

64 distinct values

0 (0.0%)

Month [integer]

Mean (sd) : 6.6 (3.5)
min ≤ med ≤ max:
1 ≤ 7 ≤ 12
IQR (CV) : 6 (0.5)

12 distinct values

0 (0.0%)

Day [integer]

Mean (sd) : 3.6 (6.8)
min ≤ med ≤ max:
1 ≤ 1 ≤ 31
IQR (CV) : 0 (1.9)

29 distinct values

0 (0.0%)

Federal.Funds.Target.Rate [numeric]

Mean (sd) : 5.7 (2.6)
min ≤ med ≤ max:
1 ≤ 5.5 ≤ 11.5
IQR (CV) : 4 (0.5)

63 distinct values

442 (48.9%)

Federal.Funds.Upper.Target [numeric]

Mean (sd) : 0.3 (0.1)
min ≤ med ≤ max:
0.2 ≤ 0.2 ≤ 1
IQR (CV) : 0 (0.5)

4 distinct values

801 (88.6%)

Federal.Funds.Lower.Target [numeric]

Mean (sd) : 0.1 (0.1)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0.8
IQR (CV) : 0 (2.4)

4 distinct values

801 (88.6%)

Effective.Federal.Funds.Rate [numeric]

Mean (sd) : 4.9 (3.6)
min ≤ med ≤ max:
0.1 ≤ 4.7 ≤ 19.1
IQR (CV) : 4.2 (0.7)

466 distinct values

152 (16.8%)

Real.GDP..Percent.Change. [numeric]

Mean (sd) : 3.1 (3.6)
min ≤ med ≤ max:
-10 ≤ 3.1 ≤ 16.5
IQR (CV) : 3.5 (1.1)

113 distinct values

654 (72.3%)

Unemployment.Rate [numeric]

Mean (sd) : 6 (1.6)
min ≤ med ≤ max:
3.4 ≤ 5.7 ≤ 10.8
IQR (CV) : 2.1 (0.3)

71 distinct values

152 (16.8%)

Inflation.Rate [numeric]

Mean (sd) : 3.7 (2.6)
min ≤ med ≤ max:
0.6 ≤ 2.8 ≤ 13.6
IQR (CV) : 2.7 (0.7)

106 distinct values

194 (21.5%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-04-03

Modify date

As we can see,there are 3 separate date columns: Year,Month,Day.We can use add a extra column named Date to combine them as a new column then delete the original 3 separate columns.

Firstly,we use mutate() and make_date() functions to create a new column:

Code

fedRateRaw <- fedRateData%>%
  mutate(Date = make_date(year = `Year`,month = `Month`,day = `Day`))

summary(fedRateRaw$Date)

        Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
"1954-07-01" "1973-04-23" "1987-12-16" "1987-02-25" "2001-06-07" "2017-03-16"

Delete previous columns

Delete previous separate 3 date columns. subset() used to reserve the useful columns,and store the columns names which need to delete in a character type vector.

Code

fedRateRaw <- fedRateRaw %>%
  subset(select = -c(`Year`,`Month`,`Day`))
colnames(fedRateRaw)

[1] "Federal.Funds.Target.Rate"    "Federal.Funds.Upper.Target"  
[3] "Federal.Funds.Lower.Target"   "Effective.Federal.Funds.Rate"
[5] "Real.GDP..Percent.Change."    "Unemployment.Rate"           
[7] "Inflation.Rate"               "Date"

Sanity check

Use the colSums() function and the is.na() function: this method calculates the number of missing values in each column and returns a vector of numbers where each element represents the number of missing values in the corresponding column

Code

colHasNa <- colSums(is.na(fedRateRaw)) > 0

colHasNa

   Federal.Funds.Target.Rate   Federal.Funds.Upper.Target 
                        TRUE                         TRUE 
  Federal.Funds.Lower.Target Effective.Federal.Funds.Rate 
                        TRUE                         TRUE 
   Real.GDP..Percent.Change.            Unemployment.Rate 
                        TRUE                         TRUE 
              Inflation.Rate                         Date 
                        TRUE                        FALSE

An element in the colHasNa vector is TRUE for a missing value in the corresponding column, and FALSE for no missing value in the corresponding column.

Use average value to fill in the missing values

According to the check result above,we can find there are lots of missing value and all them are numeric type value. So I use the average value of every column except Date column to fill the missing cell. Mainly use for loop and mean() function to calculate mean and fill,ifelse() and is.na() to determine.

Code

colsToFill <- c(1:7)
for (i in colsToFill) {
  avg <- mean(fedRateRaw[,i],na.rm = TRUE)
  fedRateRaw[,i] <- ifelse(is.na(fedRateRaw[,i]),avg,fedRateRaw[,i])
}

colCheckNa <- colSums(is.na(fedRateRaw)) > 0

colCheckNa

   Federal.Funds.Target.Rate   Federal.Funds.Upper.Target 
                       FALSE                        FALSE 
  Federal.Funds.Lower.Target Effective.Federal.Funds.Rate 
                       FALSE                        FALSE 
   Real.GDP..Percent.Change.            Unemployment.Rate 
                       FALSE                        FALSE 
              Inflation.Rate                         Date 
                       FALSE                        FALSE

Now we check again,all of the missing values disappered!