Challenge 8 Solutions

challenge_8
railroads
snl
faostat
debt
Joining Data
Author

Moira Chiong

Published

June 27, 2023

library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in multiple data sets, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. mutate variables as needed (including sanity checks)
  4. join two or more data sets and analyze some aspect of the joined data

(be sure to only include the category tags for the data you use!)

Read in data

setwd("C:/Users/chion/OneDrive/Desktop/DACSS 601/DACSS_601_Summer2023_Sec1/posts")
library(readxl)
debt_in_trillions <- read_excel("_data/debt_in_trillions.xlsx")

library(readr)
FedFundsRate <- read_csv("_data/FedFundsRate.csv")

Briefly describe the data

summary(FedFundsRate)
      Year          Month             Day         Federal Funds Target Rate
 Min.   :1954   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000           
 1st Qu.:1973   1st Qu.: 4.000   1st Qu.: 1.000   1st Qu.: 3.750           
 Median :1988   Median : 7.000   Median : 1.000   Median : 5.500           
 Mean   :1987   Mean   : 6.598   Mean   : 3.598   Mean   : 5.658           
 3rd Qu.:2001   3rd Qu.:10.000   3rd Qu.: 1.000   3rd Qu.: 7.750           
 Max.   :2017   Max.   :12.000   Max.   :31.000   Max.   :11.500           
                                                  NA's   :442              
 Federal Funds Upper Target Federal Funds Lower Target
 Min.   :0.2500             Min.   :0.0000            
 1st Qu.:0.2500             1st Qu.:0.0000            
 Median :0.2500             Median :0.0000            
 Mean   :0.3083             Mean   :0.0583            
 3rd Qu.:0.2500             3rd Qu.:0.0000            
 Max.   :1.0000             Max.   :0.7500            
 NA's   :801                NA's   :801               
 Effective Federal Funds Rate Real GDP (Percent Change) Unemployment Rate
 Min.   : 0.070               Min.   :-10.000           Min.   : 3.400   
 1st Qu.: 2.428               1st Qu.:  1.400           1st Qu.: 4.900   
 Median : 4.700               Median :  3.100           Median : 5.700   
 Mean   : 4.911               Mean   :  3.138           Mean   : 5.979   
 3rd Qu.: 6.580               3rd Qu.:  4.875           3rd Qu.: 7.000   
 Max.   :19.100               Max.   : 16.500           Max.   :10.800   
 NA's   :152                  NA's   :654               NA's   :152      
 Inflation Rate  
 Min.   : 0.600  
 1st Qu.: 2.000  
 Median : 2.800  
 Mean   : 3.733  
 3rd Qu.: 4.700  
 Max.   :13.600  
 NA's   :194     
summary(debt_in_trillions)
 Year and Quarter      Mortgage       HE Revolving      Auto Loan     
 Length:74          Min.   : 4.942   Min.   :0.2420   Min.   :0.6220  
 Class :character   1st Qu.: 8.036   1st Qu.:0.4275   1st Qu.:0.7430  
 Mode  :character   Median : 8.412   Median :0.5165   Median :0.8145  
                    Mean   : 8.274   Mean   :0.5161   Mean   :0.9309  
                    3rd Qu.: 9.047   3rd Qu.:0.6172   3rd Qu.:1.1515  
                    Max.   :10.442   Max.   :0.7140   Max.   :1.4150  
  Credit Card      Student Loan        Other            Total       
 Min.   :0.6590   Min.   :0.2407   Min.   :0.2960   Min.   : 7.231  
 1st Qu.:0.6966   1st Qu.:0.5333   1st Qu.:0.3414   1st Qu.:11.311  
 Median :0.7375   Median :0.9088   Median :0.3921   Median :11.852  
 Mean   :0.7565   Mean   :0.9189   Mean   :0.3831   Mean   :11.779  
 3rd Qu.:0.8165   3rd Qu.:1.3022   3rd Qu.:0.4154   3rd Qu.:12.674  
 Max.   :0.9270   Max.   :1.5840   Max.   :0.4860   Max.   :14.957  

##Description of data

There are 904 observations with 8 variables in the federal funds rate data set. The data covers the years 1954-2017 and pertains to federal funds rates. The federal funds target rate is recorded monthly for 1954-2008, while the upper and lower targets cover 2008-2017. There is also macroeconomic data for real GDP (percent change), unemployment and inflation. The real GDP data is recorded for the first and forth quarters, while the unemployment and inflation data is recorded quarterly. The mean of the federal funds target rate is higher than than the median at 5.66 and 5.5 respectively, likely indicating a skewed right distribution. The federal funds rate ranges from 1 to 11.5. In addition, as expected, the federal funds upper target rate and lower target rate also indicate a skewed right distribution, with a mean of .3 and median of .25 for upper targets and mean of .06 and median of 0 for lower targets. In fact, all of the macroeconomic data is also skewed right, as expected, with means higher than the medians (See table below).

There are 74 observations and 8 variables recorded quarterly in the debt in trillions data set. The debt data does indicate as uniform distribution patterns, with the variables showing both skewed left (Mortgage, HE, Other, Total) and skewed right distributions(Auto Loan, Credit Card, Student Loan (see table below).

To join the two data sets, I needed to convert the existing dates to calendar dates to use them as the joining variable.

Tidy Data (as needed)

debt <- debt_in_trillions

#Convert quarters to dates
debt <- debt %>%
  mutate(Date=parse_date_time(`Year and Quarter`, orders="yq"))

# to simplify, remove year and quarters
debt <- debt %>%
   select(-contains("Year")) %>%
  # move Date to front
select(Date, 1:last_col(0))
# view data
debt
# A tibble: 74 x 8
   Date                Mortgage `HE Revolving` `Auto Loan` `Credit Card`
   <dttm>                 <dbl>          <dbl>       <dbl>         <dbl>
 1 2003-01-01 00:00:00     4.94          0.242       0.641         0.688
 2 2003-04-01 00:00:00     5.08          0.26        0.622         0.693
 3 2003-07-01 00:00:00     5.18          0.269       0.684         0.693
 4 2003-10-01 00:00:00     5.66          0.302       0.704         0.698
 5 2004-01-01 00:00:00     5.84          0.328       0.72          0.695
 6 2004-04-01 00:00:00     5.97          0.367       0.743         0.697
 7 2004-07-01 00:00:00     6.21          0.426       0.751         0.706
 8 2004-10-01 00:00:00     6.36          0.468       0.728         0.717
 9 2005-01-01 00:00:00     6.51          0.502       0.725         0.71 
10 2005-04-01 00:00:00     6.70          0.528       0.774         0.717
# i 64 more rows
# i 3 more variables: `Student Loan` <dbl>, Other <dbl>, Total <dbl>
fed <-FedFundsRate
#create calendar date column out of original columns
fed_calendar_date <- paste(fed$Year, fed$Month, fed$Day)
# mutate new column, create new calendar date column
fed <- fed %>%
  mutate(Date = as.Date(fed_calendar_date,format = "%Y %m %d"), .before=`Year`)
View()
Error in as.data.frame(x): argument "x" is missing, with no default
# filter out old date columns
fed <- fed[-c(2,3,4)]
fed
# A tibble: 904 x 8
   Date       `Federal Funds Target Rate` `Federal Funds Upper Target`
   <date>                           <dbl>                        <dbl>
 1 1954-07-01                          NA                           NA
 2 1954-08-01                          NA                           NA
 3 1954-09-01                          NA                           NA
 4 1954-10-01                          NA                           NA
 5 1954-11-01                          NA                           NA
 6 1954-12-01                          NA                           NA
 7 1955-01-01                          NA                           NA
 8 1955-02-01                          NA                           NA
 9 1955-03-01                          NA                           NA
10 1955-04-01                          NA                           NA
# i 894 more rows
# i 5 more variables: `Federal Funds Lower Target` <dbl>,
#   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
#   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>

Join Data

joined <- fed %>%
  right_join(debt, by=join_by(Date)) %>%
  rename(Other_Debt=Other,Total_Debt=Total)
joined
# A tibble: 74 x 15
   Date                `Federal Funds Target Rate` `Federal Funds Upper Target`
   <dttm>                                    <dbl>                        <dbl>
 1 2003-01-01 00:00:00                        1.25                           NA
 2 2003-04-01 00:00:00                        1.25                           NA
 3 2003-07-01 00:00:00                        1                              NA
 4 2003-10-01 00:00:00                        1                              NA
 5 2004-01-01 00:00:00                        1                              NA
 6 2004-04-01 00:00:00                        1                              NA
 7 2004-07-01 00:00:00                        1.25                           NA
 8 2004-10-01 00:00:00                        1.75                           NA
 9 2005-01-01 00:00:00                        2.25                           NA
10 2005-04-01 00:00:00                        2.75                           NA
# i 64 more rows
# i 12 more variables: `Federal Funds Lower Target` <dbl>,
#   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
#   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>, Mortgage <dbl>,
#   `HE Revolving` <dbl>, `Auto Loan` <dbl>, `Credit Card` <dbl>,
#   `Student Loan` <dbl>, Other_Debt <dbl>, Total_Debt <dbl>
# filter date so data is present for both sets
joined <- joined %>%
  filter(Date < '2017-01-01')
joined
# A tibble: 57 x 15
   Date                `Federal Funds Target Rate` `Federal Funds Upper Target`
   <dttm>                                    <dbl>                        <dbl>
 1 2003-01-01 00:00:00                        1.25                           NA
 2 2003-04-01 00:00:00                        1.25                           NA
 3 2003-07-01 00:00:00                        1                              NA
 4 2003-10-01 00:00:00                        1                              NA
 5 2004-01-01 00:00:00                        1                              NA
 6 2004-04-01 00:00:00                        1                              NA
 7 2004-07-01 00:00:00                        1.25                           NA
 8 2004-10-01 00:00:00                        1.75                           NA
 9 2005-01-01 00:00:00                        2.25                           NA
10 2005-04-01 00:00:00                        2.75                           NA
# i 47 more rows
# i 12 more variables: `Federal Funds Lower Target` <dbl>,
#   `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
#   `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>, Mortgage <dbl>,
#   `HE Revolving` <dbl>, `Auto Loan` <dbl>, `Credit Card` <dbl>,
#   `Student Loan` <dbl>, Other_Debt <dbl>, Total_Debt <dbl>
View(joined)