Read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
Tidy data (as needed, including sanity checks)
Identify variables that need to be mutated
Mutate variables and sanity check all mutations
Read in data
Code
#simply read in the data (untouched)library(readr)raw_FedFundsRate <-read_csv("_data/FedFundsRate.csv")raw_FedFundsRate
Briefly describe the data
This data set is reporting on federal fund rates as it relates to the real GDP (percent change), unemployment rate, and inflation rate from July 1954 to March 2017. According to Investopedia, the federal funds rate is “an interest rate set by the Federal Reserve to indirectly manage interest rates, inflation, and unemployment”. In this data set, we examine four types: target, upper target, lower target, and effective rate.
Code
#summary of data set statisticsprint(summarytools::dfSummary(raw_FedFundsRate,varnumbers =FALSE,plain.ascii =FALSE, style ="grid", graph.magnif =0.70, valid.col =FALSE),method ='render',table.classes ='table-condensed')
Data Frame Summary
raw_FedFundsRate
Dimensions: 904 x 10
Duplicates: 0
Variable
Stats / Values
Freqs (% of Valid)
Graph
Missing
Year [numeric]
Mean (sd) : 1986.7 (17.2)
min ≤ med ≤ max:
1954 ≤ 1987.5 ≤ 2017
IQR (CV) : 28 (0)
64 distinct values
0 (0.0%)
Month [numeric]
Mean (sd) : 6.6 (3.5)
min ≤ med ≤ max:
1 ≤ 7 ≤ 12
IQR (CV) : 6 (0.5)
12 distinct values
0 (0.0%)
Day [numeric]
Mean (sd) : 3.6 (6.8)
min ≤ med ≤ max:
1 ≤ 1 ≤ 31
IQR (CV) : 0 (1.9)
29 distinct values
0 (0.0%)
Federal Funds Target Rate [numeric]
Mean (sd) : 5.7 (2.6)
min ≤ med ≤ max:
1 ≤ 5.5 ≤ 11.5
IQR (CV) : 4 (0.5)
63 distinct values
442 (48.9%)
Federal Funds Upper Target [numeric]
Mean (sd) : 0.3 (0.1)
min ≤ med ≤ max:
0.2 ≤ 0.2 ≤ 1
IQR (CV) : 0 (0.5)
4 distinct values
801 (88.6%)
Federal Funds Lower Target [numeric]
Mean (sd) : 0.1 (0.1)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0.8
IQR (CV) : 0 (2.4)
4 distinct values
801 (88.6%)
Effective Federal Funds Rate [numeric]
Mean (sd) : 4.9 (3.6)
min ≤ med ≤ max:
0.1 ≤ 4.7 ≤ 19.1
IQR (CV) : 4.2 (0.7)
466 distinct values
152 (16.8%)
Real GDP (Percent Change) [numeric]
Mean (sd) : 3.1 (3.6)
min ≤ med ≤ max:
-10 ≤ 3.1 ≤ 16.5
IQR (CV) : 3.5 (1.1)
113 distinct values
654 (72.3%)
Unemployment Rate [numeric]
Mean (sd) : 6 (1.6)
min ≤ med ≤ max:
3.4 ≤ 5.7 ≤ 10.8
IQR (CV) : 2.1 (0.3)
71 distinct values
152 (16.8%)
Inflation Rate [numeric]
Mean (sd) : 3.7 (2.6)
min ≤ med ≤ max:
0.6 ≤ 2.8 ≤ 13.6
IQR (CV) : 2.7 (0.7)
106 distinct values
194 (21.5%)
Generated by summarytools 1.0.1 (R version 4.2.2) 2023-03-22
Tidy Data
Noticeably, there are many missing values across the federal fund rates and macroeconomic columns. Perhaps this data was collected on a specific cadence.
To make this data set more comprehensive, I pivoted longer to have a column for federal fund types and macroeconomic conditions along with the respective values. I also programmed the tables to only show me rows with values so the reader is not sifting through missing data. For example: on July 1st, 1954, the effective federal funds rate was 0.8000, the real GDP was 4.6 and the unemployment rate was 5.8.
As a note, if I were to visualize this data as a ggplot, I would condense the federal fund rate and macroeconomic conditions as two columns, type and value. This way I could see all seven rates over time.
Code
#pivot longer of the federal fund rate columns, removed NA valuestidy_FedFundsRate <-pivot_longer(raw_FedFundsRate, col =c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target", "Effective Federal Funds Rate"),names_to="Federal Fund Type",values_to ="Federal Fund Rate",values_drop_na =TRUE)
Code
#pivot longer of the federal fund rate columns, removed NA values. For this section, I'd rather not have pivoted longer as it creates duplicate federal fund rate rows, but I didn't want all the NA values regardless.tidy_FedFundsRate <-pivot_longer(tidy_FedFundsRate, col =c("Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate"),names_to="Macroeconomic Condition",values_to ="Macroeconomic Rate",values_drop_na =TRUE)tidy_FedFundsRate
Identify & mutate variables
There are three variables that require mutation: year, month, and day. These variables can be mutated into one easily readable “date” column. Upon creating this new column, I removed the old variables and moved it to first position.
This data set of federal fund rates related to macroeconomic conditions from 1954-2017 is now compiled into a comprehensive table for analysis.
---title: "Challenge 4 - FedFundsRate Dataset"author: "Megan Galarneau"description: "More data wrangling: pivoting"date: "03/22/2023"format: html: df-print: paged toc: true code-fold: true code-copy: true code-tools: true css: "style.css"categories: - challenge_4 - fed_rates - Megan Galarneau---```{r}#| label: setup#| warning: false#| message: falselibrary(tidyverse)library(dplyr)library(lubridate)knitr::opts_chunk$set(echo =TRUE, warning=FALSE, message=FALSE)```## Challenge OverviewToday's challenge is to:1) Read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)2) Tidy data (as needed, including sanity checks)3) Identify variables that need to be mutated4) Mutate variables and sanity check all mutations## Read in data```{r}#simply read in the data (untouched)library(readr)raw_FedFundsRate <-read_csv("_data/FedFundsRate.csv")raw_FedFundsRate```### Briefly describe the dataThis data set is reporting on federal fund rates as it relates to the real GDP (percent change), unemployment rate, and inflation rate from July 1954 to March 2017. According to [Investopedia](https://www.investopedia.com/terms/f/federalfundsrate.asp), the federal funds rate is "an interest rate set by the Federal Reserve to indirectly manage interest rates, inflation, and unemployment". In this data set, we examine four types: target, upper target, lower target, and effective rate.```{r}#summary of data set statisticsprint(summarytools::dfSummary(raw_FedFundsRate,varnumbers =FALSE,plain.ascii =FALSE, style ="grid", graph.magnif =0.70, valid.col =FALSE),method ='render',table.classes ='table-condensed')```## Tidy DataNoticeably, there are many missing values across the federal fund rates and macroeconomic columns. Perhaps this data was collected on a specific cadence.To make this data set more comprehensive, I pivoted longer to have a column for federal fund types and macroeconomic conditions along with the respective values. I also programmed the tables to only show me rows with values so the reader is not sifting through missing data. For example: on July 1st, 1954, the effective federal funds rate was 0.8000, the real GDP was 4.6 and the unemployment rate was 5.8.As a note, if I were to visualize this data as a ggplot, I would condense the federal fund rate and macroeconomic conditions as two columns, type and value. This way I could see all seven rates over time.```{r}#pivot longer of the federal fund rate columns, removed NA valuestidy_FedFundsRate <-pivot_longer(raw_FedFundsRate, col =c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target", "Effective Federal Funds Rate"),names_to="Federal Fund Type",values_to ="Federal Fund Rate",values_drop_na =TRUE)``````{r}#pivot longer of the federal fund rate columns, removed NA values. For this section, I'd rather not have pivoted longer as it creates duplicate federal fund rate rows, but I didn't want all the NA values regardless.tidy_FedFundsRate <-pivot_longer(tidy_FedFundsRate, col =c("Real GDP (Percent Change)", "Unemployment Rate", "Inflation Rate"),names_to="Macroeconomic Condition",values_to ="Macroeconomic Rate",values_drop_na =TRUE)tidy_FedFundsRate```## Identify & mutate variablesThere are three variables that require mutation: year, month, and day. These variables can be mutated into one easily readable "date" column. Upon creating this new column, I removed the old variables and moved it to first position.This data set of federal fund rates related to macroeconomic conditions from 1954-2017 is now compiled into a comprehensive table for analysis.```{r}tidy_FedFundsRate<-tidy_FedFundsRate%>%mutate(date =str_c(Year, Month, Day, sep="-"),date =ymd(date)) %>%select(-c(Year, Month, Day)) %>%relocate(date)tidy_FedFundsRate```