Code
library(tidyverse)
library(summarytools)
library(lubridate)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Aleacia Messiah
October 5, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 904 × 10
Year Month Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA NA 0.8 4.6 5.8 NA
2 1954 8 1 NA NA NA 1.22 NA 6 NA
3 1954 9 1 NA NA NA 1.06 NA 6.1 NA
4 1954 10 1 NA NA NA 0.85 8 5.7 NA
5 1954 11 1 NA NA NA 0.83 NA 5.3 NA
6 1954 12 1 NA NA NA 1.28 NA 5 NA
7 1955 1 1 NA NA NA 1.39 11.9 4.9 NA
8 1955 2 1 NA NA NA 1.29 NA 4.7 NA
9 1955 3 1 NA NA NA 1.35 NA 4.6 NA
10 1955 4 1 NA NA NA 1.43 6.7 4.7 NA
# … with 894 more rows, and abbreviated variable names
# ¹`Federal Funds Target Rate`, ²`Federal Funds Upper Target`,
# ³`Federal Funds Lower Target`, ⁴`Effective Federal Funds Rate`,
# ⁵`Real GDP (Percent Change)`, ⁶`Unemployment Rate`, ⁷`Inflation Rate`
Data Frame Summary
FedFundsRate
Dimensions: 904 x 10
Duplicates: 0
--------------------------------------------------------------------------------------------------------------------------------
No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
---- ------------------------------ --------------------------- --------------------- --------------------- ---------- ---------
1 Year Mean (sd) : 1986.7 (17.2) 64 distinct values : 904 0
[numeric] min < med < max: . : : : : . (100.0%) (0.0%)
1954 < 1987.5 < 2017 : : : : : : : : : :
IQR (CV) : 28 (0) : : : : : : : : : :
: : : : : : : : : :
2 Month Mean (sd) : 6.6 (3.5) 12 distinct values . : 904 0
[numeric] min < med < max: : : (100.0%) (0.0%)
1 < 7 < 12 : . . . . . . :
IQR (CV) : 6 (0.5) : : : : : : : : : :
: : : : : : : : : :
3 Day Mean (sd) : 3.6 (6.8) 29 distinct values : 904 0
[numeric] min < med < max: : (100.0%) (0.0%)
1 < 1 < 31 :
IQR (CV) : 0 (1.9) :
:
4 Federal Funds Target Rate Mean (sd) : 5.7 (2.6) 63 distinct values : 462 442
[numeric] min < med < max: : (51.1%) (48.9%)
1 < 5.5 < 11.5 . : .
IQR (CV) : 4 (0.5) : . . : : : : :
: : : : : : : : : .
5 Federal Funds Upper Target Mean (sd) : 0.3 (0.1) 4 distinct values IIIIIIIIIIIIIIII 103 801
[numeric] min < med < max: II (11.4%) (88.6%)
0.2 < 0.2 < 1
IQR (CV) : 0 (0.5)
6 Federal Funds Lower Target Mean (sd) : 0.1 (0.1) 4 distinct values IIIIIIIIIIIIIIII 103 801
[numeric] min < med < max: II (11.4%) (88.6%)
0 < 0 < 0.8
IQR (CV) : 0 (2.4)
7 Effective Federal Funds Rate Mean (sd) : 4.9 (3.6) 466 distinct values . : 752 152
[numeric] min < med < max: : : : (83.2%) (16.8%)
0.1 < 4.7 < 19.1 : : : .
IQR (CV) : 4.2 (0.7) : : : : :
: : : : : : . .
8 Real GDP (Percent Change) Mean (sd) : 3.1 (3.6) 113 distinct values : 250 654
[numeric] min < med < max: : (27.7%) (72.3%)
-10 < 3.1 < 16.5 :
IQR (CV) : 3.5 (1.1) : .
: : :
9 Unemployment Rate Mean (sd) : 6 (1.6) 71 distinct values : 752 152
[numeric] min < med < max: : . (83.2%) (16.8%)
3.4 < 5.7 < 10.8 . : : .
IQR (CV) : 2.1 (0.3) : : : : : :
: : : : : : . . . .
10 Inflation Rate Mean (sd) : 3.7 (2.6) 106 distinct values : 710 194
[numeric] min < med < max: . : (78.5%) (21.5%)
0.6 < 2.8 < 13.6 : : .
IQR (CV) : 2.7 (0.7) : : : .
: : : : : . .
--------------------------------------------------------------------------------------------------------------------------------
The FedFundsRate data has 904 observations with 10 variables (Year, Month, Day, Federal Funds Target Rate, Federal Funds Upper Target, Federal Funds Lower Target, Effective Federal Funds Rate, Real GDP (Percent Change), Unemployment Rate, and Inflation Rate). Each observations appear to be measuring the effective (current) federal funds rate, real GDP, unemployment rate, and inflation rate for at least the beginning and end of each month for the years 1954 through 2017, with a target, upper target, and/or lower target rate for each observation. It appears that several observations do not have values for some of the variables so those will need to be accounted for during analysis.
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
Thankfully this data is mostly tidy but we can see that the columns Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target are very similar and don’t need to be in their own separate columns. I will pivot the FedFundsRate dataset to make these grouped into one column called Targets. Currently, we have \(n = 904\) observations and \(k = 10\) variables so pivoting the data into a longer format would create \(n = 2,712\) rows and \(k = 9\) columns.
[1] 904
[1] 10
[1] 2712
[1] 9
# A tibble: 2,712 × 9
Year Month Day Effective Federal F…¹ Real …² Unemp…³ Infla…⁴ Targets Value
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 1954 7 1 0.8 4.6 5.8 NA Federa… NA
2 1954 7 1 0.8 4.6 5.8 NA Federa… NA
3 1954 7 1 0.8 4.6 5.8 NA Federa… NA
4 1954 8 1 1.22 NA 6 NA Federa… NA
5 1954 8 1 1.22 NA 6 NA Federa… NA
6 1954 8 1 1.22 NA 6 NA Federa… NA
7 1954 9 1 1.06 NA 6.1 NA Federa… NA
8 1954 9 1 1.06 NA 6.1 NA Federa… NA
9 1954 9 1 1.06 NA 6.1 NA Federa… NA
10 1954 10 1 0.85 8 5.7 NA Federa… NA
# … with 2,702 more rows, and abbreviated variable names
# ¹`Effective Federal Funds Rate`, ²`Real GDP (Percent Change)`,
# ³`Unemployment Rate`, ⁴`Inflation Rate`
Any additional comments?
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
We see that the year, month, and day are separated into their own columns so they need to be condensed into one column. Once this column is created, we can remove the Year, Month, and Day columns from the dataset.
# A tibble: 2,712 × 7
Effective Federal Funds Ra…¹ Real …² Unemp…³ Infla…⁴ Targets Value Date
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <date>
1 0.8 4.6 5.8 NA Federa… NA 1954-07-01
2 0.8 4.6 5.8 NA Federa… NA 1954-07-01
3 0.8 4.6 5.8 NA Federa… NA 1954-07-01
4 1.22 NA 6 NA Federa… NA 1954-08-01
5 1.22 NA 6 NA Federa… NA 1954-08-01
6 1.22 NA 6 NA Federa… NA 1954-08-01
7 1.06 NA 6.1 NA Federa… NA 1954-09-01
8 1.06 NA 6.1 NA Federa… NA 1954-09-01
9 1.06 NA 6.1 NA Federa… NA 1954-09-01
10 0.85 8 5.7 NA Federa… NA 1954-10-01
# … with 2,702 more rows, and abbreviated variable names
# ¹`Effective Federal Funds Rate`, ²`Real GDP (Percent Change)`,
# ³`Unemployment Rate`, ⁴`Inflation Rate`
Any additional comments?
---
title: "Challenge 4"
author: "Aleacia Messiah"
desription: "More data wrangling: pivoting"
date: "10/05/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- Aleacia Messiah
- fed_rates
- tidyverse
- summarytools
- lubridate
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(summarytools)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
```{r}
# read in FedFundsRate dataset
FedFundsRate <- read_csv("_data/FedFundsRate.csv")
# view FedFundsRate dataset
FedFundsRate
# view a summary of the dataset
dfSummary(FedFundsRate)
```
### Briefly describe the data
The FedFundsRate data has 904 observations with 10 variables (Year, Month, Day, Federal Funds Target Rate, Federal Funds Upper Target, Federal Funds Lower Target, Effective Federal Funds Rate, Real GDP (Percent Change), Unemployment Rate, and Inflation Rate). Each observations appear to be measuring the effective (current) federal funds rate, real GDP, unemployment rate, and inflation rate for at least the beginning and end of each month for the years 1954 through 2017, with a target, upper target, and/or lower target rate for each observation. It appears that several observations do not have values for some of the variables so those will need to be accounted for during analysis.
## Tidy Data (as needed)
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
Thankfully this data is mostly tidy but we can see that the columns Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target are very similar and don't need to be in their own separate columns. I will pivot the FedFundsRate dataset to make these grouped into one column called Targets. Currently, we have $n = 904$ observations and $k = 10$ variables so pivoting the data into a longer format would create $n = 2,712$ rows and $k = 9$ columns.
```{r}
# existing rows/observations
nrow(FedFundsRate)
# existing columns/variables
ncol(FedFundsRate)
# expected rows/observations
nrow(FedFundsRate) * (ncol(FedFundsRate)-7)
# expected columns/variables
ncol(FedFundsRate) - 1
# pivot longer FedFundsRate
FedFundsRate <- pivot_longer(FedFundsRate, cols = c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target"), names_to = "Targets", values_to = "Value")
# view the pivoted dataset
FedFundsRate
```
Any additional comments?
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
We see that the year, month, and day are separated into their own columns so they need to be condensed into one column. Once this column is created, we can remove the Year, Month, and Day columns from the dataset.
```{r}
# create dates by combining Year, Month, and Day columns and delete those three columns
FedFundsRate <- FedFundsRate %>%
mutate(Date = make_date(Year, Month, Day)) %>%
select(!contains(c("Year", "Month", "Day")))
# view the dataset
FedFundsRate
```
Any additional comments?
::: callout-note
## Question
Since there are missing values in this dataset for the target rates since in earlier years the Federal Funds Target Rate was used while in later years the Federal Funds Upper and Lower Target Rates were used, would it be easier to take out the *NA*s for the Federal Funds Target Rate if the Upper and Lower Target Rates were used those years and vice versa? I did not see any observations where the Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target rates all had values.
:::