Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Mekhala Kumar
August 18, 2022
The dataset being used is FedFundsRate.
spec_tbl_df [904 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ Year : num [1:904] 1954 1954 1954 1954 1954 ...
$ Month : num [1:904] 7 8 9 10 11 12 1 2 3 4 ...
$ Day : num [1:904] 1 1 1 1 1 1 1 1 1 1 ...
$ Federal Funds Target Rate : num [1:904] NA NA NA NA NA NA NA NA NA NA ...
$ Federal Funds Upper Target : num [1:904] NA NA NA NA NA NA NA NA NA NA ...
$ Federal Funds Lower Target : num [1:904] NA NA NA NA NA NA NA NA NA NA ...
$ Effective Federal Funds Rate: num [1:904] 0.8 1.22 1.06 0.85 0.83 1.28 1.39 1.29 1.35 1.43 ...
$ Real GDP (Percent Change) : num [1:904] 4.6 NA NA 8 NA NA 11.9 NA NA 6.7 ...
$ Unemployment Rate : num [1:904] 5.8 6 6.1 5.7 5.3 5 4.9 4.7 4.6 4.7 ...
$ Inflation Rate : num [1:904] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. Year = col_double(),
.. Month = col_double(),
.. Day = col_double(),
.. `Federal Funds Target Rate` = col_double(),
.. `Federal Funds Upper Target` = col_double(),
.. `Federal Funds Lower Target` = col_double(),
.. `Effective Federal Funds Rate` = col_double(),
.. `Real GDP (Percent Change)` = col_double(),
.. `Unemployment Rate` = col_double(),
.. `Inflation Rate` = col_double()
.. )
- attr(*, "problems")=<externalptr>
[1] 904 10
Name | FedFundsRate |
Number of rows | 904 |
Number of columns | 10 |
_______________________ | |
Column type frequency: | |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Year | 0 | 1.00 | 1986.68 | 17.17 | 1954.00 | 1973.00 | 1987.50 | 2001.00 | 2017.00 | ▅▆▇▇▆ |
Month | 0 | 1.00 | 6.60 | 3.47 | 1.00 | 4.00 | 7.00 | 10.00 | 12.00 | ▇▅▅▅▇ |
Day | 0 | 1.00 | 3.60 | 6.79 | 1.00 | 1.00 | 1.00 | 1.00 | 31.00 | ▇▁▁▁▁ |
Federal Funds Target Rate | 442 | 0.51 | 5.66 | 2.55 | 1.00 | 3.75 | 5.50 | 7.75 | 11.50 | ▅▅▇▅▂ |
Federal Funds Upper Target | 801 | 0.11 | 0.31 | 0.14 | 0.25 | 0.25 | 0.25 | 0.25 | 1.00 | ▇▁▁▁▁ |
Federal Funds Lower Target | 801 | 0.11 | 0.06 | 0.14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.75 | ▇▁▁▁▁ |
Effective Federal Funds Rate | 152 | 0.83 | 4.91 | 3.61 | 0.07 | 2.43 | 4.70 | 6.58 | 19.10 | ▇▇▃▁▁ |
Real GDP (Percent Change) | 654 | 0.28 | 3.14 | 3.60 | -10.00 | 1.40 | 3.10 | 4.88 | 16.50 | ▁▂▇▂▁ |
Unemployment Rate | 152 | 0.83 | 5.98 | 1.57 | 3.40 | 4.90 | 5.70 | 7.00 | 10.80 | ▅▇▅▁▁ |
Inflation Rate | 194 | 0.79 | 3.73 | 2.57 | 0.60 | 2.00 | 2.80 | 4.70 | 13.60 | ▇▃▁▁▁ |
# A tibble: 6 × 10
Year Month Day Federal Fu…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA NA 0.8 4.6 5.8 NA
2 1954 8 1 NA NA NA 1.22 NA 6 NA
3 1954 9 1 NA NA NA 1.06 NA 6.1 NA
4 1954 10 1 NA NA NA 0.85 8 5.7 NA
5 1954 11 1 NA NA NA 0.83 NA 5.3 NA
6 1954 12 1 NA NA NA 1.28 NA 5 NA
# … with abbreviated variable names ¹`Federal Funds Target Rate`,
# ²`Federal Funds Upper Target`, ³`Federal Funds Lower Target`,
# ⁴`Effective Federal Funds Rate`, ⁵`Real GDP (Percent Change)`,
# ⁶`Unemployment Rate`, ⁷`Inflation Rate`
The data consists of the target interest rates as well as details of the GDP, unemployment rate and inflation rate between 1954-2017. It has 10 variables and 904 observations. However, a few variables such as the Federal Funds Upper Target, Federal Funds Lower Target and Real GDP, have more than half of the total observations missing.
The data is not already tidy. The observations are present as columns and values are in multiple columns. Moreover, the date is present in three columns and the three need to be recoded into one variable.
There are 904 rows(n), 10 variables(k) and 7 variables will be used to identify a case. However, since the date variables will be consolidated into one variable, we can consider there to be 8 variables while conducting the sanity check. The expected number of rows are 6328 and expected number of columns are 3.
The time variables are not correctly coded as dates, hence a new column with the proper date format was created. Moreover, the remaining columns with the data about the federal funds target rates, unemployment rates, inflation rates and GDP were pivoted to make a taller dataset. All of the remaining columns were added as one case since they represented different types of rates and their values were stored in a separate column. It was found that the dimensions are 6328*3 which was the same found during the sanity check. A glimpse of the rearranged dataset has been provided.
library(lubridate)
library(stringr)
library(tidyverse)
FedFundsRate$Date <- str_c(FedFundsRate$Year,"-",FedFundsRate$Month,"-",FedFundsRate$Day)%>%ymd()%>%as.Date()
FedFundsRate=subset(FedFundsRate,select=-c(1,2,3))
FedFundsRate<-FedFundsRate%>%select(Date,everything())
FedFundsRate<-pivot_longer(FedFundsRate, 2:8, names_to = "Rates", values_to = "Value")
dim(FedFundsRate)
[1] 6328 3
# A tibble: 6 × 3
Date Rates Value
<date> <chr> <dbl>
1 1954-07-01 Federal Funds Target Rate NA
2 1954-07-01 Federal Funds Upper Target NA
3 1954-07-01 Federal Funds Lower Target NA
4 1954-07-01 Effective Federal Funds Rate 0.8
5 1954-07-01 Real GDP (Percent Change) 4.6
6 1954-07-01 Unemployment Rate 5.8
---
title: "Challenge 4"
author: "Mekhala Kumar"
description: "More data wrangling: pivoting"
date: "08/18/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- FedFundsRate
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Reading in data
The dataset being used is FedFundsRate.
```{r}
library(readr)
FedFundsRate <- read_csv("_data/FedFundsRate.csv")
str(FedFundsRate)
dim(FedFundsRate)
library(skimr)
skim(FedFundsRate)
head(FedFundsRate)
```
## Description of the data
The data consists of the target interest rates as well as details of the GDP, unemployment rate and inflation rate between 1954-2017. It has 10 variables and 904 observations. However, a few variables such as the Federal Funds Upper Target, Federal Funds Lower Target and Real GDP, have more than half of the total observations missing.
## Tidy Data
The data is not already tidy. The observations are present as columns and values are in multiple columns. Moreover, the date is present in three columns and the three need to be recoded into one variable.
There are 904 rows(n), 10 variables(k) and 7 variables will be used to identify a case. However, since the date variables will be consolidated into one variable, we can consider there to be 8 variables while conducting the sanity check. The expected number of rows are 6328 and expected number of columns are 3.
```{r}
rows=(904)*(8-1)
print("Expected number of rows:")
rows
col=(8-7)+2
print("Expected number of columns:")
col
```
## Identification of variables that need to be mutated
The time variables are not correctly coded as dates, hence a new column with the proper date format was created. Moreover, the remaining columns with the data about the federal funds target rates, unemployment rates, inflation rates and GDP were pivoted to make a taller dataset. All of the remaining columns were added as one case since they represented different types of rates and their values were stored in a separate column. It was found that the dimensions are 6328\*3 which was the same found during the sanity check. A glimpse of the rearranged dataset has been provided.
```{r}
library(lubridate)
library(stringr)
library(tidyverse)
FedFundsRate$Date <- str_c(FedFundsRate$Year,"-",FedFundsRate$Month,"-",FedFundsRate$Day)%>%ymd()%>%as.Date()
FedFundsRate=subset(FedFundsRate,select=-c(1,2,3))
FedFundsRate<-FedFundsRate%>%select(Date,everything())
FedFundsRate<-pivot_longer(FedFundsRate, 2:8, names_to = "Rates", values_to = "Value")
dim(FedFundsRate)
head(FedFundsRate)
```