Code
library(tidyverse)
library(lubridate)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Nick Boonstra
August 21, 2022
Today’s challenge introduces the mutate()
function as an additional tool in the data-wrangling process. I will be using the “FedFundsRate” data set.
Today’s challenge is to:
# A tibble: 904 × 10
year month day fedfunds_…¹ fedfu…² fedfu…³ fedfu…⁴ gdp_c…⁵ unemp…⁶ infla…⁷
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA NA 0.8 4.6 5.8 NA
2 1954 8 1 NA NA NA 1.22 NA 6 NA
3 1954 9 1 NA NA NA 1.06 NA 6.1 NA
4 1954 10 1 NA NA NA 0.85 8 5.7 NA
5 1954 11 1 NA NA NA 0.83 NA 5.3 NA
6 1954 12 1 NA NA NA 1.28 NA 5 NA
7 1955 1 1 NA NA NA 1.39 11.9 4.9 NA
8 1955 2 1 NA NA NA 1.29 NA 4.7 NA
9 1955 3 1 NA NA NA 1.35 NA 4.6 NA
10 1955 4 1 NA NA NA 1.43 6.7 4.7 NA
# … with 894 more rows, and abbreviated variable names ¹fedfunds_target,
# ²fedfunds_upper, ³fedfunds_lower, ⁴fedfunds_eff, ⁵gdp_change,
# ⁶unemploy_rate, ⁷inflation_rate
# ℹ Use `print(n = ...)` to see more rows
This data set is not far from being tidy. Once the data set is tidy, there will be four columns: one for the date of the observation, one for the specific rate in question, one to describe what it is about that rate that is being observed, and one to provide the value of the given rate. When I read in the data, I renamed the columns with this tidying in mind.
One inconvenience about this data set is that the year, month, and date of each observation each exist in their own columns, rather than there being one single column for the date. Fortunately, the lubridate
package provides a simple work-around for this with the make_date()
function.
When we pivot_longer()
the data, we are going to take every column but the date (we will select()
out the individual year/month/day columns once we’ve generated our date column) and pivot them into two columns: one for rate_type, and one for value. Then, we will take one further step and use the separate()
function to split rate_type into two separate columns, one for each variable.
# A tibble: 6,328 × 4
date rate type val
<date> <chr> <chr> <dbl>
1 1954-07-01 fedfunds target NA
2 1954-07-01 fedfunds upper NA
3 1954-07-01 fedfunds lower NA
4 1954-07-01 fedfunds eff 0.8
5 1954-07-01 gdp change 4.6
6 1954-07-01 unemploy rate 5.8
7 1954-07-01 inflation rate NA
8 1954-08-01 fedfunds target NA
9 1954-08-01 fedfunds upper NA
10 1954-08-01 fedfunds lower NA
# … with 6,318 more rows
# ℹ Use `print(n = ...)` to see more rows
---
title: "Challenge 4 Submission"
author: "Nick Boonstra"
description: "More data wrangling: pivoting"
date: "08/21/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
---
## Introduction
Today's challenge introduces the `mutate()` function as an additional tool in
the data-wrangling process. I will be using the "FedFundsRate" data set.
```{r setup}
#| warning: false
#| message: false
library(tidyverse)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
```{r read in}
fed <- read_csv(
file = "_data/FedFundsRate.csv",
skip = 1,
col_names = c("year","month","day","fedfunds_target","fedfunds_upper","fedfunds_lower",
"fedfunds_eff","gdp_change","unemploy_rate","inflation_rate")
)
fed
```
### Briefly describe the data
This data set is not far from being tidy. Once the data set is tidy, there will be four columns: one for the date of the observation, one for the specific rate in question, one to describe what it is about that rate that is being observed, and one to provide the value of the given rate. When I read in the data, I renamed the columns with this tidying in mind.
## Tidying and Wrangling
One inconvenience about this data set is that the year, month, and date of each observation each exist in their own columns, rather than there being one single column for the date. Fortunately, the `lubridate` package provides a simple work-around for this with the `make_date()` function.
When we `pivot_longer()` the data, we are going to take every column but the date (we will `select()` out the individual year/month/day columns once we've generated our date column) and pivot them into two columns: one for rate_type, and one for value. Then, we will take one further step and use the `separate()` function to split rate_type into two separate columns, one for each variable.
```{r wrangling}
fed <- fed %>%
mutate(date=make_date(year,month,day)) %>%
select(-c(year,month,day)) %>%
pivot_longer(
cols=!date,
names_to="rate_type",
values_to="val"
) %>%
separate(col=rate_type,into=c("rate","type"))
fed
```