DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 4

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
Author

Jack Sniezek

Published

December 5, 2022

Code
library(tidyverse)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

  • FedFundsRate.csv⭐⭐⭐
Code
fed_rates_orig <- read_csv("_data/FedFundsRate.csv")
fed_rates_orig
# A tibble: 904 × 10
    Year Month   Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, and abbreviated variable names
#   ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`
Code
summary(fed_rates_orig)
      Year          Month             Day         Federal Funds Target Rate
 Min.   :1954   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000           
 1st Qu.:1973   1st Qu.: 4.000   1st Qu.: 1.000   1st Qu.: 3.750           
 Median :1988   Median : 7.000   Median : 1.000   Median : 5.500           
 Mean   :1987   Mean   : 6.598   Mean   : 3.598   Mean   : 5.658           
 3rd Qu.:2001   3rd Qu.:10.000   3rd Qu.: 1.000   3rd Qu.: 7.750           
 Max.   :2017   Max.   :12.000   Max.   :31.000   Max.   :11.500           
                                                  NA's   :442              
 Federal Funds Upper Target Federal Funds Lower Target
 Min.   :0.2500             Min.   :0.0000            
 1st Qu.:0.2500             1st Qu.:0.0000            
 Median :0.2500             Median :0.0000            
 Mean   :0.3083             Mean   :0.0583            
 3rd Qu.:0.2500             3rd Qu.:0.0000            
 Max.   :1.0000             Max.   :0.7500            
 NA's   :801                NA's   :801               
 Effective Federal Funds Rate Real GDP (Percent Change) Unemployment Rate
 Min.   : 0.070               Min.   :-10.000           Min.   : 3.400   
 1st Qu.: 2.428               1st Qu.:  1.400           1st Qu.: 4.900   
 Median : 4.700               Median :  3.100           Median : 5.700   
 Mean   : 4.911               Mean   :  3.138           Mean   : 5.979   
 3rd Qu.: 6.580               3rd Qu.:  4.875           3rd Qu.: 7.000   
 Max.   :19.100               Max.   : 16.500           Max.   :10.800   
 NA's   :152                  NA's   :654               NA's   :152      
 Inflation Rate  
 Min.   : 0.600  
 1st Qu.: 2.000  
 Median : 2.800  
 Mean   : 3.733  
 3rd Qu.: 4.700  
 Max.   :13.600  
 NA's   :194     

Briefly describe the data

The Federal Funds Rate dataset has 904 rows and 10 columns: year, month, and day, four federal funds rate columns (target, upper target, lower target, and effective), real GDP percent change, unemployment rate, and inflation rate, covering 1954 through 2017. Much of the data is missing, but most of the gaps have a clear reason. GDP is reported quarterly, so only the same four months each year (January, April, July, and October in the rows printed above) contain GDP values. The single target federal funds rate was replaced by upper and lower targets beginning at the end of 2008. Inflation is not recorded until 1958, and the target federal funds rate is not recorded until the end of 1982. Lastly, rows dated anything other than the first of the month have no values for the effective federal funds rate, GDP, inflation rate, or unemployment rate.
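The missing-data patterns described above can be checked with code as well as by eye. The following is a minimal sketch on a small made-up tibble, not the real file: the `toy` data and the short column names `gdp` and `unemp` are assumptions chosen for illustration. It counts missing values per column and lists the months in which GDP is ever observed.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data that mimics the layout of the real file:
# GDP only in quarter-start months, and one mid-month row with no other data.
toy <- tibble(
  Year  = c(2000, 2000, 2000, 2000, 2000, 2000),
  Month = c(1, 2, 3, 4, 4, 5),
  Day   = c(1, 1, 1, 1, 15, 1),
  gdp   = c(2.0, NA, NA, 3.0, NA, NA),
  unemp = c(4.0, 4.1, 4.0, 3.8, NA, 4.0)
)

# Number of missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Months in which GDP is ever observed
gdp_months <- toy %>%
  filter(!is.na(gdp)) %>%
  distinct(Month) %>%
  pull(Month)

na_counts
gdp_months
```

Running the same two steps on `fed_rates_orig`, with the full backticked column names, would show whether GDP really appears in only four months of the year.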

Tidy Data (as needed)

My plan is to filter out the rows whose date is not the first of the month, since those rows only have values for the target rate columns and nothing else. The filter below keeps 753 of the original 904 rows.

Code
fed_rates_clean <- filter(fed_rates_orig, `Day` == 1)
fed_rates_clean
# A tibble: 753 × 10
    Year Month   Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 743 more rows, and abbreviated variable names
#   ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`
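The drop from 904 to 753 rows is worth a sanity check: the filter should keep only first-of-month rows, and the rows it removes should not contain values for the variables of interest. The following is a minimal sketch of such a check on a made-up tibble; `toy` and the column name `effective_rate` are hypothetical stand-ins, not columns from the real file.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two mid-month rows with no effective rate
toy <- tibble(
  Day            = c(1, 1, 16, 1, 31),
  effective_rate = c(1.0, 1.2, NA, 1.1, NA)
)

# How many rows fall on the first of the month, and how many do not
day_counts <- count(toy, first_of_month = Day == 1)

# The same filter used in the post
toy_clean <- filter(toy, Day == 1)

# Check 1: only first-of-month rows remain
stopifnot(all(toy_clean$Day == 1))

# Check 2: the filtered row count equals the number of first-of-month rows
stopifnot(nrow(toy_clean) == sum(toy$Day == 1))

# Check 3: count effective-rate observations in the dropped rows
n_lost <- toy %>%
  filter(Day != 1) %>%
  summarise(n = sum(!is.na(effective_rate))) %>%
  pull(n)

day_counts
n_lost
```

If `n_lost` came out above zero on the real data, the filter would be discarding actual observations and the plan would need revisiting.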

Identify variables that need to be mutated

I will combine the year, month, and day columns into a single date variable, which will be easier to use in a graph or table. I will also fill in the missing values of the target federal funds rate with the midpoint of the upper and lower targets, so that I can use one target rate column instead of three. This should leave me with 6 columns: one date column and five rate columns.

Code
# Combine Year, Month, and Day into a single Date column
fed_rates_new <- fed_rates_clean %>%
  mutate(Date = make_date(Year, Month, Day), .before = `Federal Funds Target Rate`)

# Where the target rate is missing, fill it with the midpoint of the upper and lower targets
fed_rates_new <- fed_rates_new %>%
  mutate(`Federal Funds Target Rate` = ifelse(is.na(`Federal Funds Target Rate`), (`Federal Funds Upper Target` + `Federal Funds Lower Target`) / 2, `Federal Funds Target Rate`))

# Drop the columns that are now redundant
fed_rates_new <- select(fed_rates_new, -c("Year", "Month", "Day", contains("Upper"), contains("Lower")))
fed_rates_new
# A tibble: 753 × 6
   Date       `Federal Funds Target Rate` Effective Fe…¹ Real …² Unemp…³ Infla…⁴
   <date>                           <dbl>          <dbl>   <dbl>   <dbl>   <dbl>
 1 1954-07-01                          NA           0.8      4.6     5.8      NA
 2 1954-08-01                          NA           1.22    NA       6        NA
 3 1954-09-01                          NA           1.06    NA       6.1      NA
 4 1954-10-01                          NA           0.85     8       5.7      NA
 5 1954-11-01                          NA           0.83    NA       5.3      NA
 6 1954-12-01                          NA           1.28    NA       5        NA
 7 1955-01-01                          NA           1.39    11.9     4.9      NA
 8 1955-02-01                          NA           1.29    NA       4.7      NA
 9 1955-03-01                          NA           1.35    NA       4.6      NA
10 1955-04-01                          NA           1.43     6.7     4.7      NA
# … with 743 more rows, and abbreviated variable names
#   ¹​`Effective Federal Funds Rate`, ²​`Real GDP (Percent Change)`,
#   ³​`Unemployment Rate`, ⁴​`Inflation Rate`

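The result matches what I was trying to accomplish: it has the same 753 rows as the filtered data and 6 columns, one date column and five rate columns.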
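The first ten rows printed above all have a missing target rate, so they do not show whether the fill worked. A more explicit check is sketched below on a made-up tibble. The `toy` data and the short column names `target`, `upper`, and `lower` are assumptions for illustration, and `coalesce()` is used here as an equivalent alternative to the `ifelse()` call in the post.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical toy data: one row with a single target, two rows with a range
toy <- tibble(
  Year   = c(2008, 2009, 2009),
  Month  = c(11, 1, 2),
  Day    = c(1, 1, 1),
  target = c(1.0, NA, NA),
  upper  = c(NA, 0.25, 0.25),
  lower  = c(NA, 0, 0)
)

toy_new <- toy %>%
  mutate(
    Date = make_date(Year, Month, Day),
    # coalesce() keeps target where present, otherwise uses the midpoint
    target_filled = coalesce(target, (upper + lower) / 2)
  )

# Check 1: the date parts round-trip to the original columns
stopifnot(all(year(toy_new$Date) == toy_new$Year))
stopifnot(all(month(toy_new$Date) == toy_new$Month))
stopifnot(all(day(toy_new$Date) == toy_new$Day))

# Check 2: existing target values are unchanged
has_target <- !is.na(toy_new$target)
stopifnot(all(toy_new$target_filled[has_target] == toy_new$target[has_target]))

# Check 3: filled values equal the midpoint of the range
stopifnot(all(toy_new$target_filled[!has_target] == 0.125))

toy_new
```

On the real data, the equivalent of Check 3 would compare the filled target rate against the upper and lower targets before those two columns are dropped by `select()`.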
