DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 4

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
Author: Matthew O’Neill

Published: August 18, 2022

Code
library(tidyverse)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Code
data = read_csv("../posts/_data/FedFundsRate.csv")
data
# A tibble: 904 × 10
    Year Month   Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, and abbreviated variable names
#   ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`

Briefly describe the data

The data describes metrics related to the US economy from July 1954 to 2017, recorded mostly at monthly frequency. It includes the federal funds target rate (and, later, an upper and lower target), the effective federal funds rate, the percent change in real GDP from the previous quarter, the unemployment rate for that month, and the year-over-year inflation rate.

Tidy Data

Code
sum(is.na(data$"Federal Funds Target Rate"))
[1] 442

Of the 904 rows, 442 contain NA values for the Federal Funds Target Rate. The target rate is not recorded in this data until 1982, and in late 2008 the single target rate was replaced with a target range, given as an upper and lower bound for the federal funds rate to fall between.
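To see how the missing values are spread across every column rather than one, a per-column count can help. The following is a minimal sketch, assuming a small hand-made tibble `toy` in place of the real data; its values are hypothetical and are not rows from FedFundsRate.csv.

```r
library(dplyr)

# Hypothetical stand-in for the real data; values are illustrative only.
toy <- tibble(
  `Federal Funds Target Rate`    = c(NA, 10.25, 1.00, NA),
  `Federal Funds Upper Target`   = c(NA, NA, NA, 0.25),
  `Federal Funds Lower Target`   = c(NA, NA, NA, 0.00),
  `Effective Federal Funds Rate` = c(0.80, 10.31, 0.97, 0.16)
)

# Count the missing values in every column at once.
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

Running the same `summarise()` call on `data` would show whether the target, upper, and lower columns are missing in complementary periods.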

If we were interested in working with the target rate, we could find the midpoint of the lower and upper bound of the target rate and treat that as an estimated target rate.
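The midpoint idea could look roughly like this. It is a sketch only: the two rows are hypothetical, and the column names `target_midpoint` and `target_estimate` are made up for illustration.

```r
library(dplyr)

# Hypothetical rows: one from the single-target era, one from the range era.
toy <- tibble(
  `Federal Funds Target Rate`  = c(5.25, NA),
  `Federal Funds Upper Target` = c(NA, 0.25),
  `Federal Funds Lower Target` = c(NA, 0.00)
)

toy <- toy %>%
  mutate(
    # Midpoint of the range; NA wherever no range is recorded.
    target_midpoint = (`Federal Funds Upper Target` +
                         `Federal Funds Lower Target`) / 2,
    # Prefer the recorded target and fall back to the midpoint.
    target_estimate = coalesce(`Federal Funds Target Rate`, target_midpoint)
  )
toy$target_estimate
```

Using `coalesce()` keeps the recorded target wherever it exists, so the estimate only fills rows where a range is the only information available.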

For now, though, the data is tidy enough to work with, and there are no unnecessary rows or columns to remove.

Identify variables that need to be mutated

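The dates do, however, need to be mutated, as the date is currently split across separate Year, Month, and Day columns. This is redundant, and we can combine them using the str_c() function and parse the result with ymd(). We do need to include the day, as not all rows fall on the first day of the month: 904 rows is more than the at most 768 months spanned by 1954 to 2017, so some months must have more than one row.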

Code
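data <- data %>%
  # Combine the parts into "Year/Month/Day" strings and parse them as dates
  mutate(date = ymd(str_c(Year, Month, Day, sep = "/"))) %>%
  # Put date first and drop the now-redundant columns by name
  select(date, everything(), -c(Year, Month, Day))
data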
# A tibble: 904 × 8
   date       Federal Funds Ta…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <date>                  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 1954-07-01                 NA      NA      NA    0.8      4.6     5.8      NA
 2 1954-08-01                 NA      NA      NA    1.22    NA       6        NA
 3 1954-09-01                 NA      NA      NA    1.06    NA       6.1      NA
 4 1954-10-01                 NA      NA      NA    0.85     8       5.7      NA
 5 1954-11-01                 NA      NA      NA    0.83    NA       5.3      NA
 6 1954-12-01                 NA      NA      NA    1.28    NA       5        NA
 7 1955-01-01                 NA      NA      NA    1.39    11.9     4.9      NA
 8 1955-02-01                 NA      NA      NA    1.29    NA       4.7      NA
 9 1955-03-01                 NA      NA      NA    1.35    NA       4.6      NA
10 1955-04-01                 NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, and abbreviated variable names
#   ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`

We mutated the dataset to include a date column, moved that date column to the front, and removed the now redundant year, month, and day columns.
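Because the challenge asks for a sanity check on every mutation, one option is to confirm that the parsed date round-trips to its original parts before those columns are dropped. This is a sketch on a hypothetical `toy` tibble; the last row is a made-up mid-month entry, not a row taken from the data.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Illustrative date parts; the last row is a made-up mid-month entry.
toy <- tibble(
  Year  = c(1954, 1954, 1955, 1990),
  Month = c(7, 12, 1, 10),
  Day   = c(1, 1, 1, 29)
)

checked <- toy %>%
  mutate(date = ymd(str_c(Year, Month, Day, sep = "/")))

# Every date parsed, and each component matches its original column.
stopifnot(
  !any(is.na(checked$date)),
  all(year(checked$date) == checked$Year),
  all(month(checked$date) == checked$Month),
  all(day(checked$date) == checked$Day)
)
```

If any string failed to parse, `ymd()` would return NA for it and the first condition would stop the script.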

Additional Comments

The rest of the data is formatted well, and there don’t appear to be any additional redundancies we can get rid of. If we were working with GDP, we might consider reducing the data to quarterly frequency, but all other metrics are recorded on a monthly basis, so it doesn’t make sense to do so preemptively.
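If a quarterly view were wanted later, one possible reduction is sketched below. The `toy` tibble reuses the first six rows printed above. The names `quarter_start`, `gdp_change`, `mean_effective_rate`, and `mean_unemployment` are hypothetical, and averaging the monthly series is just one of several reasonable choices.

```r
library(dplyr)
library(lubridate)

# First six rows as printed above: date, effective rate, GDP change, unemployment.
toy <- tibble(
  date = ymd(c("1954-07-01", "1954-08-01", "1954-09-01",
               "1954-10-01", "1954-11-01", "1954-12-01")),
  `Effective Federal Funds Rate` = c(0.80, 1.22, 1.06, 0.85, 0.83, 1.28),
  `Real GDP (Percent Change)`    = c(4.6, NA, NA, 8.0, NA, NA),
  `Unemployment Rate`            = c(5.8, 6.0, 6.1, 5.7, 5.3, 5.0)
)

quarterly <- toy %>%
  # Label each row with the first day of its calendar quarter.
  group_by(quarter_start = floor_date(date, "quarter")) %>%
  summarise(
    # Only one GDP value is non-missing per quarter, so the mean returns it.
    gdp_change = mean(`Real GDP (Percent Change)`, na.rm = TRUE),
    # Monthly series are averaged over the quarter (one possible choice).
    mean_effective_rate = mean(`Effective Federal Funds Rate`, na.rm = TRUE),
    mean_unemployment   = mean(`Unemployment Rate`, na.rm = TRUE)
  )
quarterly
```

Grouping with `floor_date()` rather than filtering to rows with a GDP value keeps the monthly information, at the cost of committing to a summary function for each series.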
