DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Challenge 4

  • Course information
    • Overview
    • Instructional Team
    • Course Schedule
  • Weekly materials
    • Fall 2022 posts
    • final posts

On this page

  • Challenge Overview
  • Read in data
    • Briefly describe the data
  • Tidy Data (as needed)
  • Identify variables that need to be mutated

Challenge 4

  • Show All Code
  • Hide All Code

  • View Source
challenge_4
Aleacia Messiah
fed_rates
tidyverse
summarytools
lubridate
Author

Aleacia Messiah

Published

October 5, 2022

Code
library(tidyverse)
library(summarytools)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
# read in FedFundsRate dataset
FedFundsRate <- read_csv("_data/FedFundsRate.csv")
# view FedFundsRate dataset
FedFundsRate
# A tibble: 904 × 10
    Year Month   Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, and abbreviated variable names
#   ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`
Code
# view a summary of the dataset
dfSummary(FedFundsRate)
Data Frame Summary  
FedFundsRate  
Dimensions: 904 x 10  
Duplicates: 0  

--------------------------------------------------------------------------------------------------------------------------------
No   Variable                       Stats / Values              Freqs (% of Valid)    Graph                 Valid      Missing  
---- ------------------------------ --------------------------- --------------------- --------------------- ---------- ---------
1    Year                           Mean (sd) : 1986.7 (17.2)   64 distinct values              :           904        0        
     [numeric]                      min < med < max:                                        . : : : : .     (100.0%)   (0.0%)   
                                    1954 < 1987.5 < 2017                              : : : : : : : : : :                       
                                    IQR (CV) : 28 (0)                                 : : : : : : : : : :                       
                                                                                      : : : : : : : : : :                       

2    Month                          Mean (sd) : 6.6 (3.5)       12 distinct values    .                 :   904        0        
     [numeric]                      min < med < max:                                  :                 :   (100.0%)   (0.0%)   
                                    1 < 7 < 12                                        : .   .   . . . . :                       
                                    IQR (CV) : 6 (0.5)                                : : : : : : : : : :                       
                                                                                      : : : : : : : : : :                       

3    Day                            Mean (sd) : 3.6 (6.8)       29 distinct values    :                     904        0        
     [numeric]                      min < med < max:                                  :                     (100.0%)   (0.0%)   
                                    1 < 1 < 31                                        :                                         
                                    IQR (CV) : 0 (1.9)                                :                                         
                                                                                      :                                         

4    Federal Funds Target Rate      Mean (sd) : 5.7 (2.6)       63 distinct values            :             462        442      
     [numeric]                      min < med < max:                                          :             (51.1%)    (48.9%)  
                                    1 < 5.5 < 11.5                                    .       :   .                             
                                    IQR (CV) : 4 (0.5)                                : . . : : : : :                           
                                                                                      : : : : : : : : : .                       

5    Federal Funds Upper Target     Mean (sd) : 0.3 (0.1)       4 distinct values     IIIIIIIIIIIIIIII      103        801      
     [numeric]                      min < med < max:                                  II                    (11.4%)    (88.6%)  
                                    0.2 < 0.2 < 1                                                                               
                                    IQR (CV) : 0 (0.5)                                                                          

6    Federal Funds Lower Target     Mean (sd) : 0.1 (0.1)       4 distinct values     IIIIIIIIIIIIIIII      103        801      
     [numeric]                      min < med < max:                                  II                    (11.4%)    (88.6%)  
                                    0 < 0 < 0.8                                                                                 
                                    IQR (CV) : 0 (2.4)                                                                          

7    Effective Federal Funds Rate   Mean (sd) : 4.9 (3.6)       466 distinct values   .   :                 752        152      
     [numeric]                      min < med < max:                                  : : :                 (83.2%)    (16.8%)  
                                    0.1 < 4.7 < 19.1                                  : : : .                                   
                                    IQR (CV) : 4.2 (0.7)                              : : : : :                                 
                                                                                      : : : : : : . .                           

8    Real GDP (Percent Change)      Mean (sd) : 3.1 (3.6)       113 distinct values       :                 250        654      
     [numeric]                      min < med < max:                                      :                 (27.7%)    (72.3%)  
                                    -10 < 3.1 < 16.5                                      :                                     
                                    IQR (CV) : 3.5 (1.1)                                  : .                                   
                                                                                        : : :                                   

9    Unemployment Rate              Mean (sd) : 6 (1.6)         71 distinct values        :                 752        152      
     [numeric]                      min < med < max:                                      : .               (83.2%)    (16.8%)  
                                    3.4 < 5.7 < 10.8                                    . : :   .                               
                                    IQR (CV) : 2.1 (0.3)                              : : : : : :                               
                                                                                      : : : : : : . . . .                       

10   Inflation Rate                 Mean (sd) : 3.7 (2.6)       106 distinct values     :                   710        194      
     [numeric]                      min < med < max:                                  . :                   (78.5%)    (21.5%)  
                                    0.6 < 2.8 < 13.6                                  : : .                                     
                                    IQR (CV) : 2.7 (0.7)                              : : : .                                   
                                                                                      : : : : :   .   .                         
--------------------------------------------------------------------------------------------------------------------------------

Briefly describe the data

The FedFundsRate data has 904 observations with 10 variables (Year, Month, Day, Federal Funds Target Rate, Federal Funds Upper Target, Federal Funds Lower Target, Effective Federal Funds Rate, Real GDP (Percent Change), Unemployment Rate, and Inflation Rate). Each observations appear to be measuring the effective (current) federal funds rate, real GDP, unemployment rate, and inflation rate for at least the beginning and end of each month for the years 1954 through 2017, with a target, upper target, and/or lower target rate for each observation. It appears that several observations do not have values for some of the variables so those will need to be accounted for during analysis.

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

Thankfully this data is mostly tidy but we can see that the columns Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target are very similar and don’t need to be in their own separate columns. I will pivot the FedFundsRate dataset to make these grouped into one column called Targets. Currently, we have n=904 observations and k=10 variables so pivoting the data into a longer format would create n=2,712 rows and k=9 columns.

Code
# existing rows/observations
nrow(FedFundsRate)
[1] 904
Code
# existing columns/variables
ncol(FedFundsRate)
[1] 10
Code
# expected rows/observations
nrow(FedFundsRate) * (ncol(FedFundsRate)-7)
[1] 2712
Code
# expected columns/variables
ncol(FedFundsRate) - 1
[1] 9
Code
# pivot longer FedFundsRate
FedFundsRate <- pivot_longer(FedFundsRate, cols = c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target"), names_to = "Targets", values_to = "Value")
# view the pivoted dataset
FedFundsRate
# A tibble: 2,712 × 9
    Year Month   Day Effective Federal F…¹ Real …² Unemp…³ Infla…⁴ Targets Value
   <dbl> <dbl> <dbl>                 <dbl>   <dbl>   <dbl>   <dbl> <chr>   <dbl>
 1  1954     7     1                  0.8      4.6     5.8      NA Federa…    NA
 2  1954     7     1                  0.8      4.6     5.8      NA Federa…    NA
 3  1954     7     1                  0.8      4.6     5.8      NA Federa…    NA
 4  1954     8     1                  1.22    NA       6        NA Federa…    NA
 5  1954     8     1                  1.22    NA       6        NA Federa…    NA
 6  1954     8     1                  1.22    NA       6        NA Federa…    NA
 7  1954     9     1                  1.06    NA       6.1      NA Federa…    NA
 8  1954     9     1                  1.06    NA       6.1      NA Federa…    NA
 9  1954     9     1                  1.06    NA       6.1      NA Federa…    NA
10  1954    10     1                  0.85     8       5.7      NA Federa…    NA
# … with 2,702 more rows, and abbreviated variable names
#   ¹​`Effective Federal Funds Rate`, ²​`Real GDP (Percent Change)`,
#   ³​`Unemployment Rate`, ⁴​`Inflation Rate`

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

We see that the year, month, and day are separated into their own columns so they need to be condensed into one column. Once this column is created, we can remove the Year, Month, and Day columns from the dataset.

Code
# create dates by combining Year, Month, and Day columns and delete those three columns
FedFundsRate <- FedFundsRate %>% 
  mutate(Date = make_date(Year, Month, Day)) %>% 
  select(!contains(c("Year", "Month", "Day")))
# view the dataset
FedFundsRate
# A tibble: 2,712 × 7
   Effective Federal Funds Ra…¹ Real …² Unemp…³ Infla…⁴ Targets Value Date      
                          <dbl>   <dbl>   <dbl>   <dbl> <chr>   <dbl> <date>    
 1                         0.8      4.6     5.8      NA Federa…    NA 1954-07-01
 2                         0.8      4.6     5.8      NA Federa…    NA 1954-07-01
 3                         0.8      4.6     5.8      NA Federa…    NA 1954-07-01
 4                         1.22    NA       6        NA Federa…    NA 1954-08-01
 5                         1.22    NA       6        NA Federa…    NA 1954-08-01
 6                         1.22    NA       6        NA Federa…    NA 1954-08-01
 7                         1.06    NA       6.1      NA Federa…    NA 1954-09-01
 8                         1.06    NA       6.1      NA Federa…    NA 1954-09-01
 9                         1.06    NA       6.1      NA Federa…    NA 1954-09-01
10                         0.85     8       5.7      NA Federa…    NA 1954-10-01
# … with 2,702 more rows, and abbreviated variable names
#   ¹​`Effective Federal Funds Rate`, ²​`Real GDP (Percent Change)`,
#   ³​`Unemployment Rate`, ⁴​`Inflation Rate`

Any additional comments?

Question

Since there are missing values in this dataset for the target rates since in earlier years the Federal Funds Target Rate was used while in later years the Federal Funds Upper and Lower Target Rates were used, would it be easier to take out the NAs for the Federal Funds Target Rate if the Upper and Lower Target Rates were used those years and vice versa? I did not see any observations where the Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target rates all had values.

Source Code
---
title: "Challenge 4"
author: "Aleacia Messiah"
desription: "More data wrangling: pivoting"
date: "10/05/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
  - Aleacia Messiah
  - fed_rates
  - tidyverse
  - summarytools
  - lubridate
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)
library(summarytools)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to:

1)  read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2)  tidy data (as needed, including sanity checks)
3)  identify variables that need to be mutated
4)  mutate variables and sanity check all mutations

## Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

-   abc_poll.csv ⭐
-   poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
-   FedFundsRate.csv⭐⭐⭐
-   hotel_bookings.csv⭐⭐⭐⭐
-   debt_in_trillions.xlsx ⭐⭐⭐⭐⭐

```{r}
# read in FedFundsRate dataset
FedFundsRate <- read_csv("_data/FedFundsRate.csv")
# view FedFundsRate dataset
FedFundsRate
# view a summary of the dataset
dfSummary(FedFundsRate)
```

### Briefly describe the data

The FedFundsRate data has 904 observations with 10 variables (Year, Month, Day, Federal Funds Target Rate, Federal Funds Upper Target, Federal Funds Lower Target, Effective Federal Funds Rate, Real GDP (Percent Change), Unemployment Rate, and Inflation Rate). Each observations appear to be measuring the effective (current) federal funds rate, real GDP, unemployment rate, and inflation rate for at least the beginning and end of each month for the years 1954 through 2017, with a target, upper target, and/or lower target rate for each observation. It appears that several observations do not have values for some of the variables so those will need to be accounted for during analysis.

## Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

Thankfully this data is mostly tidy but we can see that the columns Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target are very similar and don't need to be in their own separate columns. I will pivot the FedFundsRate dataset to make these grouped into one column called Targets. Currently, we have $n = 904$ observations and $k = 10$ variables so pivoting the data into a longer format would create $n = 2,712$ rows and $k = 9$ columns.

```{r}
# existing rows/observations
nrow(FedFundsRate)
# existing columns/variables
ncol(FedFundsRate)
# expected rows/observations
nrow(FedFundsRate) * (ncol(FedFundsRate)-7)
# expected columns/variables
ncol(FedFundsRate) - 1
# pivot longer FedFundsRate
FedFundsRate <- pivot_longer(FedFundsRate, cols = c("Federal Funds Target Rate", "Federal Funds Upper Target", "Federal Funds Lower Target"), names_to = "Targets", values_to = "Value")
# view the pivoted dataset
FedFundsRate
```

Any additional comments?

## Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

We see that the year, month, and day are separated into their own columns so they need to be condensed into one column. Once this column is created, we can remove the Year, Month, and Day columns from the dataset.

```{r}
# create dates by combining Year, Month, and Day columns and delete those three columns
FedFundsRate <- FedFundsRate %>% 
  mutate(Date = make_date(Year, Month, Day)) %>% 
  select(!contains(c("Year", "Month", "Day")))
# view the dataset
FedFundsRate
```

Any additional comments?

::: callout-note
## Question

Since there are missing values in this dataset for the target rates since in earlier years the Federal Funds Target Rate was used while in later years the Federal Funds Upper and Lower Target Rates were used, would it be easier to take out the *NA*s for the Federal Funds Target Rate if the Upper and Lower Target Rates were used those years and vice versa? I did not see any observations where the Federal Funds Target Rate, Federal Funds Upper Target, and Federal Funds Lower Target rates all had values.
:::