Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Ishan Bhardwaj
May 20, 2023
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 904 × 10
Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA
2 1954 8 1 NA NA
3 1954 9 1 NA NA
4 1954 10 1 NA NA
5 1954 11 1 NA NA
6 1954 12 1 NA NA
7 1955 1 1 NA NA
8 1955 2 1 NA NA
9 1955 3 1 NA NA
10 1955 4 1 NA NA
# ℹ 894 more rows
# ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
# `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
# `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
Year
1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
6 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12
1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985
12 12 12 12 12 12 12 12 12 12 12 12 16 20 25 21
1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
19 21 24 23 17 21 15 12 18 14 13 13 15 15 15 23
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
13 13 17 19 16 15 19 12 12 12 12 12 12 13 13 4
# A tibble: 12 × 10
Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1955 1 1 NA NA
2 1955 2 1 NA NA
3 1955 3 1 NA NA
4 1955 4 1 NA NA
5 1955 5 1 NA NA
6 1955 6 1 NA NA
7 1955 7 1 NA NA
8 1955 8 1 NA NA
9 1955 9 1 NA NA
10 1955 10 1 NA NA
11 1955 11 1 NA NA
12 1955 12 1 NA NA
# ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
# `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
# `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
# A tibble: 24 × 10
Year Month Day `Federal Funds Target Rate` `Federal Funds Upper Target`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1988 1 1 6.81 NA
2 1988 1 28 6.62 NA
3 1988 2 1 6.62 NA
4 1988 2 11 6.5 NA
5 1988 3 1 6.5 NA
6 1988 3 30 6.75 NA
7 1988 4 1 6.75 NA
8 1988 5 1 6.75 NA
9 1988 5 9 7 NA
10 1988 5 25 7.25 NA
# ℹ 14 more rows
# ℹ 5 more variables: `Federal Funds Lower Target` <dbl>,
# `Effective Federal Funds Rate` <dbl>, `Real GDP (Percent Change)` <dbl>,
# `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
This dataset provides federal funds rate targets with an upper and lower boundary, the actual federal funds rate for a specified year + month + day, and general economic info from 1954 to 2017. As seen from the tables, these values are usually entered on the first day of every month. In certain cases, these values are also entered either halfway through the month or at random dates. These irregular entries may hint at periods of economic fluctuation during those months.
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
[1] 904 10
[1] 3616
[1] 8
This dataset is not tidy because the federal funds target rate, upper and lower targets, and effective values are all observances of one variable: the federal funds rate type. We can compress these four columns into one, whose values are in a new column titled “Rate”. Hence, as a sanity check, I have calculated the expected dimensions of the transformed dataset.
# A tibble: 3,616 × 8
Year Month Day Real GDP (Percent Ch…¹ `Unemployment Rate` `Inflation Rate`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 4.6 5.8 NA
2 1954 7 1 4.6 5.8 NA
3 1954 7 1 4.6 5.8 NA
4 1954 7 1 4.6 5.8 NA
5 1954 8 1 NA 6 NA
6 1954 8 1 NA 6 NA
7 1954 8 1 NA 6 NA
8 1954 8 1 NA 6 NA
9 1954 9 1 NA 6.1 NA
10 1954 9 1 NA 6.1 NA
# ℹ 3,606 more rows
# ℹ abbreviated name: ¹`Real GDP (Percent Change)`
# ℹ 2 more variables: `Federal Funds Rate Type` <chr>, Rate <dbl>
[1] 3616 8
As expected, we end up with a dataset with 3616 rows and 8 columns. Each date will now be observed across four types of federal funds rates.
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
# A tibble: 3,616 × 6
Date `Federal Funds Rate Type` Rate `Real GDP (Percent Change)`
<date> <chr> <dbl> <dbl>
1 1954-07-01 Federal Funds Target Rate NA 4.6
2 1954-07-01 Federal Funds Upper Target NA 4.6
3 1954-07-01 Federal Funds Lower Target NA 4.6
4 1954-07-01 Effective Federal Funds Rate 0.8 4.6
5 1954-08-01 Federal Funds Target Rate NA NA
6 1954-08-01 Federal Funds Upper Target NA NA
7 1954-08-01 Federal Funds Lower Target NA NA
8 1954-08-01 Effective Federal Funds Rate 1.22 NA
9 1954-09-01 Federal Funds Target Rate NA NA
10 1954-09-01 Federal Funds Upper Target NA NA
# ℹ 3,606 more rows
# ℹ 2 more variables: `Unemployment Rate` <dbl>, `Inflation Rate` <dbl>
In the original dataset, the year, month, and day were in separate columns, which is incorrect. These should all be collapsed into one variable “Date”, which is what I have done. I first concatenated each year, month, and day into a string delimited by -
and then used ymd() from lubridate to convert it to proper date format. These new dates were added into a new column “Date” and finally, the relevant columns were selected from the tidy dataset.
---
title: "Challenge 4"
author: "Ishan Bhardwaj"
description: "Data cleaning and mutations"
date: "05/20/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- Ishan Bhardwaj
- fed_rates
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
```{r}
fed_funds <- read_csv("_data/FedFundsRate.csv")
fed_funds
```
### Briefly describe the data
```{r}
table(select(fed_funds, "Year"))
normal_obsv <- filter(fed_funds, `Year` == 1955)
normal_obsv
diff_obsv <- filter(fed_funds, `Year` == 1988)
diff_obsv
```
This dataset provides federal funds rate targets with an upper and lower boundary, the actual federal funds rate for a specified year + month + day, and general economic info from 1954 to 2017. As seen from the tables, these values are usually entered on the first day of every month. In certain cases, these values are also entered either halfway through the month or at random dates. These irregular entries may hint at periods of economic fluctuation during those months.
## Tidy Data (as needed)
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
```{r}
dim(fed_funds)
new_rows <- nrow(fed_funds) * 4
new_cols <- ncol(fed_funds) - 4 + 2
# Expected number of rows
new_rows
# Expected number of columns
new_cols
```
This dataset is not tidy because the federal funds target rate, upper and lower targets, and effective values are all observances of one variable: the federal funds rate type. We can compress these four columns into one, whose values are in a new column titled "Rate". Hence, as a sanity check, I have calculated the expected dimensions of the transformed dataset.
```{r}
fed_funds_tidy <- pivot_longer(fed_funds, 4:7, names_to = "Federal Funds Rate Type", values_to = "Rate")
fed_funds_tidy
dim(fed_funds_tidy)
```
As expected, we end up with a dataset with 3616 rows and 8 columns. Each date will now be observed across four types of federal funds rates.
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
```{r}
fed_funds_tidy <- fed_funds_tidy %>%
mutate(Date = ymd(str_c(Year, Month, Day, sep="-"))) %>%
select(Date, 7:8, 4:6)
fed_funds_tidy
```
In the original dataset, the year, month, and day were in separate columns, which is incorrect. These should all be collapsed into one variable "Date", which is what I have done. I first concatenated each year, month, and day into a string delimited by `-` and then used ymd() from lubridate to convert it to proper date format. These new dates were added into a new column "Date" and finally, the relevant columns were selected from the tidy dataset.