Code
library(tidyverse)
library(lubridate)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Xinyang Mao
April 3, 2023
Today’s challenge is to:
Year Month Day Federal.Funds.Target.Rate Federal.Funds.Upper.Target
1 1954 7 1 NA NA
2 1954 8 1 NA NA
3 1954 9 1 NA NA
4 1954 10 1 NA NA
5 1954 11 1 NA NA
6 1954 12 1 NA NA
Federal.Funds.Lower.Target Effective.Federal.Funds.Rate
1 NA 0.80
2 NA 1.22
3 NA 1.06
4 NA 0.85
5 NA 0.83
6 NA 1.28
Real.GDP..Percent.Change. Unemployment.Rate Inflation.Rate
1 4.6 5.8 NA
2 NA 6.0 NA
3 NA 6.1 NA
4 8.0 5.7 NA
5 NA 5.3 NA
6 NA 5.0 NA
Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing | ||||
---|---|---|---|---|---|---|---|---|
Year [integer] |
|
64 distinct values | 0 (0.0%) | |||||
Month [integer] |
|
12 distinct values | 0 (0.0%) | |||||
Day [integer] |
|
29 distinct values | 0 (0.0%) | |||||
Federal.Funds.Target.Rate [numeric] |
|
63 distinct values | 442 (48.9%) | |||||
Federal.Funds.Upper.Target [numeric] |
|
4 distinct values | 801 (88.6%) | |||||
Federal.Funds.Lower.Target [numeric] |
|
4 distinct values | 801 (88.6%) | |||||
Effective.Federal.Funds.Rate [numeric] |
|
466 distinct values | 152 (16.8%) | |||||
Real.GDP..Percent.Change. [numeric] |
|
113 distinct values | 654 (72.3%) | |||||
Unemployment.Rate [numeric] |
|
71 distinct values | 152 (16.8%) | |||||
Inflation.Rate [numeric] |
|
106 distinct values | 194 (21.5%) |
Generated by summarytools 1.0.1 (R version 4.2.2)
2023-04-03
As we can see,there are 3 separate date columns: Year,Month,Day.We can use add a extra column named Date
to combine them as a new column then delete the original 3 separate columns.
Firstly,we use mutate()
and make_date()
functions to create a new column:
Delete previous separate 3 date columns. subset()
used to reserve the useful columns,and store the columns names which need to delete in a character type vector.
[1] "Federal.Funds.Target.Rate" "Federal.Funds.Upper.Target"
[3] "Federal.Funds.Lower.Target" "Effective.Federal.Funds.Rate"
[5] "Real.GDP..Percent.Change." "Unemployment.Rate"
[7] "Inflation.Rate" "Date"
Use the colSums()
function and the is.na() function: this method calculates the number of missing values in each column and returns a vector of numbers where each element represents the number of missing values in the corresponding column
Federal.Funds.Target.Rate Federal.Funds.Upper.Target
TRUE TRUE
Federal.Funds.Lower.Target Effective.Federal.Funds.Rate
TRUE TRUE
Real.GDP..Percent.Change. Unemployment.Rate
TRUE TRUE
Inflation.Rate Date
TRUE FALSE
An element in the colHasNa
vector is TRUE for a missing value in the corresponding column, and FALSE for no missing value in the corresponding column.
According to the check result above,we can find there are lots of missing value and all them are numeric type value. So I use the average value of every column except Date
column to fill the missing cell. Mainly use for
loop and mean()
function to calculate mean and fill,ifelse()
and is.na()
to determine.
Federal.Funds.Target.Rate Federal.Funds.Upper.Target
FALSE FALSE
Federal.Funds.Lower.Target Effective.Federal.Funds.Rate
FALSE FALSE
Real.GDP..Percent.Change. Unemployment.Rate
FALSE FALSE
Inflation.Rate Date
FALSE FALSE
Now we check again,all of the missing values disappered!
---
title: "Challenge_4"
author: "Xinyang Mao"
description: "More data wrangling: pivoting"
date: "04/03/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- fed_rates
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(lubridate)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
- FedFundsRate.csv⭐⭐⭐
```{r}
fedRateData <- read.csv("_data/FedFundsRate.csv")
head(fedRateData)
```
### Briefly describe the data
```{r}
print(summarytools::dfSummary(fedRateData,
varnumbers = FALSE,
plain.ascii = FALSE,
style = "grid",
graph.magnif = 0.70,
valid.col = FALSE),
method = 'render',
table.classes = 'table-condensed')
```
## Modify date
As we can see,there are 3 separate date columns: Year,Month,Day.We can use add a extra column named `Date` to combine them as a new column then delete the original 3 separate columns.
Firstly,we use `mutate()` and `make_date()` functions to create a new column:
```{r}
fedRateRaw <- fedRateData%>%
mutate(Date = make_date(year = `Year`,month = `Month`,day = `Day`))
summary(fedRateRaw$Date)
```
## Delete previous columns
Delete previous separate 3 date columns.
`subset()` used to reserve the useful columns,and store the columns names which need to delete in a character type vector.
```{r}
fedRateRaw <- fedRateRaw %>%
subset(select = -c(`Year`,`Month`,`Day`))
colnames(fedRateRaw)
```
## Sanity check
Use the `colSums()` function and the is.na() function: this method calculates the number of missing values in each column and returns a vector of numbers where each element represents the number of missing values in the corresponding column
```{r}
colHasNa <- colSums(is.na(fedRateRaw)) > 0
colHasNa
```
An element in the `colHasNa` vector is TRUE for a missing value in the corresponding column, and FALSE for no missing value in the corresponding column.
## Use average value to fill in the missing values
According to the check result above,we can find there are lots of missing value and all them are numeric type value.
So I use the average value of every column except `Date` column to fill the missing cell.
Mainly use `for` loop and `mean()` function to calculate mean and fill,`ifelse()` and `is.na()` to determine.
```{r}
colsToFill <- c(1:7)
for (i in colsToFill) {
avg <- mean(fedRateRaw[,i],na.rm = TRUE)
fedRateRaw[,i] <- ifelse(is.na(fedRateRaw[,i]),avg,fedRateRaw[,i])
}
colCheckNa <- colSums(is.na(fedRateRaw)) > 0
colCheckNa
```
Now we check again,all of the missing values disappered!