Challenge 4 Instructions

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
Author

Khadijat Adeleye

Published

March 29, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
# read the federal funds data and open it in the viewer
df <- read.csv("_data/FedFundsRate.csv")
view(df)

Dataset description

The federal funds dataset tracks the United States economy in relation to the federal funds rate. Economic conditions are measured by the unemployment rate, the inflation rate, and the percent change in real GDP. The dataset consists of 10 columns and 904 rows. There are 3,196 missing (NA) values in the entire data frame, spread across seven columns.

Code
# num of rows of dataset
nrow(df)
[1] 904
Code
#num of cols of dataset
ncol(df)
[1] 10
Code
#name of all the columns
colnames(df)
 [1] "Year"                         "Month"                       
 [3] "Day"                          "Federal.Funds.Target.Rate"   
 [5] "Federal.Funds.Upper.Target"   "Federal.Funds.Lower.Target"  
 [7] "Effective.Federal.Funds.Rate" "Real.GDP..Percent.Change."   
 [9] "Unemployment.Rate"            "Inflation.Rate"              
Code
# last six rows of the dataset
tail(df)
    Year Month Day Federal.Funds.Target.Rate Federal.Funds.Upper.Target
899 2016    12   1                        NA                       0.50
900 2016    12  14                        NA                       0.75
901 2017     1   1                        NA                       0.75
902 2017     2   1                        NA                       0.75
903 2017     3   1                        NA                       0.75
904 2017     3  16                        NA                       1.00
    Federal.Funds.Lower.Target Effective.Federal.Funds.Rate
899                       0.25                         0.54
900                       0.50                           NA
901                       0.50                         0.65
902                       0.50                         0.66
903                       0.50                           NA
904                       0.75                           NA
    Real.GDP..Percent.Change. Unemployment.Rate Inflation.Rate
899                        NA               4.7            2.2
900                        NA                NA             NA
901                        NA               4.8            2.3
902                        NA               4.7            2.2
903                        NA                NA             NA
904                        NA                NA             NA
Code
# total number of missing (NA) values in the data frame
sum(is.na(df))
[1] 3196

Tidy dataset

Before tidying the data, we first check which columns have missing entries.

Code
# count missing entries in each column
num_missing_cols<-colSums(is.na(df))
print(num_missing_cols)
                        Year                        Month 
                           0                            0 
                         Day    Federal.Funds.Target.Rate 
                           0                          442 
  Federal.Funds.Upper.Target   Federal.Funds.Lower.Target 
                         801                          801 
Effective.Federal.Funds.Rate    Real.GDP..Percent.Change. 
                         152                          654 
           Unemployment.Rate               Inflation.Rate 
                         152                          194 
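
The per-column counts can also be read as shares, which makes it easier to see which columns are mostly empty. A minimal sketch on a small made-up data frame (the `toy` data and its values are hypothetical and not taken from FedFundsRate.csv):

```r
# Hypothetical toy data standing in for the real file
toy <- data.frame(
  Year = c(2016, 2016, 2017, 2017),
  Federal.Funds.Target.Rate = c(NA, NA, NA, 1.0),
  Unemployment.Rate = c(4.7, NA, 4.8, 4.7)
)

# Count of missing entries per column
na_counts <- colSums(is.na(toy))

# Share of missing entries per column (is.na() gives TRUE/FALSE, so the mean is a proportion)
na_share <- colMeans(is.na(toy))

# Names of the columns that contain at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]

print(na_counts)
print(round(na_share, 2))
print(cols_with_na)
```

Applied to `df`, the same three lines would list the seven columns that carry missing values.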

The data were pivoted into a longer format, and rows whose pivoted value was missing (NA) were dropped from the data frame.

Code
# pivot three of the federal funds rate columns into a longer format
# values_drop_na = TRUE drops the rows whose pivoted value is NA
clean_FedFundsRate <- pivot_longer(df, cols = c("Federal.Funds.Target.Rate", "Federal.Funds.Lower.Target", "Effective.Federal.Funds.Rate"),
                 names_to = "Federal Fund Type",
                 values_to = "Federal Fund Rate",
                 values_drop_na = TRUE)
Code
# print the result of the pivot

clean_FedFundsRate
# A tibble: 1,317 × 9
    Year Month   Day Federal.Funds.Upp…¹ Real.…² Unemp…³ Infla…⁴ Feder…⁵ Feder…⁶
   <int> <int> <int>               <dbl>   <dbl>   <dbl>   <dbl> <chr>     <dbl>
 1  1954     7     1                  NA     4.6     5.8      NA Effect…    0.8 
 2  1954     8     1                  NA    NA       6        NA Effect…    1.22
 3  1954     9     1                  NA    NA       6.1      NA Effect…    1.06
 4  1954    10     1                  NA     8       5.7      NA Effect…    0.85
 5  1954    11     1                  NA    NA       5.3      NA Effect…    0.83
 6  1954    12     1                  NA    NA       5        NA Effect…    1.28
 7  1955     1     1                  NA    11.9     4.9      NA Effect…    1.39
 8  1955     2     1                  NA    NA       4.7      NA Effect…    1.29
 9  1955     3     1                  NA    NA       4.6      NA Effect…    1.35
10  1955     4     1                  NA     6.7     4.7      NA Effect…    1.43
# … with 1,307 more rows, and abbreviated variable names
#   ¹​Federal.Funds.Upper.Target, ²​Real.GDP..Percent.Change.,
#   ³​Unemployment.Rate, ⁴​Inflation.Rate, ⁵​`Federal Fund Type`,
#   ⁶​`Federal Fund Rate`
Code
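# pivot the economic indicator columns; this starts again from df, so it replaces the result of the first pivot
clean_FedFundsRate <- pivot_longer(df, cols = c("Real.GDP..Percent.Change.", "Unemployment.Rate", "Inflation.Rate"),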
                 names_to="GDP Condition",
                 values_to = "GDP Rate",
                 values_drop_na = TRUE)

clean_FedFundsRate
# A tibble: 1,712 × 9
    Year Month   Day Federal.Funds.Tar…¹ Feder…² Feder…³ Effec…⁴ GDP C…⁵ GDP R…⁶
   <int> <int> <int>               <dbl>   <dbl>   <dbl>   <dbl> <chr>     <dbl>
 1  1954     7     1                  NA      NA      NA    0.8  Real.G…     4.6
 2  1954     7     1                  NA      NA      NA    0.8  Unempl…     5.8
 3  1954     8     1                  NA      NA      NA    1.22 Unempl…     6  
 4  1954     9     1                  NA      NA      NA    1.06 Unempl…     6.1
 5  1954    10     1                  NA      NA      NA    0.85 Real.G…     8  
 6  1954    10     1                  NA      NA      NA    0.85 Unempl…     5.7
 7  1954    11     1                  NA      NA      NA    0.83 Unempl…     5.3
 8  1954    12     1                  NA      NA      NA    1.28 Unempl…     5  
 9  1955     1     1                  NA      NA      NA    1.39 Real.G…    11.9
10  1955     1     1                  NA      NA      NA    1.39 Unempl…     4.9
# … with 1,702 more rows, and abbreviated variable names
#   ¹​Federal.Funds.Target.Rate, ²​Federal.Funds.Upper.Target,
#   ³​Federal.Funds.Lower.Target, ⁴​Effective.Federal.Funds.Rate,
#   ⁵​`GDP Condition`, ⁶​`GDP Rate`
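
Because the second call starts again from `df`, its result replaces the first pivot rather than building on it. If one long table covering both groups of columns is wanted, one option is a single `pivot_longer()` over all measure columns. A minimal sketch on hypothetical toy data (the names `toy`, `long`, `Series`, and `Value` are assumptions for illustration, and the values are made up):

```r
library(tidyr)

# Hypothetical toy data with the same kind of layout as df
toy <- data.frame(
  Year = c(2016L, 2017L),
  Month = c(12L, 1L),
  Day = c(1L, 1L),
  Effective.Federal.Funds.Rate = c(0.54, 0.65),
  Unemployment.Rate = c(4.7, NA),
  Inflation.Rate = c(2.2, 2.3)
)

# Pivot every measure column at once; Year, Month, and Day stay as identifiers
long <- pivot_longer(
  toy,
  cols = -c(Year, Month, Day),
  names_to = "Series",
  values_to = "Value",
  values_drop_na = TRUE  # drop rows whose pivoted value is NA
)

print(long)
```

This gives one row per date and series, so rate and indicator values can be filtered or plotted by the `Series` column.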

Mutating data

Next, combine the Year, Month, and Day columns into a single, easily readable “Date” column.

Code
# combine Year, Month, and Day into a single Date column
clean_FedFundsRate <- mutate(clean_FedFundsRate, Date = lubridate::make_date(Year, Month, Day))
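
To show what this step produces, here is a minimal sketch on a hypothetical two-row data frame (`toy` and its values are made up for illustration). It assumes the dplyr and lubridate packages are installed:

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data with separate year, month, and day columns
toy <- data.frame(
  Year = c(2016L, 2017L),
  Month = c(12L, 3L),
  Day = c(14L, 16L)
)

# make_date() builds a Date vector from the three integer columns
toy <- mutate(toy, Date = make_date(Year, Month, Day))

print(toy)
print(class(toy$Date))
```

Storing the result as a `Date` rather than as text means it sorts chronologically and can be used directly on a plot axis.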
Code
# Inflation.Rate is no longer a column: it was pivoted into `GDP Condition` / `GDP Rate`,
# so check the pivoted value column instead (values_drop_na = TRUE already removed its NAs)
clean_FedFundsRate <- clean_FedFundsRate[complete.cases(clean_FedFundsRate$`GDP Rate`), ]
Code
# select columns that exist after the second pivot; the indicator columns now live in
# `GDP Condition` / `GDP Rate`, and `Federal Fund Rate` / `Federal Fund Type` were overwritten
select(clean_FedFundsRate, Date, `GDP Condition`, `GDP Rate`, Effective.Federal.Funds.Rate, Federal.Funds.Target.Rate)
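
If the goal is a table with the date, the indicator columns, and a fund rate type/value pair, the indicator columns need to stay wide while only the fund rate columns are pivoted. A minimal sketch of that ordering on hypothetical toy data (`toy` and `result` are illustrative names and the values are made up; this is one possible ordering, not necessarily the intended one):

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data with the same kind of layout as df
toy <- data.frame(
  Year = c(2016L, 2016L, 2017L),
  Month = c(12L, 12L, 1L),
  Day = c(1L, 14L, 1L),
  Federal.Funds.Target.Rate = c(NA_real_, 0.75, NA_real_),
  Effective.Federal.Funds.Rate = c(0.54, NA, 0.65),
  Unemployment.Rate = c(4.7, NA, 4.8),
  Inflation.Rate = c(2.2, NA, 2.3)
)

result <- toy %>%
  # 1. filter on the wide indicator column while it still exists
  filter(!is.na(Inflation.Rate)) %>%
  # 2. pivot only the fund rate columns, dropping NA values
  pivot_longer(
    cols = c(Federal.Funds.Target.Rate, Effective.Federal.Funds.Rate),
    names_to = "Federal Fund Type",
    values_to = "Federal Fund Rate",
    values_drop_na = TRUE
  ) %>%
  # 3. build the Date column
  mutate(Date = make_date(Year, Month, Day)) %>%
  # 4. keep the columns of interest
  select(Date, Inflation.Rate, Unemployment.Rate, `Federal Fund Type`, `Federal Fund Rate`)

print(result)
```

Filtering before pivoting matters here: once a column has been pivoted into a name/value pair, it can no longer be referenced by its original name.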