Challenge 4 Instructions

challenge_4

fed_rates

hotel_bookings

darron_bunt

Author

Darron Bunt

Published

October 16, 2022

library(tidyverse)
library(lubridate)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
tidy data (as needed, including sanity checks)
identify variables that need to be mutated
mutate variables and sanity check all mutations

Read in one (or more) of the following datasets, using the correct R package and command.

FedFundsRate.csv⭐⭐⭐
hotel_bookings.csv⭐⭐⭐⭐

FedFundsRate <- read_csv("_data/FedFundsRate.csv")

This dataset is 904 rows long and has 10 columns. It examines historical Federal Funds data across 67 years, with the date of each observation split across separate Year, Month, and Day columns. The Federal Funds Rate is the interest rate at which commercial banks lend their excess reserves to each other overnight; the Federal Open Market Committee (FOMC) sets the target for that rate.

The date-specific information in the dataset is broken down across seven different variables. Four are related to the Federal Funds Rate (the target rate, upper and lower target rates, and the effective rate), and three are related to economic indicators (% Change in Real GDP, the Unemployment Rate, and the Inflation Rate).
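The row, column, and missing-value counts described above can be checked with a few one-liners. This is a minimal sketch, not the code used for this post: `fed_toy` is a hypothetical stand-in built from the first three rows of the printed output and only five of the ten columns, so that it runs without the CSV.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for FedFundsRate: three rows and five of the ten columns,
# with values copied from the printed tibble.
fed_toy <- tibble(
  Year = c(1954, 1954, 1954),
  Month = c(7, 8, 9),
  Day = c(1, 1, 1),
  `Effective Federal Funds Rate` = c(0.8, 1.22, 1.06),
  `Unemployment Rate` = c(5.8, 6, 6.1)
)

dim(fed_toy)    # number of rows, then number of columns
names(fed_toy)  # column names

# Count the missing values in every column
na_counts <- fed_toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

Running the same three calls on the full `FedFundsRate` tibble should show which of the seven rate and indicator columns are mostly `NA`.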

Sadly, the data is not currently tidy. The date data, currently in three columns, can be combined into one YYYY-MM-DD column.

# Combine the Year, Month and Day columns into a single Date column
FedFundsRate2 <- FedFundsRate %>%
  mutate(FullDate = make_date(Year, Month, Day))
FedFundsRate2

# A tibble: 904 × 11
    Year Month   Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, 1 more variable: FullDate <date>, and abbreviated
#   variable names ¹`Federal Funds Target Rate`, ²`Federal Funds Upper Target`,
#   ³`Federal Funds Lower Target`, ⁴`Effective Federal Funds Rate`,
#   ⁵`Real GDP (Percent Change)`, ⁶`Unemployment Rate`, ⁷`Inflation Rate`

Yay, better!
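Since the challenge asks for a sanity check on every mutation, here is one way the new date column could be checked. It is a sketch under assumptions: `fed_toy` is a hypothetical stand-in that reuses three Year/Month/Day combinations from the printed output, and the check names (`no_missing`, `year_ok`, and so on) are my own.

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical stand-in: three date combinations taken from the printed output
fed_toy <- tibble(
  Year = c(1954, 1954, 1955),
  Month = c(7, 12, 1),
  Day = c(1, 1, 1)
) %>%
  mutate(FullDate = make_date(Year, Month, Day))

# Every check should come back TRUE if the mutation worked
checks <- fed_toy %>%
  summarise(
    no_missing = !any(is.na(FullDate)),        # nothing failed to build
    year_ok = all(year(FullDate) == Year),     # components round-trip
    month_ok = all(month(FullDate) == Month),
    day_ok = all(day(FullDate) == Day)
  )
checks
```

If all four checks pass on the real data, the original Year, Month and Day columns are redundant and could be dropped.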

Read in one (or more) of the following datasets, using the correct R package and command.

- hotel_bookings.csv⭐⭐⭐⭐

Hotels <- read_csv("_data/hotel_bookings.csv")

So first off, I’m going to combine the arrival year, month and day into one single value.

Hotels2 <- Hotels %>%
  # Paste month name, day and year together (e.g. "July/1/2015"), then parse as a date
  mutate(FullArrDate = str_c(arrival_date_month,
                             arrival_date_day_of_month,
                             arrival_date_year, sep = "/"),
         FullArrivalDate = mdy(FullArrDate)
  )
Hotels2

# A tibble: 119,390 × 34
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Resor…       0     342    2015 July         27       1       0       0      2
 2 Resor…       0     737    2015 July         27       1       0       0      2
 3 Resor…       0       7    2015 July         27       1       0       1      1
 4 Resor…       0      13    2015 July         27       1       0       1      1
 5 Resor…       0      14    2015 July         27       1       0       2      2
 6 Resor…       0      14    2015 July         27       1       0       2      2
 7 Resor…       0       0    2015 July         27       1       0       2      2
 8 Resor…       0       9    2015 July         27       1       0       2      2
 9 Resor…       1      85    2015 July         27       1       0       3      2
10 Resor…       1      75    2015 July         27       1       0       3      2
# … with 119,380 more rows, 24 more variables: children <dbl>, babies <dbl>,
#   meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
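The new columns are hidden in the truncated print, so it is worth confirming that `mdy()` parsed every string. The sketch below assumes a hypothetical two-row stand-in, `hotels_toy`, with arrival values copied from the printed output; `n_failed` is an illustrative name.

```r
library(tibble)
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical stand-in for Hotels: arrival columns from the first printed rows
hotels_toy <- tibble(
  arrival_date_year = c(2015, 2015),
  arrival_date_month = c("July", "July"),
  arrival_date_day_of_month = c(1, 1)
) %>%
  mutate(
    FullArrDate = str_c(arrival_date_month,
                        arrival_date_day_of_month,
                        arrival_date_year, sep = "/"),
    FullArrivalDate = mdy(FullArrDate)
  )

# mdy() returns NA for strings it cannot parse, so count those
n_failed <- sum(is.na(hotels_toy$FullArrivalDate))
n_failed

# The earliest and latest dates should fall inside the expected booking window
range(hotels_toy$FullArrivalDate)

# The parsed year and day should match the source columns
all(year(hotels_toy$FullArrivalDate) == hotels_toy$arrival_date_year)
all(day(hotels_toy$FullArrivalDate) == hotels_toy$arrival_date_day_of_month)
```

On the full data, the same checks could be run against `Hotels2`, and `select(Hotels2, FullArrDate, FullArrivalDate)` would bring the two new columns into view.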

I also just want to know how many people stayed in the hotel room, total. So I’m going to add adults, children and babies together into one TotalGuests column.

Hotels3 <- Hotels2 %>%
  mutate(TotalGuests = adults + children + babies)
Hotels3

# A tibble: 119,390 × 35
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Resor…       0     342    2015 July         27       1       0       0      2
 2 Resor…       0     737    2015 July         27       1       0       0      2
 3 Resor…       0       7    2015 July         27       1       0       1      1
 4 Resor…       0      13    2015 July         27       1       0       1      1
 5 Resor…       0      14    2015 July         27       1       0       2      2
 6 Resor…       0      14    2015 July         27       1       0       2      2
 7 Resor…       0       0    2015 July         27       1       0       2      2
 8 Resor…       0       9    2015 July         27       1       0       2      2
 9 Resor…       1      85    2015 July         27       1       0       3      2
10 Resor…       1      75    2015 July         27       1       0       3      2
# … with 119,380 more rows, 25 more variables: children <dbl>, babies <dbl>,
#   meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
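One thing to sanity check here: `+` returns `NA` whenever any input is `NA`, so a booking with a missing count in any of the three columns would end up with a missing `TotalGuests`. Whether the real file has such rows is not shown above. The rows in `guests_toy` below are made up purely for illustration, and treating a missing count as zero is an assumption, not something the data confirms.

```r
library(tibble)
library(dplyr)

# Hypothetical rows for illustration only; not taken from hotel_bookings.csv
guests_toy <- tibble(
  adults = c(2, 1, 2),
  children = c(0, NA, 1),
  babies = c(0, 0, 1)
)

guests_toy <- guests_toy %>%
  mutate(
    # Plain addition: the row with a missing children count becomes NA
    TotalGuests = adults + children + babies,
    # Assumption: a missing count means zero guests of that type
    TotalGuestsNoNA = adults + coalesce(children, 0) + coalesce(babies, 0)
  )

# How many totals could not be computed?
n_missing <- sum(is.na(guests_toy$TotalGuests))
n_missing
```

Running `sum(is.na(Hotels3$TotalGuests))` on the real data would show whether this matters, and `filter(Hotels3, TotalGuests == 0)` would surface any bookings with no guests at all.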