Darron Bunt
October 16, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
This dataset is 904 rows long and has 10 columns. It examines historical Federal Funds data across 67 years, with each observation’s date split across separate Year, Month, and Day columns. The Federal Funds Rate is the interest rate at which commercial banks lend their excess reserves to each other overnight; the Federal Open Market Committee (FOMC) sets a target for this rate.
The date-specific information in the dataset is broken down across seven different variables. Four are related to the Federal Funds Rate (the target rate, upper and lower target rates, and the effective rate), and three are related to economic indicators (% Change in Real GDP, the Unemployment Rate, and the Inflation Rate).
Sadly, the data is not currently tidy. The date data, currently in three columns, can be combined into one YYYY-MM-DD column.
# A tibble: 904 × 11
Year Month Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA NA 0.8 4.6 5.8 NA
2 1954 8 1 NA NA NA 1.22 NA 6 NA
3 1954 9 1 NA NA NA 1.06 NA 6.1 NA
4 1954 10 1 NA NA NA 0.85 8 5.7 NA
5 1954 11 1 NA NA NA 0.83 NA 5.3 NA
6 1954 12 1 NA NA NA 1.28 NA 5 NA
7 1955 1 1 NA NA NA 1.39 11.9 4.9 NA
8 1955 2 1 NA NA NA 1.29 NA 4.7 NA
9 1955 3 1 NA NA NA 1.35 NA 4.6 NA
10 1955 4 1 NA NA NA 1.43 6.7 4.7 NA
# … with 894 more rows, 1 more variable: FullDate <date>, and abbreviated
# variable names ¹`Federal Funds Target Rate`, ²`Federal Funds Upper Target`,
# ³`Federal Funds Lower Target`, ⁴`Effective Federal Funds Rate`,
# ⁵`Real GDP (Percent Change)`, ⁶`Unemployment Rate`, ⁷`Inflation Rate`
Yay, better!
Read in one (or more) of the following datasets, using the correct R package and command.
So first off, I’m going to combine the arrival year, month and day into one single value.
# A tibble: 119,390 × 34
hotel is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Resor… 0 342 2015 July 27 1 0 0 2
2 Resor… 0 737 2015 July 27 1 0 0 2
3 Resor… 0 7 2015 July 27 1 0 1 1
4 Resor… 0 13 2015 July 27 1 0 1 1
5 Resor… 0 14 2015 July 27 1 0 2 2
6 Resor… 0 14 2015 July 27 1 0 2 2
7 Resor… 0 0 2015 July 27 1 0 2 2
8 Resor… 0 9 2015 July 27 1 0 2 2
9 Resor… 1 85 2015 July 27 1 0 3 2
10 Resor… 1 75 2015 July 27 1 0 3 2
# … with 119,380 more rows, 24 more variables: children <dbl>, babies <dbl>,
# meal <chr>, country <chr>, market_segment <chr>,
# distribution_channel <chr>, is_repeated_guest <dbl>,
# previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
# reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
# deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
# customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
I also just want to know how many people stayed in the hotel room, total. So I’m going to combine adults, children and babies into one.
# A tibble: 119,390 × 35
hotel is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Resor… 0 342 2015 July 27 1 0 0 2
2 Resor… 0 737 2015 July 27 1 0 0 2
3 Resor… 0 7 2015 July 27 1 0 1 1
4 Resor… 0 13 2015 July 27 1 0 1 1
5 Resor… 0 14 2015 July 27 1 0 2 2
6 Resor… 0 14 2015 July 27 1 0 2 2
7 Resor… 0 0 2015 July 27 1 0 2 2
8 Resor… 0 9 2015 July 27 1 0 2 2
9 Resor… 1 85 2015 July 27 1 0 3 2
10 Resor… 1 75 2015 July 27 1 0 3 2
# … with 119,380 more rows, 25 more variables: children <dbl>, babies <dbl>,
# meal <chr>, country <chr>, market_segment <chr>,
# distribution_channel <chr>, is_repeated_guest <dbl>,
# previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
# reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
# deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
# customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
---
title: "Challenge 4 Instructions"
author: "Darron Bunt"
description: "More data wrangling: pivoting"
date: "10/16/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_4
- fed_rates
- hotel_bookings
- darron_bunt
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(lubridate)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
Read in one (or more) of the following datasets, using the correct R package and command.
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
## Datasets Used
::: panel-tabset
### Fed Funds
```{r}
FedFundsRate <- read_csv("_data/FedFundsRate.csv")
```
This dataset is 904 rows long and has 10 columns. It examines historical Federal Funds data across 67 years, with each observation's date split across separate Year, Month, and Day columns. The Federal Funds Rate is the interest rate at which commercial banks lend their excess reserves to each other overnight; the Federal Open Market Committee (FOMC) sets a target for this rate.
The date-specific information in the dataset is broken down across seven different variables. Four are related to the Federal Funds Rate (the target rate, upper and lower target rates, and the effective rate), and three are related to economic indicators (% Change in Real GDP, the Unemployment Rate, and the Inflation Rate).
Sadly, the data is not currently tidy. The date data, currently in three columns, can be combined into one YYYY-MM-DD column.
```{r}
FedFundsRate2 <- FedFundsRate %>%
  # Combine the separate Year, Month, and Day columns into a single Date column
  mutate(FullDate = make_date(Year, Month, Day))
FedFundsRate2
```
Yay, better!
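The challenge also asks for a sanity check on each mutation. Here is a minimal sketch of what that could look like for the combined date column. It uses a tiny made-up tibble (`toy_rates`, a hypothetical stand-in, not the real data) so that it runs on its own; the same checks could be pointed at `FedFundsRate2` instead.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for FedFundsRate: hypothetical rows with the same date columns
toy_rates <- tibble(
  Year  = c(1954, 1954, 1955),
  Month = c(7, 8, 1),
  Day   = c(1, 1, 1)
)

toy_rates2 <- toy_rates %>%
  mutate(FullDate = make_date(Year, Month, Day))

# Check 1: no row ended up with a missing date
stopifnot(!anyNA(toy_rates2$FullDate))

# Check 2: the new column round-trips back to the original components
stopifnot(
  all(year(toy_rates2$FullDate)  == toy_rates2$Year),
  all(month(toy_rates2$FullDate) == toy_rates2$Month),
  all(day(toy_rates2$FullDate)   == toy_rates2$Day)
)

# Check 3: eyeball the earliest and latest dates to see if they look plausible
range(toy_rates2$FullDate)
```

If either `stopifnot()` call fails, the script stops with an error, which makes a bad mutation hard to miss.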
### Hotel Bookings
Read in one (or more) of the following datasets, using the correct R package and command.
- hotel_bookings.csv⭐⭐⭐⭐
```{r}
Hotels <- read_csv("_data/hotel_bookings.csv")
```
So first off, I'm going to combine the arrival year, month and day into one single value.
```{r}
Hotels2 <- Hotels %>%
  mutate(
    # Build a "Month/Day/Year" string, e.g. "July/1/2015"
    FullArrDate = str_c(arrival_date_month,
                        arrival_date_day_of_month,
                        arrival_date_year, sep = "/"),
    # Parse that string into a proper Date
    FullArrivalDate = mdy(FullArrDate)
  )
Hotels2
```
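As a sanity check on the parsed arrival date, one option is to count how many strings failed to parse and to confirm that the parsed date agrees with the original columns. This is only a sketch on made-up rows (`toy_hotels` is a hypothetical stand-in that borrows the real column names), and it assumes the month names are spelled out in English, as they are in the output above.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Toy stand-in for Hotels: hypothetical rows, real column names
toy_hotels <- tibble(
  arrival_date_year         = c(2015, 2016, 2017),
  arrival_date_month        = c("July", "February", "December"),
  arrival_date_day_of_month = c(1, 29, 31)
)

toy_hotels2 <- toy_hotels %>%
  mutate(
    FullArrDate     = str_c(arrival_date_month,
                            arrival_date_day_of_month,
                            arrival_date_year, sep = "/"),
    FullArrivalDate = mdy(FullArrDate)
  )

# mdy() returns NA (with a warning) for strings it cannot parse, so count the NAs
toy_hotels2 %>%
  summarise(unparsed = sum(is.na(FullArrivalDate)))

# The parsed date should agree with the original year, month name, and day.
# month.name is base R's vector of English month names.
toy_hotels2 %>%
  summarise(
    year_ok  = all(year(FullArrivalDate)  == arrival_date_year),
    month_ok = all(month(FullArrivalDate) == match(arrival_date_month, month.name)),
    day_ok   = all(day(FullArrivalDate)   == arrival_date_day_of_month)
  )
```

An `unparsed` count above zero, or any `FALSE` in the second summary, would mean the string-building step needs another look.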
I also just want to know how many people stayed in the hotel room, total. So I'm going to combine adults, children and babies into one.
```{r}
Hotels3 <- Hotels2 %>%
  # Total party size per booking; an NA in any of the three columns gives an NA total
  mutate(TotalGuests = adults + children + babies)
Hotels3
```
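One thing worth checking with a sum like this: `+` propagates missing values, so a booking with an `NA` in any of the three columns gets an `NA` total. The sketch below shows that behaviour on made-up rows (`toy_guests` is hypothetical, and the `NA` in `children` is an assumption for illustration, not a claim about the real data), along with one possible alternative that treats a missing count as zero.

```r
library(dplyr)

# Toy stand-in for Hotels2: hypothetical rows, with one NA planted in children
toy_guests <- tibble(
  adults   = c(2, 1, 2),
  children = c(0, NA, 1),
  babies   = c(0, 0, 1)
)

toy_guests2 <- toy_guests %>%
  mutate(
    # Same formula as above: any NA in a row makes the total NA
    TotalGuests = adults + children + babies,
    # Alternative: treat a missing count as zero before adding
    TotalGuestsNoNA = coalesce(adults, 0) + coalesce(children, 0) + coalesce(babies, 0)
  )

toy_guests2

# Sanity check: how many totals are missing, and what is the smallest total?
toy_guests2 %>%
  summarise(
    missing_totals = sum(is.na(TotalGuests)),
    min_total      = min(TotalGuests, na.rm = TRUE)
  )
```

Whether treating a missing count as zero is appropriate depends on why the value is missing, so the `coalesce()` version is a judgment call rather than a drop-in fix.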
:::