DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 4 Instructions

Categories: challenge_4, fed_rates, hotel_bookings, darron_bunt
Author: Darron Bunt
Published: October 16, 2022

Code
library(tidyverse)
library(lubridate)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in one (or more) of the following datasets, using the correct R package and command.

  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐

Datasets Used

  • Fed Funds
  • Hotel Bookings
Code
FedFundsRate <- read_csv("_data/FedFundsRate.csv")

This dataset has 904 rows and 10 columns. It covers historical Federal Funds data across 67 years, with the date of each observation split across separate Year, Month, and Day columns. The Federal Funds Rate is the target interest rate set by the Federal Open Market Committee (FOMC): the rate at which commercial banks lend their excess reserves to each other overnight.

The remaining seven columns hold the values recorded for each date. Four are related to the Federal Funds Rate (the target rate, upper and lower target rates, and the effective rate), and three are related to economic indicators (% Change in Real GDP, the Unemployment Rate, and the Inflation Rate).
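
The row, column, and missing-value counts can be checked directly. Here is a minimal sketch, assuming a small hypothetical stand-in tibble (`fed_sample`, which is not the real file) that borrows a few of the real column names. With the actual data, the same calls would be run on `FedFundsRate`.

```r
library(dplyr)

# Hypothetical stand-in for FedFundsRate: three rows, five of the real column names
fed_sample <- tibble(
  Year  = c(1954, 1954, 1954),
  Month = c(7, 8, 9),
  Day   = c(1, 1, 1),
  `Effective Federal Funds Rate` = c(0.80, 1.22, 1.06),
  `Real GDP (Percent Change)`    = c(4.6, NA, NA)
)

# Number of rows and columns
dim(fed_sample)

# Number of missing values in each column
na_counts <- fed_sample %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

Counting the NAs per column is a quick way to see which variables are only reported for some dates, as the printed output for the target-rate columns suggests.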

Sadly, the data is not currently tidy. The date data, currently in three columns, can be combined into one YYYY-MM-DD column.

Code
# Combine the separate Year, Month, and Day columns into a single Date column
FedFundsRate2 <- FedFundsRate %>%
  mutate(FullDate = make_date(Year, Month, Day))
FedFundsRate2
# A tibble: 904 × 11
    Year Month   Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
   <dbl> <dbl> <dbl>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  1954     7     1          NA      NA      NA    0.8      4.6     5.8      NA
 2  1954     8     1          NA      NA      NA    1.22    NA       6        NA
 3  1954     9     1          NA      NA      NA    1.06    NA       6.1      NA
 4  1954    10     1          NA      NA      NA    0.85     8       5.7      NA
 5  1954    11     1          NA      NA      NA    0.83    NA       5.3      NA
 6  1954    12     1          NA      NA      NA    1.28    NA       5        NA
 7  1955     1     1          NA      NA      NA    1.39    11.9     4.9      NA
 8  1955     2     1          NA      NA      NA    1.29    NA       4.7      NA
 9  1955     3     1          NA      NA      NA    1.35    NA       4.6      NA
10  1955     4     1          NA      NA      NA    1.43     6.7     4.7      NA
# … with 894 more rows, 1 more variable: FullDate <date>, and abbreviated
#   variable names ¹​`Federal Funds Target Rate`, ²​`Federal Funds Upper Target`,
#   ³​`Federal Funds Lower Target`, ⁴​`Effective Federal Funds Rate`,
#   ⁵​`Real GDP (Percent Change)`, ⁶​`Unemployment Rate`, ⁷​`Inflation Rate`

Yay, better!
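
The challenge also asks for a sanity check on each mutation. One possible check, sketched here on a hypothetical three-row stand-in (`fed_sample`, not the real data), is to confirm that no dates came out missing and that the pieces pulled back out of `FullDate` match the original columns.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in with the same date columns as FedFundsRate
fed_sample <- tibble(
  Year  = c(1954, 1954, 1955),
  Month = c(7, 12, 1),
  Day   = c(1, 1, 1)
)

fed_checked <- fed_sample %>%
  mutate(FullDate = make_date(Year, Month, Day))

# No dates should be missing after the mutation
sum(is.na(fed_checked$FullDate))

# Components extracted from FullDate should match the original columns
all(
  year(fed_checked$FullDate)  == fed_checked$Year,
  month(fed_checked$FullDate) == fed_checked$Month,
  day(fed_checked$FullDate)   == fed_checked$Day
)
```

On the real data, the same two checks would be run against `FedFundsRate2`.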

Hotel Bookings

Code
Hotels <- read_csv("_data/hotel_bookings.csv")

So first off, I’m going to combine the arrival year, month, and day into one single value.

Code
Hotels2 <- Hotels %>%
  mutate(
    # Build a "Month/Day/Year" string, e.g. "July/1/2015"
    FullArrDate = str_c(arrival_date_month,
                        arrival_date_day_of_month,
                        arrival_date_year, sep = "/"),
    # Parse that string into a Date
    FullArrivalDate = mdy(FullArrDate)
  )
Hotels2
# A tibble: 119,390 × 34
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Resor…       0     342    2015 July         27       1       0       0      2
 2 Resor…       0     737    2015 July         27       1       0       0      2
 3 Resor…       0       7    2015 July         27       1       0       1      1
 4 Resor…       0      13    2015 July         27       1       0       1      1
 5 Resor…       0      14    2015 July         27       1       0       2      2
 6 Resor…       0      14    2015 July         27       1       0       2      2
 7 Resor…       0       0    2015 July         27       1       0       2      2
 8 Resor…       0       9    2015 July         27       1       0       2      2
 9 Resor…       1      85    2015 July         27       1       0       3      2
10 Resor…       1      75    2015 July         27       1       0       3      2
# … with 119,380 more rows, 24 more variables: children <dbl>, babies <dbl>,
#   meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
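
Because the new date column is cut off in the printed output, a quick sanity check helps confirm the parse worked. This is a rough sketch on a hypothetical two-row stand-in (`hotel_sample`, not the real data). It relies on `mdy()` returning NA for any string it cannot parse, and it assumes an English locale for the month names.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical stand-in with the three arrival-date columns
hotel_sample <- tibble(
  arrival_date_year         = c(2015, 2016),
  arrival_date_month        = c("July", "August"),
  arrival_date_day_of_month = c(1, 15)
)

hotel_checked <- hotel_sample %>%
  mutate(
    FullArrDate = str_c(arrival_date_month,
                        arrival_date_day_of_month,
                        arrival_date_year, sep = "/"),
    FullArrivalDate = mdy(FullArrDate)
  )

# Any NA here would mean a string failed to parse
sum(is.na(hotel_checked$FullArrivalDate))

# Year and day recovered from the parsed date should match the originals
all(
  year(hotel_checked$FullArrivalDate) == hotel_checked$arrival_date_year,
  day(hotel_checked$FullArrivalDate)  == hotel_checked$arrival_date_day_of_month
)
```

On the real data, the same checks would be run against `Hotels2`.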

I also just want to know how many people stayed in the hotel room, total. So I’m going to add adults, children, and babies together into one TotalGuests column.

Code
Hotels3 <- Hotels2 %>%
  mutate(TotalGuests = adults + children + babies)
Hotels3
# A tibble: 119,390 × 35
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Resor…       0     342    2015 July         27       1       0       0      2
 2 Resor…       0     737    2015 July         27       1       0       0      2
 3 Resor…       0       7    2015 July         27       1       0       1      1
 4 Resor…       0      13    2015 July         27       1       0       1      1
 5 Resor…       0      14    2015 July         27       1       0       2      2
 6 Resor…       0      14    2015 July         27       1       0       2      2
 7 Resor…       0       0    2015 July         27       1       0       2      2
 8 Resor…       0       9    2015 July         27       1       0       2      2
 9 Resor…       1      85    2015 July         27       1       0       3      2
10 Resor…       1      75    2015 July         27       1       0       3      2
# … with 119,380 more rows, 25 more variables: children <dbl>, babies <dbl>,
#   meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
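
One thing worth checking with a sum like this: if any of the three columns is NA for a booking, the `+` operator makes the whole total NA. The sketch below uses a hypothetical three-row stand-in (`hotel_sample`, not the real data) and assumes missing counts can reasonably be treated as zero, which is a judgment call rather than something the data states.

```r
library(dplyr)

# Hypothetical stand-in with one missing value in children
hotel_sample <- tibble(
  adults   = c(2, 2, 1),
  children = c(0, NA, 1),
  babies   = c(0, 0, 1)
)

hotel_totals <- hotel_sample %>%
  mutate(
    # Plain addition: NA if any of the three inputs is NA
    TotalGuests = adults + children + babies,
    # Alternative (assumption): treat a missing count as 0 before adding
    TotalGuestsNoNA = coalesce(adults, 0) + coalesce(children, 0) + coalesce(babies, 0)
  )
hotel_totals

# How many totals ended up missing?
sum(is.na(hotel_totals$TotalGuests))
```

On the real data, `sum(is.na(Hotels3$TotalGuests))` would show whether this matters at all.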