Challenge 4 by Jinxia Niu

challenge_4
hotel_bookings
Jinxia Niu
More data wrangling: pivoting
Author

Jinxia Niu

Published

March 25, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
library(tidyverse)
hotel_bookings<- read.csv("_data/hotel_bookings.csv")
dim(hotel_bookings)
[1] 119390     32
Code
head(hotel_bookings,10)
          hotel is_canceled lead_time arrival_date_year arrival_date_month
1  Resort Hotel           0       342              2015               July
2  Resort Hotel           0       737              2015               July
3  Resort Hotel           0         7              2015               July
4  Resort Hotel           0        13              2015               July
5  Resort Hotel           0        14              2015               July
6  Resort Hotel           0        14              2015               July
7  Resort Hotel           0         0              2015               July
8  Resort Hotel           0         9              2015               July
9  Resort Hotel           1        85              2015               July
10 Resort Hotel           1        75              2015               July
   arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
1                        27                         1                       0
2                        27                         1                       0
3                        27                         1                       0
4                        27                         1                       0
5                        27                         1                       0
6                        27                         1                       0
7                        27                         1                       0
8                        27                         1                       0
9                        27                         1                       0
10                       27                         1                       0
   stays_in_week_nights adults children babies meal country market_segment
1                     0      2        0      0   BB     PRT         Direct
2                     0      2        0      0   BB     PRT         Direct
3                     1      1        0      0   BB     GBR         Direct
4                     1      1        0      0   BB     GBR      Corporate
5                     2      2        0      0   BB     GBR      Online TA
6                     2      2        0      0   BB     GBR      Online TA
7                     2      2        0      0   BB     PRT         Direct
8                     2      2        0      0   FB     PRT         Direct
9                     3      2        0      0   BB     PRT      Online TA
10                    3      2        0      0   HB     PRT  Offline TA/TO
   distribution_channel is_repeated_guest previous_cancellations
1                Direct                 0                      0
2                Direct                 0                      0
3                Direct                 0                      0
4             Corporate                 0                      0
5                 TA/TO                 0                      0
6                 TA/TO                 0                      0
7                Direct                 0                      0
8                Direct                 0                      0
9                 TA/TO                 0                      0
10                TA/TO                 0                      0
   previous_bookings_not_canceled reserved_room_type assigned_room_type
1                               0                  C                  C
2                               0                  C                  C
3                               0                  A                  C
4                               0                  A                  A
5                               0                  A                  A
6                               0                  A                  A
7                               0                  C                  C
8                               0                  C                  C
9                               0                  A                  A
10                              0                  D                  D
   booking_changes deposit_type agent company days_in_waiting_list
1                3   No Deposit  NULL    NULL                    0
2                4   No Deposit  NULL    NULL                    0
3                0   No Deposit  NULL    NULL                    0
4                0   No Deposit   304    NULL                    0
5                0   No Deposit   240    NULL                    0
6                0   No Deposit   240    NULL                    0
7                0   No Deposit  NULL    NULL                    0
8                0   No Deposit   303    NULL                    0
9                0   No Deposit   240    NULL                    0
10               0   No Deposit    15    NULL                    0
   customer_type   adr required_car_parking_spaces total_of_special_requests
1      Transient   0.0                           0                         0
2      Transient   0.0                           0                         0
3      Transient  75.0                           0                         0
4      Transient  75.0                           0                         0
5      Transient  98.0                           0                         1
6      Transient  98.0                           0                         1
7      Transient 107.0                           0                         0
8      Transient 103.0                           0                         1
9      Transient  82.0                           0                         1
10     Transient 105.5                           0                         0
   reservation_status reservation_status_date
1           Check-Out              2015-07-01
2           Check-Out              2015-07-01
3           Check-Out              2015-07-02
4           Check-Out              2015-07-02
5           Check-Out              2015-07-03
6           Check-Out              2015-07-03
7           Check-Out              2015-07-03
8           Check-Out              2015-07-03
9            Canceled              2015-05-06
10           Canceled              2015-04-22

Briefly describe the data

It’s a hotel booking data set of 32 variables,119390 observations.There are several redundant variables, especially the arrival date_year,arrival_date_month and arrival_date_day_of_the_month ones.

Identify variables that need to be mutated

We need to use mutate() and make_date() from the lubridate package to create just one date format. In addition, because the arrival-month is currently shown as characters(eg.July, August) instead of numbers, we also need to use case_when() from the dplyr package to convert them first into numbers.

##Note:The case_when() function from the dplyr package in R can be used to create new variables from existing variables.

This function uses the following basic syntax:

library(dplyr)

df %>% mutate(new_var = case_when(var1 < 15 ~ ‘low’, var2 < 25 ~ ‘med’, TRUE ~ ‘high’)) Note that TRUE is equivalent to an “else” statement.

Code
library(dplyr)
library(lubridate)
hotel_bookings <- hotel_bookings %>%
    mutate(month = case_when(
       arrival_date_month == "January" ~ 1,
       arrival_date_month == "February" ~ 2,
       arrival_date_month == "March" ~ 3,
       arrival_date_month == "April" ~ 4,
       arrival_date_month == "May" ~ 5,
       arrival_date_month == "June" ~ 6,
       arrival_date_month == "July" ~ 7,
       arrival_date_month == "August" ~ 8,
       arrival_date_month == "September" ~ 9,
       arrival_date_month == "October" ~ 10,
       arrival_date_month == "November" ~ 11,
       arrival_date_month == "December" ~ 12,
     )) %>%
    mutate(arrival_date = make_date(arrival_date_year, month, arrival_date_day_of_month)) %>%
    select(-c(arrival_date_year, arrival_date_month, arrival_date_day_of_month, month, arrival_date_week_number)) %>%
    select(hotel, arrival_date, everything())
dim(hotel_bookings)
[1] 119390     29