Challenge 4: Mutating hotel bookings dataset

challenge_4
hotel_bookings
Saksham Kumar
More data wrangling: mutate
Author

Saksham Kumar

Published

April 12, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

We read in the hotel_bookings.csv dataset, using the correct R package and command.

Code
bookings<-read_csv("_data/hotel_bookings.csv", show_col_types = FALSE)
bookings

Briefly describe the data

The data corresponds to hotel bookings. There are 119390 rows and 32 variables in our data. The 32 data types and their data types are mentioned below.

Code
spec(bookings)
cols(
  hotel = col_character(),
  is_canceled = col_double(),
  lead_time = col_double(),
  arrival_date_year = col_double(),
  arrival_date_month = col_character(),
  arrival_date_week_number = col_double(),
  arrival_date_day_of_month = col_double(),
  stays_in_weekend_nights = col_double(),
  stays_in_week_nights = col_double(),
  adults = col_double(),
  children = col_double(),
  babies = col_double(),
  meal = col_character(),
  country = col_character(),
  market_segment = col_character(),
  distribution_channel = col_character(),
  is_repeated_guest = col_double(),
  previous_cancellations = col_double(),
  previous_bookings_not_canceled = col_double(),
  reserved_room_type = col_character(),
  assigned_room_type = col_character(),
  booking_changes = col_double(),
  deposit_type = col_character(),
  agent = col_character(),
  company = col_character(),
  days_in_waiting_list = col_double(),
  customer_type = col_character(),
  adr = col_double(),
  required_car_parking_spaces = col_double(),
  total_of_special_requests = col_double(),
  reservation_status = col_character(),
  reservation_status_date = col_date(format = "")
)

Tidy Data (as needed)

The data looks clean for an initial analysis and does not need to be tidied.

Identify variables that need to be mutated

Looking at the data we see that we have 3 variables that can be coalesced into one - arrival_date_day_of_month, arrival_date_month, arrival_date_year.

Using mutate the first three fields can be combined into a single arrival date field.

Code
bookings_mutate_date_arrival<-bookings%>%
  mutate(date_arrival = str_c(arrival_date_day_of_month, arrival_date_month, arrival_date_year, sep="/"))

bookings_mutate_strToDate<-bookings_mutate_date_arrival%>%
  mutate(date_arrival = dmy(date_arrival))

bookings_mutate_strToDate[ , c("arrival_date_day_of_month", "arrival_date_month", "arrival_date_year", "date_arrival")]  
Code
summary(bookings_mutate_strToDate$date_arrival)
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
"2015-07-01" "2016-03-13" "2016-09-06" "2016-08-28" "2017-03-18" "2017-08-31" 

We can now clean the data by removing the now redundant variables arrival_date_day_of_month, arrival_date_month and arrival_date_year

Code
bookings_final <- bookings_mutate_strToDate%>%
  select(-starts_with("arrival"))

bookings_final