Challenge4_KatiePopiela

More data wrangling: pivoting
Published

August 18, 2022

library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)

2) tidy data (as needed, including sanity checks)

3) identify variables that need to be mutated

4) mutate variables and sanity check all mutations

library(tidyverse)  # attaches dplyr, tidyr, ggplot2, and readr

hotel_bookings <- read.csv("_data/hotel_bookings.csv")
#This data set shows hotel booking data between 2015 and 2017. The variables include the number of nights stayed (weeknights and weekend nights), the type of hotel, the number of guests (adults, children, babies), and whether the reservation was canceled or kept. With 32 columns, the data set is hard to read at a glance, so I'm going to filter it down and try to collapse some columns using pivot_longer().
colnames(hotel_bookings)
 [1] "hotel"                          "is_canceled"                   
 [3] "lead_time"                      "arrival_date_year"             
 [5] "arrival_date_month"             "arrival_date_week_number"      
 [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
 [9] "stays_in_week_nights"           "adults"                        
[11] "children"                       "babies"                        
[13] "meal"                           "country"                       
[15] "market_segment"                 "distribution_channel"          
[17] "is_repeated_guest"              "previous_cancellations"        
[19] "previous_bookings_not_canceled" "reserved_room_type"            
[21] "assigned_room_type"             "booking_changes"               
[23] "deposit_type"                   "agent"                         
[25] "company"                        "days_in_waiting_list"          
[27] "customer_type"                  "adr"                           
[29] "required_car_parking_spaces"    "total_of_special_requests"     
[31] "reservation_status"             "reservation_status_date"       
#At first glance, I see that arrival dates are split across four columns (year, month, week number, and day of month). I'm going to use pivot_longer() to combine them. I'm also going to drop most of the other columns and focus on the arrival date and the length of each stay (weeknights and weekend nights).
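One possible way to get a single arrival date is sketched here. It uses mutate() instead of pivot_longer(), because pivot_longer() stacks columns into extra rows instead of merging them into one value. The toy_bookings tibble and its values are made up for illustration; only the column names come from hotel_bookings. The month_number and arrival_date columns are hypothetical names, and the sketch assumes the months are spelled out in English, as in "November".

```r
library(tidyverse)

# Toy rows (made-up values) that reuse three column names from hotel_bookings
toy_bookings <- tibble(
  arrival_date_year = c(2015, 2016),
  arrival_date_month = c("November", "July"),
  arrival_date_day_of_month = c(3, 21)
)

toy_dates <- toy_bookings %>%
  mutate(
    # month.name is base R's vector of English month names, so match()
    # turns "November" into 11
    month_number = match(arrival_date_month, month.name),
    # build one Date column from the year, month, and day pieces
    arrival_date = lubridate::make_date(
      arrival_date_year, month_number, arrival_date_day_of_month
    )
  )

toy_dates
```

A quick sanity check after a mutation like this is to confirm that no arrival_date is NA, since a misspelled month name would fail to match.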
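hotel_bookings1 <- hotel_bookings %>%
  # keep only the arrival date and length-of-stay columns, plus reservation status
  select(arrival_date_month, arrival_date_year, stays_in_week_nights,
         stays_in_weekend_nights, reservation_status) %>%
  # restrict to November arrivals
  filter(arrival_date_month == "November") %>%
  # sort by weeknights, then weekend nights
  arrange(stays_in_week_nights, stays_in_weekend_nights)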
  
#I tried my best to use pivot_longer() and mutate(), but I couldn't figure out how to get them to work. Google was no help either.
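For reference, here is a minimal sketch of how pivot_longer() could collapse the two length-of-stay columns into one. The toy_stays tibble holds made-up values, and the night_type and nights column names are assumptions chosen for illustration, not names from the data set.

```r
library(tidyverse)

# Toy rows (made-up values) that reuse column names from hotel_bookings
toy_stays <- tibble(
  arrival_date_year = c(2015, 2016),
  stays_in_week_nights = c(4, 2),
  stays_in_weekend_nights = c(1, 2)
)

toy_long <- toy_stays %>%
  pivot_longer(
    cols = c(stays_in_week_nights, stays_in_weekend_nights),
    names_to = "night_type",     # holds the old column names
    names_prefix = "stays_in_",  # strip the shared prefix from those names
    values_to = "nights"         # holds the old cell values
  )

# Sanity checks: two rows per original row, and no nights gained or lost
stopifnot(nrow(toy_long) == 2 * nrow(toy_stays))
stopifnot(sum(toy_long$nights) ==
            sum(toy_stays$stays_in_week_nights + toy_stays$stays_in_weekend_nights))

toy_long
```

Each booking now takes up two rows, one per night type, which is the shape ggplot2 generally expects when the night type should map to a color or a facet.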
# keep completed stays (guest checked out) of more than 3 weeknights
hotel_bookings1 <- hotel_bookings1 %>%
  filter(reservation_status == "Check-Out", stays_in_week_nights > 3)
ggplot(hotel_bookings1, mapping = aes(x = arrival_date_year, y = stays_in_week_nights)) + geom_jitter()

#I tried my best here, and the best geometry function I could find for this was geom_jitter(). I wanted to compare how many people stayed at their hotel for more than 3 weeknights in November 2015 and November 2016.
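Since the goal is a comparison of counts, one alternative to geom_jitter() would be to count the rows per year and draw a bar chart. The sketch below assumes each row is one booking, so it counts bookings, not individual guests. The toy_filtered tibble is made-up data standing in for the filtered hotel_bookings1, and n_bookings is a hypothetical column name.

```r
library(tidyverse)

# Toy stand-in (made-up values) for the filtered hotel_bookings1
toy_filtered <- tibble(
  arrival_date_year = c(2015, 2015, 2016, 2016, 2016),
  stays_in_week_nights = c(4, 5, 4, 6, 10)
)

# One row per year, with the number of bookings in that year
stays_per_year <- toy_filtered %>%
  count(arrival_date_year, name = "n_bookings")

# factor() makes the year a discrete axis, so there is one bar per year
year_plot <- ggplot(stays_per_year,
                    aes(x = factor(arrival_date_year), y = n_bookings)) +
  geom_col() +
  labs(x = "Arrival year", y = "Bookings with more than 3 weeknights")

year_plot
```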