Challenge 7

challenge_7
hotel_bookings
australian_marriage
air_bnb
eggs
abc_poll
faostat
us_hh
Visualizing Multiple Dimensions
Author

Sai Venkatesh

Published

April 28, 2023

library(tidyverse)
library(ggplot2)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. mutate variables as needed (including sanity checks)
  4. Recreate at least two graphs from previous exercises, but introduce at least one additional dimension that you omitted before using ggplot functionality (color, shape, line, facet, etc) The goal is not to create unneeded chart ink (Tufte), but to concisely capture variation in additional dimensions that were collapsed in your earlier 2 or 3 dimensional graphs.
  • Explain why you choose the specific graph type
  1. If you haven’t tried in previous weeks, work this week to make your graphs “publication” ready with titles, captions, and pretty axis labels and other viewer-friendly features

R Graph Gallery is a good starting point for thinking about what information is conveyed in standard graph types, and includes example R code. And anyone not familiar with Edward Tufte should check out his fantastic books and courses on data visualizaton.

(be sure to only include the category tags for the data you use!)

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • eggs ⭐
  • abc_poll ⭐⭐
  • australian_marriage ⭐⭐
  • hotel_bookings ⭐⭐⭐
  • air_bnb ⭐⭐⭐
  • us_hh ⭐⭐⭐⭐
  • faostat ⭐⭐⭐⭐⭐

We will read the hotel bookings data.

  hotel_bookings <- read.csv("_data/hotel_bookings.csv")
  
# The Dimensions 
  dim(hotel_bookings)
[1] 119390     32
  # The Column Names 
  colnames(hotel_bookings)
 [1] "hotel"                          "is_canceled"                   
 [3] "lead_time"                      "arrival_date_year"             
 [5] "arrival_date_month"             "arrival_date_week_number"      
 [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
 [9] "stays_in_week_nights"           "adults"                        
[11] "children"                       "babies"                        
[13] "meal"                           "country"                       
[15] "market_segment"                 "distribution_channel"          
[17] "is_repeated_guest"              "previous_cancellations"        
[19] "previous_bookings_not_canceled" "reserved_room_type"            
[21] "assigned_room_type"             "booking_changes"               
[23] "deposit_type"                   "agent"                         
[25] "company"                        "days_in_waiting_list"          
[27] "customer_type"                  "adr"                           
[29] "required_car_parking_spaces"    "total_of_special_requests"     
[31] "reservation_status"             "reservation_status_date"       
  head(hotel_bookings)
         hotel is_canceled lead_time arrival_date_year arrival_date_month
1 Resort Hotel           0       342              2015               July
2 Resort Hotel           0       737              2015               July
3 Resort Hotel           0         7              2015               July
4 Resort Hotel           0        13              2015               July
5 Resort Hotel           0        14              2015               July
6 Resort Hotel           0        14              2015               July
  arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
1                       27                         1                       0
2                       27                         1                       0
3                       27                         1                       0
4                       27                         1                       0
5                       27                         1                       0
6                       27                         1                       0
  stays_in_week_nights adults children babies meal country market_segment
1                    0      2        0      0   BB     PRT         Direct
2                    0      2        0      0   BB     PRT         Direct
3                    1      1        0      0   BB     GBR         Direct
4                    1      1        0      0   BB     GBR      Corporate
5                    2      2        0      0   BB     GBR      Online TA
6                    2      2        0      0   BB     GBR      Online TA
  distribution_channel is_repeated_guest previous_cancellations
1               Direct                 0                      0
2               Direct                 0                      0
3               Direct                 0                      0
4            Corporate                 0                      0
5                TA/TO                 0                      0
6                TA/TO                 0                      0
  previous_bookings_not_canceled reserved_room_type assigned_room_type
1                              0                  C                  C
2                              0                  C                  C
3                              0                  A                  C
4                              0                  A                  A
5                              0                  A                  A
6                              0                  A                  A
  booking_changes deposit_type agent company days_in_waiting_list customer_type
1               3   No Deposit  NULL    NULL                    0     Transient
2               4   No Deposit  NULL    NULL                    0     Transient
3               0   No Deposit  NULL    NULL                    0     Transient
4               0   No Deposit   304    NULL                    0     Transient
5               0   No Deposit   240    NULL                    0     Transient
6               0   No Deposit   240    NULL                    0     Transient
  adr required_car_parking_spaces total_of_special_requests reservation_status
1   0                           0                         0          Check-Out
2   0                           0                         0          Check-Out
3  75                           0                         0          Check-Out
4  75                           0                         0          Check-Out
5  98                           0                         1          Check-Out
6  98                           0                         1          Check-Out
  reservation_status_date
1              2015-07-01
2              2015-07-01
3              2015-07-02
4              2015-07-02
5              2015-07-03
6              2015-07-03
  tail(hotel_bookings)
            hotel is_canceled lead_time arrival_date_year arrival_date_month
119385 City Hotel           0        21              2017             August
119386 City Hotel           0        23              2017             August
119387 City Hotel           0       102              2017             August
119388 City Hotel           0        34              2017             August
119389 City Hotel           0       109              2017             August
119390 City Hotel           0       205              2017             August
       arrival_date_week_number arrival_date_day_of_month
119385                       35                        30
119386                       35                        30
119387                       35                        31
119388                       35                        31
119389                       35                        31
119390                       35                        29
       stays_in_weekend_nights stays_in_week_nights adults children babies meal
119385                       2                    5      2        0      0   BB
119386                       2                    5      2        0      0   BB
119387                       2                    5      3        0      0   BB
119388                       2                    5      2        0      0   BB
119389                       2                    5      2        0      0   BB
119390                       2                    7      2        0      0   HB
       country market_segment distribution_channel is_repeated_guest
119385     BEL  Offline TA/TO                TA/TO                 0
119386     BEL  Offline TA/TO                TA/TO                 0
119387     FRA      Online TA                TA/TO                 0
119388     DEU      Online TA                TA/TO                 0
119389     GBR      Online TA                TA/TO                 0
119390     DEU      Online TA                TA/TO                 0
       previous_cancellations previous_bookings_not_canceled reserved_room_type
119385                      0                              0                  A
119386                      0                              0                  A
119387                      0                              0                  E
119388                      0                              0                  D
119389                      0                              0                  A
119390                      0                              0                  A
       assigned_room_type booking_changes deposit_type agent company
119385                  A               0   No Deposit   394    NULL
119386                  A               0   No Deposit   394    NULL
119387                  E               0   No Deposit     9    NULL
119388                  D               0   No Deposit     9    NULL
119389                  A               0   No Deposit    89    NULL
119390                  A               0   No Deposit     9    NULL
       days_in_waiting_list customer_type    adr required_car_parking_spaces
119385                    0     Transient  96.14                           0
119386                    0     Transient  96.14                           0
119387                    0     Transient 225.43                           0
119388                    0     Transient 157.71                           0
119389                    0     Transient 104.40                           0
119390                    0     Transient 151.20                           0
       total_of_special_requests reservation_status reservation_status_date
119385                         2          Check-Out              2017-09-06
119386                         0          Check-Out              2017-09-06
119387                         2          Check-Out              2017-09-07
119388                         4          Check-Out              2017-09-07
119389                         0          Check-Out              2017-09-07
119390                         2          Check-Out              2017-09-07
  summary(hotel_bookings)
    hotel            is_canceled       lead_time   arrival_date_year
 Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
 Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
 Mode  :character   Median :0.0000   Median : 69   Median :2016     
                    Mean   :0.3704   Mean   :104   Mean   :2016     
                    3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                    Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                    
 arrival_date_month arrival_date_week_number arrival_date_day_of_month
 Length:119390      Min.   : 1.00            Min.   : 1.0             
 Class :character   1st Qu.:16.00            1st Qu.: 8.0             
 Mode  :character   Median :28.00            Median :16.0             
                    Mean   :27.17            Mean   :15.8             
                    3rd Qu.:38.00            3rd Qu.:23.0             
                    Max.   :53.00            Max.   :31.0             
                                                                      
 stays_in_weekend_nights stays_in_week_nights     adults      
 Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
 1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
 Median : 1.0000         Median : 2.0         Median : 2.000  
 Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
 3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
 Max.   :19.0000         Max.   :50.0         Max.   :55.000  
                                                              
    children           babies              meal             country         
 Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
 1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
 Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
 Mean   : 0.1039   Mean   : 0.007949                                        
 3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
 Max.   :10.0000   Max.   :10.000000                                        
 NA's   :4                                                                  
 market_segment     distribution_channel is_repeated_guest
 Length:119390      Length:119390        Min.   :0.00000  
 Class :character   Class :character     1st Qu.:0.00000  
 Mode  :character   Mode  :character     Median :0.00000  
                                         Mean   :0.03191  
                                         3rd Qu.:0.00000  
                                         Max.   :1.00000  
                                                          
 previous_cancellations previous_bookings_not_canceled reserved_room_type
 Min.   : 0.00000       Min.   : 0.0000                Length:119390     
 1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
 Median : 0.00000       Median : 0.0000                Mode  :character  
 Mean   : 0.08712       Mean   : 0.1371                                  
 3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
 Max.   :26.00000       Max.   :72.0000                                  
                                                                         
 assigned_room_type booking_changes   deposit_type          agent          
 Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
 Class :character   1st Qu.: 0.0000   Class :character   Class :character  
 Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
                    Mean   : 0.2211                                        
                    3rd Qu.: 0.0000                                        
                    Max.   :21.0000                                        
                                                                           
   company          days_in_waiting_list customer_type           adr         
 Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
 Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
 Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
                    Mean   :  2.321                         Mean   : 101.83  
                    3rd Qu.:  0.000                         3rd Qu.: 126.00  
                    Max.   :391.000                         Max.   :5400.00  
                                                                             
 required_car_parking_spaces total_of_special_requests reservation_status
 Min.   :0.00000             Min.   :0.0000            Length:119390     
 1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
 Median :0.00000             Median :0.0000            Mode  :character  
 Mean   :0.06252             Mean   :0.5714                              
 3rd Qu.:0.00000             3rd Qu.:1.0000                              
 Max.   :8.00000             Max.   :5.0000                              
                                                                         
 reservation_status_date
 Length:119390          
 Class :character       
 Mode  :character       
                        
                        
                        
                        
  unique(hotel_bookings$hotel)
[1] "Resort Hotel" "City Hotel"  
  unique(hotel_bookings$reservation_status)
[1] "Check-Out" "Canceled"  "No-Show"  
  unique(hotel_bookings$country)
  [1] "PRT"  "GBR"  "USA"  "ESP"  "IRL"  "FRA"  "NULL" "ROU"  "NOR"  "OMN" 
 [11] "ARG"  "POL"  "DEU"  "BEL"  "CHE"  "CN"   "GRC"  "ITA"  "NLD"  "DNK" 
 [21] "RUS"  "SWE"  "AUS"  "EST"  "CZE"  "BRA"  "FIN"  "MOZ"  "BWA"  "LUX" 
 [31] "SVN"  "ALB"  "IND"  "CHN"  "MEX"  "MAR"  "UKR"  "SMR"  "LVA"  "PRI" 
 [41] "SRB"  "CHL"  "AUT"  "BLR"  "LTU"  "TUR"  "ZAF"  "AGO"  "ISR"  "CYM" 
 [51] "ZMB"  "CPV"  "ZWE"  "DZA"  "KOR"  "CRI"  "HUN"  "ARE"  "TUN"  "JAM" 
 [61] "HRV"  "HKG"  "IRN"  "GEO"  "AND"  "GIB"  "URY"  "JEY"  "CAF"  "CYP" 
 [71] "COL"  "GGY"  "KWT"  "NGA"  "MDV"  "VEN"  "SVK"  "FJI"  "KAZ"  "PAK" 
 [81] "IDN"  "LBN"  "PHL"  "SEN"  "SYC"  "AZE"  "BHR"  "NZL"  "THA"  "DOM" 
 [91] "MKD"  "MYS"  "ARM"  "JPN"  "LKA"  "CUB"  "CMR"  "BIH"  "MUS"  "COM" 
[101] "SUR"  "UGA"  "BGR"  "CIV"  "JOR"  "SYR"  "SGP"  "BDI"  "SAU"  "VNM" 
[111] "PLW"  "QAT"  "EGY"  "PER"  "MLT"  "MWI"  "ECU"  "MDG"  "ISL"  "UZB" 
[121] "NPL"  "BHS"  "MAC"  "TGO"  "TWN"  "DJI"  "STP"  "KNA"  "ETH"  "IRQ" 
[131] "HND"  "RWA"  "KHM"  "MCO"  "BGD"  "IMN"  "TJK"  "NIC"  "BEN"  "VGB" 
[141] "TZA"  "GAB"  "GHA"  "TMP"  "GLP"  "KEN"  "LIE"  "GNB"  "MNE"  "UMI" 
[151] "MYT"  "FRO"  "MMR"  "PAN"  "BFA"  "LBY"  "MLI"  "NAM"  "BOL"  "PRY" 
[161] "BRB"  "ABW"  "AIA"  "SLV"  "DMA"  "PYF"  "GUY"  "LCA"  "ATA"  "GTM" 
[171] "ASM"  "MRT"  "NCL"  "KIR"  "SDN"  "ATF"  "SLE"  "LAO" 
  max(hotel_bookings$stays_in_week_nights)
[1] 50
  max(hotel_bookings$stays_in_weekend_nights)
[1] 19
  max(hotel_bookings$adults)
[1] 55
  max(hotel_bookings$children, na.rm = TRUE)
[1] 10
  max(hotel_bookings$babies)
[1] 10

Briefly describe the data

This contains the hotel bookings data from 2015 to 2017 from across 178 countries for 2 hotels - “Resort Hotel”, “City Hotel”. There are a total of 32 columns.

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

  hotel_bookings<- na.omit(hotel_bookings)

Are there any variables that require mutation to be usable in your analysis stream? For example, do you need to calculate new values in order to graph them? Can string values be represented numerically? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

  hotel_bookings$date <- ymd(paste(hotel_bookings$arrival_date_year, hotel_bookings$arrival_date_month, hotel_bookings$arrival_date_day_of_month,  sep = "-"))

Visualization with Multiple Dimensions

Lets try to visualise the per hotel bookings across the months based on the assigned room type give to them.

hotel_bookings <- hotel_bookings %>%
  mutate(month=floor_date(date,unit="month"),
         across(c(hotel, assigned_room_type),as.factor)) %>%
  count(month, hotel, assigned_room_type,.drop=F) # ensures 0 is counted
ggplot(hotel_bookings, aes(month, n, col=assigned_room_type))+
  geom_line()+
  facet_wrap(vars(hotel))+
  scale_x_date()+
  theme(axis.text.x=element_text(angle=90))

We can notice here that there were a lot of ‘A’ rooms assigned to the customers whereas ‘P’ and ‘L’ are less.