Challenge 6

challenge_6
hotel_bookings
air_bnb
fed_rate
debt
usa_hh
abc_poll
Visualizing Time and Relationships
Author

Yoshita Varma Annam

Published

January 9, 2023

library(tidyverse)
library(ggplot2)
library(treemap)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. mutate variables as needed (including sanity checks)
  4. create at least one graph including time (evolution)
     • try to make them “publication” ready (optional)
     • Explain why you chose the specific graph type
  5. Create at least one graph depicting part-whole or flow relationships
     • try to make them “publication” ready (optional)
     • Explain why you chose the specific graph type

R Graph Gallery is a good starting point for thinking about what information is conveyed in standard graph types, and includes example R code.

(be sure to only include the category tags for the data you use!)

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • debt ⭐
  • fed_rate ⭐⭐
  • abc_poll ⭐⭐⭐
  • usa_hh ⭐⭐⭐
  • hotel_bookings ⭐⭐⭐⭐
  • air_bnb ⭐⭐⭐⭐⭐
HotelBookings_csv <- read_csv("_data/hotel_bookings.csv")

I am already familiar with the hotel bookings data from Challenges 2 and 4, so I chose this data set for Challenge 6. Those challenges showed that it contains a lot of time-dependent data that can be analyzed further.

Briefly describe the data

Some of my analysis builds on Challenges 2 and 4; the relevant parts are repeated below.

HotelBookings_csv
# A tibble: 119,390 × 32
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Resor…       0     342    2015 July         27       1       0       0      2
 2 Resor…       0     737    2015 July         27       1       0       0      2
 3 Resor…       0       7    2015 July         27       1       0       1      1
 4 Resor…       0      13    2015 July         27       1       0       1      1
 5 Resor…       0      14    2015 July         27       1       0       2      2
 6 Resor…       0      14    2015 July         27       1       0       2      2
 7 Resor…       0       0    2015 July         27       1       0       2      2
 8 Resor…       0       9    2015 July         27       1       0       2      2
 9 Resor…       1      85    2015 July         27       1       0       3      2
10 Resor…       1      75    2015 July         27       1       0       3      2
# … with 119,380 more rows, 22 more variables: children <dbl>, babies <dbl>,
#   meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …

A first look shows 119,390 booking records described by 32 variables. The variables mainly describe each booking: arrival date, cancellation status, and length of stay. They also record the number of adults, children, and babies, as well as the guest's country. A separate field (is_repeated_guest) flags repeat guests. Understanding the data further requires more operations.

summary(HotelBookings_csv)
    hotel            is_canceled       lead_time   arrival_date_year
 Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
 Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
 Mode  :character   Median :0.0000   Median : 69   Median :2016     
                    Mean   :0.3704   Mean   :104   Mean   :2016     
                    3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                    Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                    
 arrival_date_month arrival_date_week_number arrival_date_day_of_month
 Length:119390      Min.   : 1.00            Min.   : 1.0             
 Class :character   1st Qu.:16.00            1st Qu.: 8.0             
 Mode  :character   Median :28.00            Median :16.0             
                    Mean   :27.17            Mean   :15.8             
                    3rd Qu.:38.00            3rd Qu.:23.0             
                    Max.   :53.00            Max.   :31.0             
                                                                      
 stays_in_weekend_nights stays_in_week_nights     adults      
 Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
 1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
 Median : 1.0000         Median : 2.0         Median : 2.000  
 Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
 3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
 Max.   :19.0000         Max.   :50.0         Max.   :55.000  
                                                              
    children           babies              meal             country         
 Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
 1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
 Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
 Mean   : 0.1039   Mean   : 0.007949                                        
 3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
 Max.   :10.0000   Max.   :10.000000                                        
 NA's   :4                                                                  
 market_segment     distribution_channel is_repeated_guest
 Length:119390      Length:119390        Min.   :0.00000  
 Class :character   Class :character     1st Qu.:0.00000  
 Mode  :character   Mode  :character     Median :0.00000  
                                         Mean   :0.03191  
                                         3rd Qu.:0.00000  
                                         Max.   :1.00000  
                                                          
 previous_cancellations previous_bookings_not_canceled reserved_room_type
 Min.   : 0.00000       Min.   : 0.0000                Length:119390     
 1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
 Median : 0.00000       Median : 0.0000                Mode  :character  
 Mean   : 0.08712       Mean   : 0.1371                                  
 3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
 Max.   :26.00000       Max.   :72.0000                                  
                                                                         
 assigned_room_type booking_changes   deposit_type          agent          
 Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
 Class :character   1st Qu.: 0.0000   Class :character   Class :character  
 Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
                    Mean   : 0.2211                                        
                    3rd Qu.: 0.0000                                        
                    Max.   :21.0000                                        
                                                                           
   company          days_in_waiting_list customer_type           adr         
 Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
 Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
 Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
                    Mean   :  2.321                         Mean   : 101.83  
                    3rd Qu.:  0.000                         3rd Qu.: 126.00  
                    Max.   :391.000                         Max.   :5400.00  
                                                                             
 required_car_parking_spaces total_of_special_requests reservation_status
 Min.   :0.00000             Min.   :0.0000            Length:119390     
 1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
 Median :0.00000             Median :0.0000            Mode  :character  
 Mean   :0.06252             Mean   :0.5714                              
 3rd Qu.:0.00000             3rd Qu.:1.0000                              
 Max.   :8.00000             Max.   :5.0000                              
                                                                         
 reservation_status_date
 Min.   :2014-10-17     
 1st Qu.:2016-02-01     
 Median :2016-08-07     
 Mean   :2016-07-30     
 3rd Qu.:2017-02-08     
 Max.   :2017-09-14     
                        
colnames(HotelBookings_csv)
 [1] "hotel"                          "is_canceled"                   
 [3] "lead_time"                      "arrival_date_year"             
 [5] "arrival_date_month"             "arrival_date_week_number"      
 [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
 [9] "stays_in_week_nights"           "adults"                        
[11] "children"                       "babies"                        
[13] "meal"                           "country"                       
[15] "market_segment"                 "distribution_channel"          
[17] "is_repeated_guest"              "previous_cancellations"        
[19] "previous_bookings_not_canceled" "reserved_room_type"            
[21] "assigned_room_type"             "booking_changes"               
[23] "deposit_type"                   "agent"                         
[25] "company"                        "days_in_waiting_list"          
[27] "customer_type"                  "adr"                           
[29] "required_car_parking_spaces"    "total_of_special_requests"     
[31] "reservation_status"             "reservation_status_date"       
unique(HotelBookings_csv$deposit_type)
[1] "No Deposit" "Refundable" "Non Refund"
unique(HotelBookings_csv$market_segment)
[1] "Direct"        "Corporate"     "Online TA"     "Offline TA/TO"
[5] "Complementary" "Groups"        "Undefined"     "Aviation"     
length(unique(HotelBookings_csv$market_segment))
[1] 8
unique(HotelBookings_csv$distribution_channel)
[1] "Direct"    "Corporate" "TA/TO"     "Undefined" "GDS"      
length(unique(HotelBookings_csv$distribution_channel))
[1] 5
unique(HotelBookings_csv$hotel)
[1] "Resort Hotel" "City Hotel"  
length(unique(HotelBookings_csv$hotel))
[1] 2
unique(HotelBookings_csv$country)
  [1] "PRT"  "GBR"  "USA"  "ESP"  "IRL"  "FRA"  "NULL" "ROU"  "NOR"  "OMN" 
 [11] "ARG"  "POL"  "DEU"  "BEL"  "CHE"  "CN"   "GRC"  "ITA"  "NLD"  "DNK" 
 [21] "RUS"  "SWE"  "AUS"  "EST"  "CZE"  "BRA"  "FIN"  "MOZ"  "BWA"  "LUX" 
 [31] "SVN"  "ALB"  "IND"  "CHN"  "MEX"  "MAR"  "UKR"  "SMR"  "LVA"  "PRI" 
 [41] "SRB"  "CHL"  "AUT"  "BLR"  "LTU"  "TUR"  "ZAF"  "AGO"  "ISR"  "CYM" 
 [51] "ZMB"  "CPV"  "ZWE"  "DZA"  "KOR"  "CRI"  "HUN"  "ARE"  "TUN"  "JAM" 
 [61] "HRV"  "HKG"  "IRN"  "GEO"  "AND"  "GIB"  "URY"  "JEY"  "CAF"  "CYP" 
 [71] "COL"  "GGY"  "KWT"  "NGA"  "MDV"  "VEN"  "SVK"  "FJI"  "KAZ"  "PAK" 
 [81] "IDN"  "LBN"  "PHL"  "SEN"  "SYC"  "AZE"  "BHR"  "NZL"  "THA"  "DOM" 
 [91] "MKD"  "MYS"  "ARM"  "JPN"  "LKA"  "CUB"  "CMR"  "BIH"  "MUS"  "COM" 
[101] "SUR"  "UGA"  "BGR"  "CIV"  "JOR"  "SYR"  "SGP"  "BDI"  "SAU"  "VNM" 
[111] "PLW"  "QAT"  "EGY"  "PER"  "MLT"  "MWI"  "ECU"  "MDG"  "ISL"  "UZB" 
[121] "NPL"  "BHS"  "MAC"  "TGO"  "TWN"  "DJI"  "STP"  "KNA"  "ETH"  "IRQ" 
[131] "HND"  "RWA"  "KHM"  "MCO"  "BGD"  "IMN"  "TJK"  "NIC"  "BEN"  "VGB" 
[141] "TZA"  "GAB"  "GHA"  "TMP"  "GLP"  "KEN"  "LIE"  "GNB"  "MNE"  "UMI" 
[151] "MYT"  "FRO"  "MMR"  "PAN"  "BFA"  "LBY"  "MLI"  "NAM"  "BOL"  "PRY" 
[161] "BRB"  "ABW"  "AIA"  "SLV"  "DMA"  "PYF"  "GUY"  "LCA"  "ATA"  "GTM" 
[171] "ASM"  "MRT"  "NCL"  "KIR"  "SDN"  "ATF"  "SLE"  "LAO" 
length(unique(HotelBookings_csv$country))
[1] 178

The checks above show that the bookings cover 178 distinct country codes (including a "NULL" placeholder) and arrivals from 2015 to 2017. The data covers only two kinds of hotel: “Resort Hotel” and “City Hotel”. There are 8 market segments, ranging from professional to personal bookings, such as Corporate and Aviation. The means in the summary indicate that a booking has on average about 1.86 adults, 0.10 children, and 0.008 babies. Similarly, guests stayed on average about 2.5 week nights and 0.93 weekend nights. These figures come only from the summaries; firmer conclusions need more analysis.

Tidy Data (as needed)

#Null values in country column
table(HotelBookings_csv$country)

  ABW   AGO   AIA   ALB   AND   ARE   ARG   ARM   ASM   ATA   ATF   AUS   AUT 
    2   362     1    12     7    51   214     8     1     2     1   426  1263 
  AZE   BDI   BEL   BEN   BFA   BGD   BGR   BHR   BHS   BIH   BLR   BOL   BRA 
   17     1  2342     3     1    12    75     5     1    13    26    10  2224 
  BRB   BWA   CAF   CHE   CHL   CHN   CIV   CMR    CN   COL   COM   CPV   CRI 
    4     1     5  1730    65   999     6    10  1279    71     2    24    19 
  CUB   CYM   CYP   CZE   DEU   DJI   DMA   DNK   DOM   DZA   ECU   EGY   ESP 
    8     1    51   171  7287     1     1   435    14   103    27    32  8568 
  EST   ETH   FIN   FJI   FRA   FRO   GAB   GBR   GEO   GGY   GHA   GIB   GLP 
   83     3   447     1 10415     5     4 12129    22     3     4    18     2 
  GNB   GRC   GTM   GUY   HKG   HND   HRV   HUN   IDN   IMN   IND   IRL   IRN 
    9   128     4     1    29     1   100   230    35     2   152  3375    83 
  IRQ   ISL   ISR   ITA   JAM   JEY   JOR   JPN   KAZ   KEN   KHM   KIR   KNA 
   14    57   669  3766     6     8    21   197    19     6     2     1     2 
  KOR   KWT   LAO   LBN   LBY   LCA   LIE   LKA   LTU   LUX   LVA   MAC   MAR 
  133    16     2    31     8     1     3     7    81   287    55    16   259 
  MCO   MDG   MDV   MEX   MKD   MLI   MLT   MMR   MNE   MOZ   MRT   MUS   MWI 
    4     1    12    85    10     1    18     1     5    67     1     7     2 
  MYS   MYT   NAM   NCL   NGA   NIC   NLD   NOR   NPL  NULL   NZL   OMN   PAK 
   28     2     1     1    34     1  2104   607     1   488    74    18    14 
  PAN   PER   PHL   PLW   POL   PRI   PRT   PRY   PYF   QAT   ROU   RUS   RWA 
    9    29    40     1   919    12 48590     4     1    15   500   632     2 
  SAU   SDN   SEN   SGP   SLE   SLV   SMR   SRB   STP   SUR   SVK   SVN   SWE 
   48     1    11    39     1     2     1   101     2     5    65    57  1024 
  SYC   SYR   TGO   THA   TJK   TMP   TUN   TUR   TWN   TZA   UGA   UKR   UMI 
    2     3     2    59     9     3    39   248    51     5     2    68     1 
  URY   USA   UZB   VEN   VGB   VNM   ZAF   ZMB   ZWE 
   32  2097     4    26     1     8    80     2     4 

The table shows 488 rows where the country column holds the string "NULL". I remove these rows, since bookings without a country are not relevant to this analysis.

# Removing Null values
HotelBookings_csv <- HotelBookings_csv %>% 
  filter(!(country == "NULL"))
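A quick sanity check can confirm that the filter drops only the placeholder rows. The sketch below uses a small made-up table (bookings_toy and its values are hypothetical stand-ins, not the real data) and compares row counts before and after filtering.

```r
library(dplyr)

# Toy stand-in for the bookings table (hypothetical values)
bookings_toy <- tibble(
  hotel   = c("Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"),
  country = c("PRT", "NULL", "GBR", "NULL")
)

# Count the placeholder rows before filtering
n_null <- sum(bookings_toy$country == "NULL")

bookings_clean <- bookings_toy %>%
  filter(country != "NULL")

# Sanity checks: exactly the placeholder rows were dropped, and none remain
stopifnot(nrow(bookings_toy) - nrow(bookings_clean) == n_null)
stopifnot(!any(bookings_clean$country == "NULL"))
```

On the real data the same comparison would be 119,390 rows before and 118,902 after, a difference of 488, which matches the NULL count in the table above.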

The arrival year, month, and day of month are stored in three separate columns, so they can be combined into a single field named arrival_date. I also consider arrival_date_week_number unnecessary for my analysis, because the week number can be derived from the new arrival date field, so I drop it first.

# Remove the week-number column by name (it is column 6)
HotelBookings_csv <- HotelBookings_csv %>% 
  select(-arrival_date_week_number)
HotelBookings_csv
# A tibble: 118,902 × 31
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ stays…⁶ stays…⁷ adults child…⁸
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
 1 Resor…       0     342    2015 July          1       0       0      2       0
 2 Resor…       0     737    2015 July          1       0       0      2       0
 3 Resor…       0       7    2015 July          1       0       1      1       0
 4 Resor…       0      13    2015 July          1       0       1      1       0
 5 Resor…       0      14    2015 July          1       0       2      2       0
 6 Resor…       0      14    2015 July          1       0       2      2       0
 7 Resor…       0       0    2015 July          1       0       2      2       0
 8 Resor…       0       9    2015 July          1       0       2      2       0
 9 Resor…       1      85    2015 July          1       0       3      2       0
10 Resor…       1      75    2015 July          1       0       3      2       0
# … with 118,892 more rows, 21 more variables: babies <dbl>, meal <chr>,
#   country <chr>, market_segment <chr>, distribution_channel <chr>,
#   is_repeated_guest <dbl>, previous_cancellations <dbl>,
#   previous_bookings_not_canceled <dbl>, reserved_room_type <chr>,
#   assigned_room_type <chr>, booking_changes <dbl>, deposit_type <chr>,
#   agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …

Are there any variables that require mutation to be usable in your analysis stream? For example, do you need to calculate new values in order to graph them? Can string values be represented numerically? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

With the data tidied as needed, I now mutate it to create a single arrival date field, as described above.

#Mutating the arrival date into a single field 

HotelBookings_csv_mutate <- HotelBookings_csv %>% 
  mutate(arrival_date = str_c(arrival_date_year, 
                              arrival_date_month, arrival_date_day_of_month, sep="/"),
         arrival_date = lubridate::ymd(arrival_date)) %>% 
  select(-c(arrival_date_year,arrival_date_month, arrival_date_day_of_month))

HotelBookings_csv_mutate
# A tibble: 118,902 × 29
   hotel     is_ca…¹ lead_…² stays…³ stays…⁴ adults child…⁵ babies meal  country
   <chr>       <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl> <chr> <chr>  
 1 Resort H…       0     342       0       0      2       0      0 BB    PRT    
 2 Resort H…       0     737       0       0      2       0      0 BB    PRT    
 3 Resort H…       0       7       0       1      1       0      0 BB    GBR    
 4 Resort H…       0      13       0       1      1       0      0 BB    GBR    
 5 Resort H…       0      14       0       2      2       0      0 BB    GBR    
 6 Resort H…       0      14       0       2      2       0      0 BB    GBR    
 7 Resort H…       0       0       0       2      2       0      0 BB    PRT    
 8 Resort H…       0       9       0       2      2       0      0 FB    PRT    
 9 Resort H…       1      85       0       3      2       0      0 BB    PRT    
10 Resort H…       1      75       0       3      2       0      0 HB    PRT    
# … with 118,892 more rows, 19 more variables: market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>,
#   total_of_special_requests <dbl>, reservation_status <chr>, …
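As an alternative to pasting the three parts into a string, the date can be built from numeric components. The sketch below is one possible approach, not the method used above: it maps month names to numbers with base R's month.name and then calls lubridate::make_date(). The table dates_toy and the helper column month_num are hypothetical.

```r
library(dplyr)
library(lubridate)

# Toy stand-in with the three arrival columns (hypothetical values)
dates_toy <- tibble(
  arrival_date_year         = c(2015, 2016, 2017),
  arrival_date_month        = c("July", "February", "August"),
  arrival_date_day_of_month = c(1, 29, 31)
)

dates_toy <- dates_toy %>%
  mutate(
    # month.name is base R's vector of English month names
    month_num    = match(arrival_date_month, month.name),
    arrival_date = make_date(arrival_date_year, month_num, arrival_date_day_of_month)
  )

# Sanity check: every row produced a valid date
stopifnot(!anyNA(dates_toy$arrival_date))
```

The anyNA() check is worth keeping with either approach, because an unrecognized month name would otherwise turn into a missing date without any error.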

Similarly, the agent and company columns hold numeric IDs but are stored as character because of the "NULL" strings in them. I first convert the "NULL" values to NA and then change both columns to numeric.

# Convert "NULL" strings to NA, then change agent and company from character to numeric

HotelBookings_csv_mutate <- HotelBookings_csv_mutate %>%
  mutate(across(c(agent, company), ~ as.numeric(na_if(.x, "NULL"))))
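Another option is to handle the "NULL" strings when the file is read, using the na argument of read_csv(). The sketch below is only an illustration: it parses a small literal CSV (csv_text and its values are hypothetical) instead of the real file, and passing literal text through I() assumes readr 2.0 or later.

```r
library(readr)

# Literal CSV text standing in for the real file (hypothetical values)
csv_text <- I("hotel,agent,company\nResort Hotel,NULL,NULL\nCity Hotel,9,NULL\nCity Hotel,240,62\n")

# Treat "NULL" as missing at read time, alongside the default "" and "NA"
agents_toy <- read_csv(csv_text, na = c("", "NA", "NULL"), show_col_types = FALSE)

# With "NULL" read as NA, both ID columns are guessed as numeric
stopifnot(is.numeric(agents_toy$agent), is.numeric(agents_toy$company))
```

Note that this would also turn "NULL" in the country column into NA, so the earlier country filter would then need to test is.na(country) instead of comparing with the string.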

Now reviewing the summary of the mutated data.

summary(HotelBookings_csv_mutate)
    hotel            is_canceled       lead_time     stays_in_weekend_nights
 Length:118902      Min.   :0.0000   Min.   :  0.0   Min.   : 0.0000        
 Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.: 0.0000        
 Mode  :character   Median :0.0000   Median : 69.0   Median : 1.0000        
                    Mean   :0.3714   Mean   :104.3   Mean   : 0.9289        
                    3rd Qu.:1.0000   3rd Qu.:161.0   3rd Qu.: 2.0000        
                    Max.   :1.0000   Max.   :737.0   Max.   :16.0000        
                                                                            
 stays_in_week_nights     adults          children           babies         
 Min.   : 0.000       Min.   : 0.000   Min.   : 0.0000   Min.   : 0.000000  
 1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.: 0.0000   1st Qu.: 0.000000  
 Median : 2.000       Median : 2.000   Median : 0.0000   Median : 0.000000  
 Mean   : 2.502       Mean   : 1.858   Mean   : 0.1042   Mean   : 0.007948  
 3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.: 0.0000   3rd Qu.: 0.000000  
 Max.   :41.000       Max.   :55.000   Max.   :10.0000   Max.   :10.000000  
                                       NA's   :4                            
     meal             country          market_segment     distribution_channel
 Length:118902      Length:118902      Length:118902      Length:118902       
 Class :character   Class :character   Class :character   Class :character    
 Mode  :character   Mode  :character   Mode  :character   Mode  :character    
                                                                              
                                                                              
                                                                              
                                                                              
 is_repeated_guest previous_cancellations previous_bookings_not_canceled
 Min.   :0.00000   Min.   : 0.00000       Min.   : 0.0000               
 1st Qu.:0.00000   1st Qu.: 0.00000       1st Qu.: 0.0000               
 Median :0.00000   Median : 0.00000       Median : 0.0000               
 Mean   :0.03201   Mean   : 0.08714       Mean   : 0.1316               
 3rd Qu.:0.00000   3rd Qu.: 0.00000       3rd Qu.: 0.0000               
 Max.   :1.00000   Max.   :26.00000       Max.   :72.0000               
                                                                        
 reserved_room_type assigned_room_type booking_changes   deposit_type      
 Length:118902      Length:118902      Min.   : 0.0000   Length:118902     
 Class :character   Class :character   1st Qu.: 0.0000   Class :character  
 Mode  :character   Mode  :character   Median : 0.0000   Mode  :character  
                                       Mean   : 0.2212                     
                                       3rd Qu.: 0.0000                     
                                       Max.   :21.0000                     
                                                                           
     agent           company       days_in_waiting_list customer_type     
 Min.   :  1.00   Min.   :  6.0    Min.   :  0.000      Length:118902     
 1st Qu.:  9.00   1st Qu.: 62.0    1st Qu.:  0.000      Class :character  
 Median : 14.00   Median :179.0    Median :  0.000      Mode  :character  
 Mean   : 86.54   Mean   :189.6    Mean   :  2.331                        
 3rd Qu.:229.00   3rd Qu.:270.0    3rd Qu.:  0.000                        
 Max.   :535.00   Max.   :543.0    Max.   :391.000                        
 NA's   :16006    NA's   :112279                                          
      adr          required_car_parking_spaces total_of_special_requests
 Min.   :  -6.38   Min.   :0.00000             Min.   :0.0000           
 1st Qu.:  70.00   1st Qu.:0.00000             1st Qu.:0.0000           
 Median :  95.00   Median :0.00000             Median :0.0000           
 Mean   : 102.00   Mean   :0.06188             Mean   :0.5717           
 3rd Qu.: 126.00   3rd Qu.:0.00000             3rd Qu.:1.0000           
 Max.   :5400.00   Max.   :8.00000             Max.   :5.0000           
                                                                        
 reservation_status reservation_status_date  arrival_date       
 Length:118902      Min.   :2014-10-17      Min.   :2015-07-01  
 Class :character   1st Qu.:2016-02-02      1st Qu.:2016-03-14  
 Mode  :character   Median :2016-08-08      Median :2016-09-07  
                    Mean   :2016-07-30      Mean   :2016-08-29  
                    3rd Qu.:2017-02-09      3rd Qu.:2017-03-19  
                    Max.   :2017-09-14      Max.   :2017-08-31  
                                                                

As you can see, there is now a single arrival_date field, and the agent and company columns are now numeric.

Time Dependent Visualization

ggplot(HotelBookings_csv_mutate, aes(x = arrival_date, y = stays_in_week_nights, color = hotel)) + 
  geom_line() + 
  xlab("Arrival date") + 
  ylab("Week nights stayed") + 
  ggtitle("Week-night stays by arrival date")

ggplot(HotelBookings_csv_mutate, aes(x = arrival_date, y = stays_in_weekend_nights, color = hotel)) + 
  geom_line() + 
  xlab("Arrival date") + 
  ylab("Weekend nights stayed") + 
  ggtitle("Weekend-night stays by arrival date")

I chose to visualize the number of nights stayed as a time series, split by hotel type. This can help hotel owners estimate their revenue. Reading the plots, stays of up to about 10 week nights appear regularly for both resort and city hotels, although the summary above puts the mean at about 2.5 week nights. There are also a number of bookings of around 30 nights, which suggests that some guests book a hotel instead of renting a place. One common pattern is a spike at the start of the year.
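Because many bookings share the same arrival date, a line drawn through the raw rows zigzags between individual bookings. One way to get a smoother view is to aggregate before plotting. The sketch below is a possible variant, not the plot used above: it averages week-night stays per hotel and month on a made-up table (stays_toy, monthly_stays, and all values are hypothetical).

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Toy stand-in for the mutated bookings table (hypothetical values)
stays_toy <- tibble(
  hotel = c("City Hotel", "City Hotel", "City Hotel",
            "Resort Hotel", "Resort Hotel", "Resort Hotel"),
  arrival_date = as.Date(c("2016-01-03", "2016-01-20", "2016-02-11",
                           "2016-01-05", "2016-02-02", "2016-02-25")),
  stays_in_week_nights = c(2, 4, 1, 5, 3, 7)
)

# One row per hotel and arrival month
monthly_stays <- stays_toy %>%
  mutate(arrival_month = floor_date(arrival_date, unit = "month")) %>%
  group_by(hotel, arrival_month) %>%
  summarize(
    mean_week_nights = mean(stays_in_week_nights),
    n_bookings       = n(),
    .groups = "drop"
  )

# Line plot of the monthly means, one line per hotel type
p <- ggplot(monthly_stays, aes(x = arrival_month, y = mean_week_nights, color = hotel)) +
  geom_line() +
  geom_point() +
  labs(x = "Arrival month", y = "Mean week nights stayed", color = "Hotel")
```

Printing p draws the chart. Keeping n_bookings in the summary makes it easy to see which monthly means rest on only a few bookings.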

ggplot(HotelBookings_csv_mutate, aes(x = arrival_date, y = adults, color = hotel)) + 
  geom_line() + 
  xlab("Arrival date") + 
  ylab("Number of adults") + 
  ggtitle("Number of adults by arrival date")

This plot uses the adults column. The code shown here does not add children and babies to it, so the counts reflect adults only. Based on this plot, more people appear to come to resort hotels than to city hotels. This could be because families go to resort hotels for vacation, while city hotels are mostly used for more formal purposes.
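A total guest count per booking can be derived by summing the three guest columns. The sketch below shows one way to do this on a made-up table (guests_toy and the column total_guests are hypothetical). Since the summary shows 4 missing values in children, the sketch treats a missing count as zero, which is an assumption rather than something the data states.

```r
library(dplyr)

# Toy stand-in for the guest columns (hypothetical values)
guests_toy <- tibble(
  hotel    = c("Resort Hotel", "Resort Hotel", "City Hotel"),
  adults   = c(2, 2, 1),
  children = c(1, NA, 0),
  babies   = c(0, 1, 0)
)

guests_toy <- guests_toy %>%
  # coalesce() replaces a missing children count with 0 (assumption)
  mutate(total_guests = adults + coalesce(children, 0) + babies)

# Sanity check: the total is never missing and never below the adult count
stopifnot(!anyNA(guests_toy$total_guests))
stopifnot(all(guests_toy$total_guests >= guests_toy$adults))
```

Plotting total_guests instead of adults would then show all guests per booking rather than adults only.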

Visualizing Part-Whole Relationships

The tree maps below show, for each hotel type, the breakdown of bookings by customer_type, market_segment, and distribution_channel. The first shows which kinds of customer come most often to each hotel; the other two give the same breakdown for market segment and distribution channel.

hotel_new1 <- HotelBookings_csv_mutate %>% 
  group_by(hotel, customer_type) %>% 
  summarize(n = n())

treemap(hotel_new1,
       index = c("hotel", "customer_type"),
       vSize = "n",
       type = "index")

hotel_new2 <- HotelBookings_csv_mutate %>% 
  group_by(hotel, market_segment) %>% 
  summarize(n = n())

treemap(hotel_new2,
       index = c("hotel", "market_segment"),
       vSize = "n",
       type = "index")

hotel_new3 <- HotelBookings_csv_mutate %>% 
  group_by(hotel, distribution_channel) %>% 
  summarize(n = n())

treemap(hotel_new3,
       index = c("hotel", "distribution_channel"),
       vSize = "n",
       type = "index")
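The tree maps size each tile by the booking count. To compare the two hotel types directly, the counts can also be turned into shares within each hotel. The sketch below is an optional complement under assumed data: segments_toy and its counts are made up, and the stacked bar chart is only one possible way to display the shares.

```r
library(dplyr)
library(ggplot2)

# Toy counts per hotel and market segment (hypothetical values)
segments_toy <- tibble(
  hotel          = c("City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"),
  market_segment = c("Online TA", "Direct", "Online TA", "Direct"),
  n              = c(60, 20, 30, 30)
)

# Share of each segment within its hotel
segment_shares <- segments_toy %>%
  group_by(hotel) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# Stacked bars: each bar is one hotel, split by segment share
p_shares <- ggplot(segment_shares, aes(x = hotel, y = share, fill = market_segment)) +
  geom_col() +
  labs(x = "Hotel", y = "Share of bookings", fill = "Market segment")
```

Because the shares sum to 1 within each hotel, both bars have the same height and only the composition differs, which complements the absolute sizes shown by the tree maps.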