challenge_2
Author

Akhilesh Kumar Meghwal

Published

August 21, 2022

Code
library(tidyverse)
library(Hmisc)
library(psych)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
  2. provide summary statistics for different interesting groups within the data, and interpret those statistics

Read in the Data

Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.

  • railroad*.csv or StateCounty2012.xlsx ⭐
  • hotel_bookings ⭐⭐⭐
  • FAOstat*.csv ⭐⭐⭐⭐⭐ (join FAOSTAT_country_groups)
Code
#### Read dataset 'hotel_bookings.csv', available in the `posts/_data` folder, using the read_.csv command.

hotel_bookings <- read.csv('_data/hotel_bookings.csv', stringsAsFactors = TRUE, header = TRUE)

Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

Check first 6 rows in the dataset, to get primary understanding of the dataframe structure
Code
head(hotel_bookings)
         hotel is_canceled lead_time arrival_date_year arrival_date_month
1 Resort Hotel           0       342              2015               July
2 Resort Hotel           0       737              2015               July
3 Resort Hotel           0         7              2015               July
4 Resort Hotel           0        13              2015               July
5 Resort Hotel           0        14              2015               July
6 Resort Hotel           0        14              2015               July
  arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
1                       27                         1                       0
2                       27                         1                       0
3                       27                         1                       0
4                       27                         1                       0
5                       27                         1                       0
6                       27                         1                       0
  stays_in_week_nights adults children babies meal country market_segment
1                    0      2        0      0   BB     PRT         Direct
2                    0      2        0      0   BB     PRT         Direct
3                    1      1        0      0   BB     GBR         Direct
4                    1      1        0      0   BB     GBR      Corporate
5                    2      2        0      0   BB     GBR      Online TA
6                    2      2        0      0   BB     GBR      Online TA
  distribution_channel is_repeated_guest previous_cancellations
1               Direct                 0                      0
2               Direct                 0                      0
3               Direct                 0                      0
4            Corporate                 0                      0
5                TA/TO                 0                      0
6                TA/TO                 0                      0
  previous_bookings_not_canceled reserved_room_type assigned_room_type
1                              0                  C                  C
2                              0                  C                  C
3                              0                  A                  C
4                              0                  A                  A
5                              0                  A                  A
6                              0                  A                  A
  booking_changes deposit_type agent company days_in_waiting_list customer_type
1               3   No Deposit  NULL    NULL                    0     Transient
2               4   No Deposit  NULL    NULL                    0     Transient
3               0   No Deposit  NULL    NULL                    0     Transient
4               0   No Deposit   304    NULL                    0     Transient
5               0   No Deposit   240    NULL                    0     Transient
6               0   No Deposit   240    NULL                    0     Transient
  adr required_car_parking_spaces total_of_special_requests reservation_status
1   0                           0                         0          Check-Out
2   0                           0                         0          Check-Out
3  75                           0                         0          Check-Out
4  75                           0                         0          Check-Out
5  98                           0                         1          Check-Out
6  98                           0                         1          Check-Out
  reservation_status_date
1              2015-07-01
2              2015-07-01
3              2015-07-02
4              2015-07-02
5              2015-07-03
6              2015-07-03
View Dataframe in table view
Code
View(hotel_bookings)
Row and column dimentions of dataframe
Code
dim(hotel_bookings)
[1] 119390     32
Column names of the dataframe
Code
colnames(hotel_bookings)
 [1] "hotel"                          "is_canceled"                   
 [3] "lead_time"                      "arrival_date_year"             
 [5] "arrival_date_month"             "arrival_date_week_number"      
 [7] "arrival_date_day_of_month"      "stays_in_weekend_nights"       
 [9] "stays_in_week_nights"           "adults"                        
[11] "children"                       "babies"                        
[13] "meal"                           "country"                       
[15] "market_segment"                 "distribution_channel"          
[17] "is_repeated_guest"              "previous_cancellations"        
[19] "previous_bookings_not_canceled" "reserved_room_type"            
[21] "assigned_room_type"             "booking_changes"               
[23] "deposit_type"                   "agent"                         
[25] "company"                        "days_in_waiting_list"          
[27] "customer_type"                  "adr"                           
[29] "required_car_parking_spaces"    "total_of_special_requests"     
[31] "reservation_status"             "reservation_status_date"       
Column classes of the dataframe
Code
col_classes = data.frame(t(data.frame(lapply(hotel_bookings,class))))
Describe, Dataframe
Code
describe(hotel_bookings)
                               vars      n    mean     sd  median trimmed
hotel*                            1 119390    1.34   0.47    1.00    1.29
is_canceled                       2 119390    0.37   0.48    0.00    0.34
lead_time                         3 119390  104.01 106.86   69.00   87.24
arrival_date_year                 4 119390 2016.16   0.71 2016.00 2016.20
arrival_date_month*               5 119390    6.49   3.54    7.00    6.49
arrival_date_week_number          6 119390   27.17  13.61   28.00   27.16
arrival_date_day_of_month         7 119390   15.80   8.78   16.00   15.80
stays_in_weekend_nights           8 119390    0.93   1.00    1.00    0.84
stays_in_week_nights              9 119390    2.50   1.91    2.00    2.29
adults                           10 119390    1.86   0.58    2.00    1.88
children                         11 119386    0.10   0.40    0.00    0.00
babies                           12 119390    0.01   0.10    0.00    0.00
meal*                            13 119390    1.56   1.07    1.00    1.31
country*                         14 119390   94.59  45.09   82.00   96.85
market_segment*                  15 119390    5.93   1.27    6.00    6.11
distribution_channel*            16 119390    3.59   0.91    4.00    3.80
is_repeated_guest                17 119390    0.03   0.18    0.00    0.00
previous_cancellations           18 119390    0.09   0.84    0.00    0.00
previous_bookings_not_canceled   19 119390    0.14   1.50    0.00    0.00
reserved_room_type*              20 119390    1.99   1.70    1.00    1.64
assigned_room_type*              21 119390    2.33   1.88    1.00    2.02
booking_changes                  22 119390    0.22   0.65    0.00    0.06
deposit_type*                    23 119390    1.12   0.33    1.00    1.03
agent*                           24 119390  211.82 122.75  296.00  221.86
company*                         25 119390  342.22  50.06  353.00  353.00
days_in_waiting_list             26 119390    2.32  17.59    0.00    0.00
customer_type*                   27 119390    3.14   0.58    3.00    3.14
adr                              28 119390  101.83  50.54   94.58   97.71
required_car_parking_spaces      29 119390    0.06   0.25    0.00    0.00
total_of_special_requests        30 119390    0.57   0.79    0.00    0.43
reservation_status*              31 119390    1.65   0.50    2.00    1.67
reservation_status_date*         32 119390  517.75 227.48  525.00  521.18
                                  mad     min  max   range  skew kurtosis   se
hotel*                           0.00    1.00    2    1.00  0.70    -1.51 0.00
is_canceled                      0.00    0.00    1    1.00  0.54    -1.71 0.00
lead_time                       88.96    0.00  737  737.00  1.35     1.70 0.31
arrival_date_year                1.48 2015.00 2017    2.00 -0.23    -0.99 0.00
arrival_date_month*              4.45    1.00   12   11.00 -0.04    -1.23 0.01
arrival_date_week_number        16.31    1.00   53   52.00 -0.01    -0.99 0.04
arrival_date_day_of_month       11.86    1.00   31   30.00  0.00    -1.19 0.03
stays_in_weekend_nights          1.48    0.00   19   19.00  1.38     7.17 0.00
stays_in_week_nights             1.48    0.00   50   50.00  2.86    24.28 0.01
adults                           0.00    0.00   55   55.00 18.32  1352.04 0.00
children                         0.00    0.00   10   10.00  4.11    18.67 0.00
babies                           0.00    0.00   10   10.00 24.65  1633.85 0.00
meal*                            0.00    1.00    5    4.00  1.59     0.96 0.00
country*                        81.54    1.00  178  177.00 -0.16    -1.53 0.13
market_segment*                  1.48    1.00    8    7.00 -1.00     0.15 0.00
distribution_channel*            0.00    1.00    5    4.00 -1.87     1.86 0.00
is_repeated_guest                0.00    0.00    1    1.00  5.33    26.37 0.00
previous_cancellations           0.00    0.00   26   26.00 24.46   674.03 0.00
previous_bookings_not_canceled   0.00    0.00   72   72.00 23.54   767.20 0.00
reserved_room_type*              0.00    1.00   10    9.00  1.44     0.89 0.00
assigned_room_type*              0.00    1.00   12   11.00  1.15     0.53 0.01
booking_changes                  0.00    0.00   21   21.00  6.00    79.39 0.00
deposit_type*                    0.00    1.00    3    2.00  2.38     4.07 0.00
agent*                          56.34    1.00  334  333.00 -0.37    -1.53 0.36
company*                         0.00    1.00  353  352.00 -4.93    23.77 0.14
days_in_waiting_list             0.00    0.00  391  391.00 11.94   186.78 0.05
customer_type*                   0.00    1.00    4    3.00 -1.08     4.55 0.00
adr                             41.25   -6.38 5400 5406.38 10.53  1013.13 0.15
required_car_parking_spaces      0.00    0.00    8    8.00  4.16    30.00 0.00
total_of_special_requests        0.00    0.00    5    5.00  1.35     1.49 0.00
reservation_status*              0.00    1.00    3    2.00 -0.38    -1.25 0.00
reservation_status_date*       275.76    1.00  926  925.00 -0.12    -1.02 0.66
Code
summary(hotel_bookings)
          hotel        is_canceled       lead_time   arrival_date_year
 City Hotel  :79330   Min.   :0.0000   Min.   :  0   Min.   :2015     
 Resort Hotel:40060   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
                      Median :0.0000   Median : 69   Median :2016     
                      Mean   :0.3704   Mean   :104   Mean   :2016     
                      3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                      Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                      
 arrival_date_month arrival_date_week_number arrival_date_day_of_month
 August :13877      Min.   : 1.00            Min.   : 1.0             
 July   :12661      1st Qu.:16.00            1st Qu.: 8.0             
 May    :11791      Median :28.00            Median :16.0             
 October:11160      Mean   :27.17            Mean   :15.8             
 April  :11089      3rd Qu.:38.00            3rd Qu.:23.0             
 June   :10939      Max.   :53.00            Max.   :31.0             
 (Other):47873                                                        
 stays_in_weekend_nights stays_in_week_nights     adults      
 Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
 1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
 Median : 1.0000         Median : 2.0         Median : 2.000  
 Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
 3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
 Max.   :19.0000         Max.   :50.0         Max.   :55.000  
                                                              
    children           babies                 meal          country     
 Min.   : 0.0000   Min.   : 0.000000   BB       :92310   PRT    :48590  
 1st Qu.: 0.0000   1st Qu.: 0.000000   FB       :  798   GBR    :12129  
 Median : 0.0000   Median : 0.000000   HB       :14463   FRA    :10415  
 Mean   : 0.1039   Mean   : 0.007949   SC       :10650   ESP    : 8568  
 3rd Qu.: 0.0000   3rd Qu.: 0.000000   Undefined: 1169   DEU    : 7287  
 Max.   :10.0000   Max.   :10.000000                     ITA    : 3766  
 NA's   :4                                               (Other):28635  
       market_segment  distribution_channel is_repeated_guest
 Online TA    :56477   Corporate: 6677      Min.   :0.00000  
 Offline TA/TO:24219   Direct   :14645      1st Qu.:0.00000  
 Groups       :19811   GDS      :  193      Median :0.00000  
 Direct       :12606   TA/TO    :97870      Mean   :0.03191  
 Corporate    : 5295   Undefined:    5      3rd Qu.:0.00000  
 Complementary:  743                        Max.   :1.00000  
 (Other)      :  239                                         
 previous_cancellations previous_bookings_not_canceled reserved_room_type
 Min.   : 0.00000       Min.   : 0.0000                A      :85994     
 1st Qu.: 0.00000       1st Qu.: 0.0000                D      :19201     
 Median : 0.00000       Median : 0.0000                E      : 6535     
 Mean   : 0.08712       Mean   : 0.1371                F      : 2897     
 3rd Qu.: 0.00000       3rd Qu.: 0.0000                G      : 2094     
 Max.   :26.00000       Max.   :72.0000                B      : 1118     
                                                       (Other): 1551     
 assigned_room_type booking_changes       deposit_type        agent      
 A      :74053      Min.   : 0.0000   No Deposit:104641   9      :31961  
 D      :25322      1st Qu.: 0.0000   Non Refund: 14587   NULL   :16340  
 E      : 7806      Median : 0.0000   Refundable:   162   240    :13922  
 F      : 3751      Mean   : 0.2211                       1      : 7191  
 G      : 2553      3rd Qu.: 0.0000                       14     : 3640  
 C      : 2375      Max.   :21.0000                       7      : 3539  
 (Other): 3530                                            (Other):42797  
    company       days_in_waiting_list         customer_type  
 NULL   :112593   Min.   :  0.000      Contract       : 4076  
 40     :   927   1st Qu.:  0.000      Group          :  577  
 223    :   784   Median :  0.000      Transient      :89613  
 67     :   267   Mean   :  2.321      Transient-Party:25124  
 45     :   250   3rd Qu.:  0.000                             
 153    :   215   Max.   :391.000                             
 (Other):  4354                                               
      adr          required_car_parking_spaces total_of_special_requests
 Min.   :  -6.38   Min.   :0.00000             Min.   :0.0000           
 1st Qu.:  69.29   1st Qu.:0.00000             1st Qu.:0.0000           
 Median :  94.58   Median :0.00000             Median :0.0000           
 Mean   : 101.83   Mean   :0.06252             Mean   :0.5714           
 3rd Qu.: 126.00   3rd Qu.:0.00000             3rd Qu.:1.0000           
 Max.   :5400.00   Max.   :8.00000             Max.   :5.0000           
                                                                        
 reservation_status reservation_status_date
 Canceled :43017    2015-10-21:  1461      
 Check-Out:75166    2015-07-06:   805      
 No-Show  : 1207    2016-11-25:   790      
                    2015-01-01:   763      
                    2016-01-18:   625      
                    2015-07-02:   469      
                    (Other)   :114477      
Summary of dataframe
Code
summary(hotel_bookings)
          hotel        is_canceled       lead_time   arrival_date_year
 City Hotel  :79330   Min.   :0.0000   Min.   :  0   Min.   :2015     
 Resort Hotel:40060   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
                      Median :0.0000   Median : 69   Median :2016     
                      Mean   :0.3704   Mean   :104   Mean   :2016     
                      3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                      Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                      
 arrival_date_month arrival_date_week_number arrival_date_day_of_month
 August :13877      Min.   : 1.00            Min.   : 1.0             
 July   :12661      1st Qu.:16.00            1st Qu.: 8.0             
 May    :11791      Median :28.00            Median :16.0             
 October:11160      Mean   :27.17            Mean   :15.8             
 April  :11089      3rd Qu.:38.00            3rd Qu.:23.0             
 June   :10939      Max.   :53.00            Max.   :31.0             
 (Other):47873                                                        
 stays_in_weekend_nights stays_in_week_nights     adults      
 Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
 1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
 Median : 1.0000         Median : 2.0         Median : 2.000  
 Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
 3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
 Max.   :19.0000         Max.   :50.0         Max.   :55.000  
                                                              
    children           babies                 meal          country     
 Min.   : 0.0000   Min.   : 0.000000   BB       :92310   PRT    :48590  
 1st Qu.: 0.0000   1st Qu.: 0.000000   FB       :  798   GBR    :12129  
 Median : 0.0000   Median : 0.000000   HB       :14463   FRA    :10415  
 Mean   : 0.1039   Mean   : 0.007949   SC       :10650   ESP    : 8568  
 3rd Qu.: 0.0000   3rd Qu.: 0.000000   Undefined: 1169   DEU    : 7287  
 Max.   :10.0000   Max.   :10.000000                     ITA    : 3766  
 NA's   :4                                               (Other):28635  
       market_segment  distribution_channel is_repeated_guest
 Online TA    :56477   Corporate: 6677      Min.   :0.00000  
 Offline TA/TO:24219   Direct   :14645      1st Qu.:0.00000  
 Groups       :19811   GDS      :  193      Median :0.00000  
 Direct       :12606   TA/TO    :97870      Mean   :0.03191  
 Corporate    : 5295   Undefined:    5      3rd Qu.:0.00000  
 Complementary:  743                        Max.   :1.00000  
 (Other)      :  239                                         
 previous_cancellations previous_bookings_not_canceled reserved_room_type
 Min.   : 0.00000       Min.   : 0.0000                A      :85994     
 1st Qu.: 0.00000       1st Qu.: 0.0000                D      :19201     
 Median : 0.00000       Median : 0.0000                E      : 6535     
 Mean   : 0.08712       Mean   : 0.1371                F      : 2897     
 3rd Qu.: 0.00000       3rd Qu.: 0.0000                G      : 2094     
 Max.   :26.00000       Max.   :72.0000                B      : 1118     
                                                       (Other): 1551     
 assigned_room_type booking_changes       deposit_type        agent      
 A      :74053      Min.   : 0.0000   No Deposit:104641   9      :31961  
 D      :25322      1st Qu.: 0.0000   Non Refund: 14587   NULL   :16340  
 E      : 7806      Median : 0.0000   Refundable:   162   240    :13922  
 F      : 3751      Mean   : 0.2211                       1      : 7191  
 G      : 2553      3rd Qu.: 0.0000                       14     : 3640  
 C      : 2375      Max.   :21.0000                       7      : 3539  
 (Other): 3530                                            (Other):42797  
    company       days_in_waiting_list         customer_type  
 NULL   :112593   Min.   :  0.000      Contract       : 4076  
 40     :   927   1st Qu.:  0.000      Group          :  577  
 223    :   784   Median :  0.000      Transient      :89613  
 67     :   267   Mean   :  2.321      Transient-Party:25124  
 45     :   250   3rd Qu.:  0.000                             
 153    :   215   Max.   :391.000                             
 (Other):  4354                                               
      adr          required_car_parking_spaces total_of_special_requests
 Min.   :  -6.38   Min.   :0.00000             Min.   :0.0000           
 1st Qu.:  69.29   1st Qu.:0.00000             1st Qu.:0.0000           
 Median :  94.58   Median :0.00000             Median :0.0000           
 Mean   : 101.83   Mean   :0.06252             Mean   :0.5714           
 3rd Qu.: 126.00   3rd Qu.:0.00000             3rd Qu.:1.0000           
 Max.   :5400.00   Max.   :8.00000             Max.   :5.0000           
                                                                        
 reservation_status reservation_status_date
 Canceled :43017    2015-10-21:  1461      
 Check-Out:75166    2015-07-06:   805      
 No-Show  : 1207    2016-11-25:   790      
                    2015-01-01:   763      
                    2016-01-18:   625      
                    2015-07-02:   469      
                    (Other)   :114477      
group_by summary hotel wise
Code
    a <- hotel_bookings%>%
      group_by('hotel')
    
describe(a)
                               vars      n    mean     sd  median trimmed
hotel*                            1 119390    1.34   0.47    1.00    1.29
is_canceled                       2 119390    0.37   0.48    0.00    0.34
lead_time                         3 119390  104.01 106.86   69.00   87.24
arrival_date_year                 4 119390 2016.16   0.71 2016.00 2016.20
arrival_date_month*               5 119390    6.49   3.54    7.00    6.49
arrival_date_week_number          6 119390   27.17  13.61   28.00   27.16
arrival_date_day_of_month         7 119390   15.80   8.78   16.00   15.80
stays_in_weekend_nights           8 119390    0.93   1.00    1.00    0.84
stays_in_week_nights              9 119390    2.50   1.91    2.00    2.29
adults                           10 119390    1.86   0.58    2.00    1.88
children                         11 119386    0.10   0.40    0.00    0.00
babies                           12 119390    0.01   0.10    0.00    0.00
meal*                            13 119390    1.56   1.07    1.00    1.31
country*                         14 119390   94.59  45.09   82.00   96.85
market_segment*                  15 119390    5.93   1.27    6.00    6.11
distribution_channel*            16 119390    3.59   0.91    4.00    3.80
is_repeated_guest                17 119390    0.03   0.18    0.00    0.00
previous_cancellations           18 119390    0.09   0.84    0.00    0.00
previous_bookings_not_canceled   19 119390    0.14   1.50    0.00    0.00
reserved_room_type*              20 119390    1.99   1.70    1.00    1.64
assigned_room_type*              21 119390    2.33   1.88    1.00    2.02
booking_changes                  22 119390    0.22   0.65    0.00    0.06
deposit_type*                    23 119390    1.12   0.33    1.00    1.03
agent*                           24 119390  211.82 122.75  296.00  221.86
company*                         25 119390  342.22  50.06  353.00  353.00
days_in_waiting_list             26 119390    2.32  17.59    0.00    0.00
customer_type*                   27 119390    3.14   0.58    3.00    3.14
adr                              28 119390  101.83  50.54   94.58   97.71
required_car_parking_spaces      29 119390    0.06   0.25    0.00    0.00
total_of_special_requests        30 119390    0.57   0.79    0.00    0.43
reservation_status*              31 119390    1.65   0.50    2.00    1.67
reservation_status_date*         32 119390  517.75 227.48  525.00  521.18
"hotel"*                         33 119390    1.00   0.00    1.00    1.00
                                  mad     min  max   range  skew kurtosis   se
hotel*                           0.00    1.00    2    1.00  0.70    -1.51 0.00
is_canceled                      0.00    0.00    1    1.00  0.54    -1.71 0.00
lead_time                       88.96    0.00  737  737.00  1.35     1.70 0.31
arrival_date_year                1.48 2015.00 2017    2.00 -0.23    -0.99 0.00
arrival_date_month*              4.45    1.00   12   11.00 -0.04    -1.23 0.01
arrival_date_week_number        16.31    1.00   53   52.00 -0.01    -0.99 0.04
arrival_date_day_of_month       11.86    1.00   31   30.00  0.00    -1.19 0.03
stays_in_weekend_nights          1.48    0.00   19   19.00  1.38     7.17 0.00
stays_in_week_nights             1.48    0.00   50   50.00  2.86    24.28 0.01
adults                           0.00    0.00   55   55.00 18.32  1352.04 0.00
children                         0.00    0.00   10   10.00  4.11    18.67 0.00
babies                           0.00    0.00   10   10.00 24.65  1633.85 0.00
meal*                            0.00    1.00    5    4.00  1.59     0.96 0.00
country*                        81.54    1.00  178  177.00 -0.16    -1.53 0.13
market_segment*                  1.48    1.00    8    7.00 -1.00     0.15 0.00
distribution_channel*            0.00    1.00    5    4.00 -1.87     1.86 0.00
is_repeated_guest                0.00    0.00    1    1.00  5.33    26.37 0.00
previous_cancellations           0.00    0.00   26   26.00 24.46   674.03 0.00
previous_bookings_not_canceled   0.00    0.00   72   72.00 23.54   767.20 0.00
reserved_room_type*              0.00    1.00   10    9.00  1.44     0.89 0.00
assigned_room_type*              0.00    1.00   12   11.00  1.15     0.53 0.01
booking_changes                  0.00    0.00   21   21.00  6.00    79.39 0.00
deposit_type*                    0.00    1.00    3    2.00  2.38     4.07 0.00
agent*                          56.34    1.00  334  333.00 -0.37    -1.53 0.36
company*                         0.00    1.00  353  352.00 -4.93    23.77 0.14
days_in_waiting_list             0.00    0.00  391  391.00 11.94   186.78 0.05
customer_type*                   0.00    1.00    4    3.00 -1.08     4.55 0.00
adr                             41.25   -6.38 5400 5406.38 10.53  1013.13 0.15
required_car_parking_spaces      0.00    0.00    8    8.00  4.16    30.00 0.00
total_of_special_requests        0.00    0.00    5    5.00  1.35     1.49 0.00
reservation_status*              0.00    1.00    3    2.00 -0.38    -1.25 0.00
reservation_status_date*       275.76    1.00  926  925.00 -0.12    -1.02 0.66
"hotel"*                         0.00    1.00    1    0.00   NaN      NaN 0.00
Code
summary(a)
          hotel        is_canceled       lead_time   arrival_date_year
 City Hotel  :79330   Min.   :0.0000   Min.   :  0   Min.   :2015     
 Resort Hotel:40060   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
                      Median :0.0000   Median : 69   Median :2016     
                      Mean   :0.3704   Mean   :104   Mean   :2016     
                      3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                      Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                      
 arrival_date_month arrival_date_week_number arrival_date_day_of_month
 August :13877      Min.   : 1.00            Min.   : 1.0             
 July   :12661      1st Qu.:16.00            1st Qu.: 8.0             
 May    :11791      Median :28.00            Median :16.0             
 October:11160      Mean   :27.17            Mean   :15.8             
 April  :11089      3rd Qu.:38.00            3rd Qu.:23.0             
 June   :10939      Max.   :53.00            Max.   :31.0             
 (Other):47873                                                        
 stays_in_weekend_nights stays_in_week_nights     adults      
 Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
 1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
 Median : 1.0000         Median : 2.0         Median : 2.000  
 Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
 3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
 Max.   :19.0000         Max.   :50.0         Max.   :55.000  
                                                              
    children           babies                 meal          country     
 Min.   : 0.0000   Min.   : 0.000000   BB       :92310   PRT    :48590  
 1st Qu.: 0.0000   1st Qu.: 0.000000   FB       :  798   GBR    :12129  
 Median : 0.0000   Median : 0.000000   HB       :14463   FRA    :10415  
 Mean   : 0.1039   Mean   : 0.007949   SC       :10650   ESP    : 8568  
 3rd Qu.: 0.0000   3rd Qu.: 0.000000   Undefined: 1169   DEU    : 7287  
 Max.   :10.0000   Max.   :10.000000                     ITA    : 3766  
 NA's   :4                                               (Other):28635  
       market_segment  distribution_channel is_repeated_guest
 Online TA    :56477   Corporate: 6677      Min.   :0.00000  
 Offline TA/TO:24219   Direct   :14645      1st Qu.:0.00000  
 Groups       :19811   GDS      :  193      Median :0.00000  
 Direct       :12606   TA/TO    :97870      Mean   :0.03191  
 Corporate    : 5295   Undefined:    5      3rd Qu.:0.00000  
 Complementary:  743                        Max.   :1.00000  
 (Other)      :  239                                         
 previous_cancellations previous_bookings_not_canceled reserved_room_type
 Min.   : 0.00000       Min.   : 0.0000                A      :85994     
 1st Qu.: 0.00000       1st Qu.: 0.0000                D      :19201     
 Median : 0.00000       Median : 0.0000                E      : 6535     
 Mean   : 0.08712       Mean   : 0.1371                F      : 2897     
 3rd Qu.: 0.00000       3rd Qu.: 0.0000                G      : 2094     
 Max.   :26.00000       Max.   :72.0000                B      : 1118     
                                                       (Other): 1551     
 assigned_room_type booking_changes       deposit_type        agent      
 A      :74053      Min.   : 0.0000   No Deposit:104641   9      :31961  
 D      :25322      1st Qu.: 0.0000   Non Refund: 14587   NULL   :16340  
 E      : 7806      Median : 0.0000   Refundable:   162   240    :13922  
 F      : 3751      Mean   : 0.2211                       1      : 7191  
 G      : 2553      3rd Qu.: 0.0000                       14     : 3640  
 C      : 2375      Max.   :21.0000                       7      : 3539  
 (Other): 3530                                            (Other):42797  
    company       days_in_waiting_list         customer_type  
 NULL   :112593   Min.   :  0.000      Contract       : 4076  
 40     :   927   1st Qu.:  0.000      Group          :  577  
 223    :   784   Median :  0.000      Transient      :89613  
 67     :   267   Mean   :  2.321      Transient-Party:25124  
 45     :   250   3rd Qu.:  0.000                             
 153    :   215   Max.   :391.000                             
 (Other):  4354                                               
      adr          required_car_parking_spaces total_of_special_requests
 Min.   :  -6.38   Min.   :0.00000             Min.   :0.0000           
 1st Qu.:  69.29   1st Qu.:0.00000             1st Qu.:0.0000           
 Median :  94.58   Median :0.00000             Median :0.0000           
 Mean   : 101.83   Mean   :0.06252             Mean   :0.5714           
 3rd Qu.: 126.00   3rd Qu.:0.00000             3rd Qu.:1.0000           
 Max.   :5400.00   Max.   :8.00000             Max.   :5.0000           
                                                                        
 reservation_status reservation_status_date   "hotel"         
 Canceled :43017    2015-10-21:  1461       Length:119390     
 Check-Out:75166    2015-07-06:   805       Class :character  
 No-Show  : 1207    2016-11-25:   790       Mode  :character  
                    2015-01-01:   763                         
                    2016-01-18:   625                         
                    2015-07-02:   469                         
                    (Other)   :114477                         
Dataframe summary description using summarytools::dfSummary
Code
print(summarytools::dfSummary(hotel_bookings,
                              varnumbers = FALSE,
                              plain.ascii  = FALSE,
                              style        = "grid",
                              graph.magnif = 0.70,
                              valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

hotel_bookings

Dimensions: 119390 x 32
Duplicates: 31994
Variable Stats / Values Freqs (% of Valid) Graph Missing
hotel [factor]
1. City Hotel
2. Resort Hotel
79330(66.4%)
40060(33.6%)
0 (0.0%)
is_canceled [integer]
Min : 0
Mean : 0.4
Max : 1
0:75166(63.0%)
1:44224(37.0%)
0 (0.0%)
lead_time [integer]
Mean (sd) : 104 (106.9)
min ≤ med ≤ max:
0 ≤ 69 ≤ 737
IQR (CV) : 142 (1)
479 distinct values 0 (0.0%)
arrival_date_year [integer]
Mean (sd) : 2016.2 (0.7)
min ≤ med ≤ max:
2015 ≤ 2016 ≤ 2017
IQR (CV) : 1 (0)
2015:21996(18.4%)
2016:56707(47.5%)
2017:40687(34.1%)
0 (0.0%)
arrival_date_month [factor]
1. April
2. August
3. December
4. February
5. January
6. July
7. June
8. March
9. May
10. November
[ 2 others ]
11089(9.3%)
13877(11.6%)
6780(5.7%)
8068(6.8%)
5929(5.0%)
12661(10.6%)
10939(9.2%)
9794(8.2%)
11791(9.9%)
6794(5.7%)
21668(18.1%)
0 (0.0%)
arrival_date_week_number [integer]
Mean (sd) : 27.2 (13.6)
min ≤ med ≤ max:
1 ≤ 28 ≤ 53
IQR (CV) : 22 (0.5)
53 distinct values 0 (0.0%)
arrival_date_day_of_month [integer]
Mean (sd) : 15.8 (8.8)
min ≤ med ≤ max:
1 ≤ 16 ≤ 31
IQR (CV) : 15 (0.6)
31 distinct values 0 (0.0%)
stays_in_weekend_nights [integer]
Mean (sd) : 0.9 (1)
min ≤ med ≤ max:
0 ≤ 1 ≤ 19
IQR (CV) : 2 (1.1)
17 distinct values 0 (0.0%)
stays_in_week_nights [integer]
Mean (sd) : 2.5 (1.9)
min ≤ med ≤ max:
0 ≤ 2 ≤ 50
IQR (CV) : 2 (0.8)
35 distinct values 0 (0.0%)
adults [integer]
Mean (sd) : 1.9 (0.6)
min ≤ med ≤ max:
0 ≤ 2 ≤ 55
IQR (CV) : 0 (0.3)
14 distinct values 0 (0.0%)
children [integer]
Mean (sd) : 0.1 (0.4)
min ≤ med ≤ max:
0 ≤ 0 ≤ 10
IQR (CV) : 0 (3.8)
0:110796(92.8%)
1:4861(4.1%)
2:3652(3.1%)
3:76(0.1%)
10:1(0.0%)
4 (0.0%)
babies [integer]
Mean (sd) : 0 (0.1)
min ≤ med ≤ max:
0 ≤ 0 ≤ 10
IQR (CV) : 0 (12.3)
0:118473(99.2%)
1:900(0.8%)
2:15(0.0%)
9:1(0.0%)
10:1(0.0%)
0 (0.0%)
meal [factor]
1. BB
2. FB
3. HB
4. SC
5. Undefined
92310(77.3%)
798(0.7%)
14463(12.1%)
10650(8.9%)
1169(1.0%)
0 (0.0%)
country [factor]
1. ABW
2. AGO
3. AIA
4. ALB
5. AND
6. ARE
7. ARG
8. ARM
9. ASM
10. ATA
[ 168 others ]
2(0.0%)
362(0.3%)
1(0.0%)
12(0.0%)
7(0.0%)
51(0.0%)
214(0.2%)
8(0.0%)
1(0.0%)
2(0.0%)
118730(99.4%)
0 (0.0%)
market_segment [factor]
1. Aviation
2. Complementary
3. Corporate
4. Direct
5. Groups
6. Offline TA/TO
7. Online TA
8. Undefined
237(0.2%)
743(0.6%)
5295(4.4%)
12606(10.6%)
19811(16.6%)
24219(20.3%)
56477(47.3%)
2(0.0%)
0 (0.0%)
distribution_channel [factor]
1. Corporate
2. Direct
3. GDS
4. TA/TO
5. Undefined
6677(5.6%)
14645(12.3%)
193(0.2%)
97870(82.0%)
5(0.0%)
0 (0.0%)
is_repeated_guest [integer]
Min : 0
Mean : 0
Max : 1
0:115580(96.8%)
1:3810(3.2%)
0 (0.0%)
previous_cancellations [integer]
Mean (sd) : 0.1 (0.8)
min ≤ med ≤ max:
0 ≤ 0 ≤ 26
IQR (CV) : 0 (9.7)
15 distinct values 0 (0.0%)
previous_bookings_not_canceled [integer]
Mean (sd) : 0.1 (1.5)
min ≤ med ≤ max:
0 ≤ 0 ≤ 72
IQR (CV) : 0 (10.9)
73 distinct values 0 (0.0%)
reserved_room_type [factor]
1. A
2. B
3. C
4. D
5. E
6. F
7. G
8. H
9. L
10. P
85994(72.0%)
1118(0.9%)
932(0.8%)
19201(16.1%)
6535(5.5%)
2897(2.4%)
2094(1.8%)
601(0.5%)
6(0.0%)
12(0.0%)
0 (0.0%)
assigned_room_type [factor]
1. A
2. B
3. C
4. D
5. E
6. F
7. G
8. H
9. I
10. K
[ 2 others ]
74053(62.0%)
2163(1.8%)
2375(2.0%)
25322(21.2%)
7806(6.5%)
3751(3.1%)
2553(2.1%)
712(0.6%)
363(0.3%)
279(0.2%)
13(0.0%)
0 (0.0%)
booking_changes [integer]
Mean (sd) : 0.2 (0.7)
min ≤ med ≤ max:
0 ≤ 0 ≤ 21
IQR (CV) : 0 (2.9)
21 distinct values 0 (0.0%)
deposit_type [factor]
1. No Deposit
2. Non Refund
3. Refundable
104641(87.6%)
14587(12.2%)
162(0.1%)
0 (0.0%)
agent [factor]
1. 1
2. 10
3. 103
4. 104
5. 105
6. 106
7. 107
8. 11
9. 110
10. 111
[ 324 others ]
7191(6.0%)
260(0.2%)
21(0.0%)
53(0.0%)
14(0.0%)
2(0.0%)
2(0.0%)
395(0.3%)
12(0.0%)
16(0.0%)
111424(93.3%)
0 (0.0%)
company [factor]
1. 10
2. 100
3. 101
4. 102
5. 103
6. 104
7. 105
8. 106
9. 107
10. 108
[ 343 others ]
1(0.0%)
1(0.0%)
1(0.0%)
1(0.0%)
16(0.0%)
1(0.0%)
8(0.0%)
2(0.0%)
9(0.0%)
11(0.0%)
119339(100.0%)
0 (0.0%)
days_in_waiting_list [integer]
Mean (sd) : 2.3 (17.6)
min ≤ med ≤ max:
0 ≤ 0 ≤ 391
IQR (CV) : 0 (7.6)
128 distinct values 0 (0.0%)
customer_type [factor]
1. Contract
2. Group
3. Transient
4. Transient-Party
4076(3.4%)
577(0.5%)
89613(75.1%)
25124(21.0%)
0 (0.0%)
adr [numeric]
Mean (sd) : 101.8 (50.5)
min ≤ med ≤ max:
-6.4 ≤ 94.6 ≤ 5400
IQR (CV) : 56.7 (0.5)
8879 distinct values 0 (0.0%)
required_car_parking_spaces [integer]
Mean (sd) : 0.1 (0.2)
min ≤ med ≤ max:
0 ≤ 0 ≤ 8
IQR (CV) : 0 (3.9)
0:111974(93.8%)
1:7383(6.2%)
2:28(0.0%)
3:3(0.0%)
8:2(0.0%)
0 (0.0%)
total_of_special_requests [integer]
Mean (sd) : 0.6 (0.8)
min ≤ med ≤ max:
0 ≤ 0 ≤ 5
IQR (CV) : 1 (1.4)
0:70318(58.9%)
1:33226(27.8%)
2:12969(10.9%)
3:2497(2.1%)
4:340(0.3%)
5:40(0.0%)
0 (0.0%)
reservation_status [factor]
1. Canceled
2. Check-Out
3. No-Show
43017(36.0%)
75166(63.0%)
1207(1.0%)
0 (0.0%)
reservation_status_date [factor]
1. 2014-10-17
2. 2014-11-18
3. 2015-01-01
4. 2015-01-02
5. 2015-01-18
6. 2015-01-20
7. 2015-01-21
8. 2015-01-22
9. 2015-01-28
10. 2015-01-29
[ 916 others ]
180(0.2%)
1(0.0%)
763(0.6%)
16(0.0%)
1(0.0%)
2(0.0%)
91(0.1%)
6(0.0%)
1(0.0%)
1(0.0%)
118328(99.1%)
0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-09-04

Booking details for two hotels, namely “City Hotel &”Resort Hotel” is given in the dataset hotel_bookings
Hotel Bookings can be classified on following columns:
Code
table(hotel_bookings$arrival_date_month)

    April    August  December  February   January      July      June     March 
    11089     13877      6780      8068      5929     12661     10939      9794 
      May  November   October September 
    11791      6794     11160     10508 
Hotel_bookings year wise
Code
table(hotel_bookings$arrival_date_year)

 2015  2016  2017 
21996 56707 40687 

Provide Grouped Summary Statistics

Conduct some exploratory data analysis, using dplyr commands such as group_by(), select(), filter(), and summarise(). Find the central tendency (mean, median, mode) and dispersion (standard deviation, mix/max/quantile) for different subgroups within the data set.

Change class of certain columns to factors to represent categorical data
Code
#select columns for factorization
cols_factor <- c("is_canceled", 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month', 'is_repeated_guest' ) 

# converting class of columns: "is_canceled", 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month' to factor 'is_repeated_guest'

hotel_bookings <-hotel_bookings%>%
  mutate_at(vars(is_canceled, arrival_date_year, arrival_date_month, arrival_date_week_number, arrival_date_day_of_month, is_repeated_guest), funs(factor))

# verify class of hotel_bookings: 

sapply(hotel_bookings, class)  
                         hotel                    is_canceled 
                      "factor"                       "factor" 
                     lead_time              arrival_date_year 
                     "integer"                       "factor" 
            arrival_date_month       arrival_date_week_number 
                      "factor"                       "factor" 
     arrival_date_day_of_month        stays_in_weekend_nights 
                      "factor"                      "integer" 
          stays_in_week_nights                         adults 
                     "integer"                      "integer" 
                      children                         babies 
                     "integer"                      "integer" 
                          meal                        country 
                      "factor"                       "factor" 
                market_segment           distribution_channel 
                      "factor"                       "factor" 
             is_repeated_guest         previous_cancellations 
                      "factor"                      "integer" 
previous_bookings_not_canceled             reserved_room_type 
                     "integer"                       "factor" 
            assigned_room_type                booking_changes 
                      "factor"                      "integer" 
                  deposit_type                          agent 
                      "factor"                       "factor" 
                       company           days_in_waiting_list 
                      "factor"                      "integer" 
                 customer_type                            adr 
                      "factor"                      "numeric" 
   required_car_parking_spaces      total_of_special_requests 
                     "integer"                      "integer" 
            reservation_status        reservation_status_date 
                      "factor"                       "factor" 
Hotel wise, year wise, month wise, reservation status wise number of bookings
Code
booking_hymrs<-hotel_bookings%>%
  group_by(hotel, arrival_date_year, arrival_date_month, reservation_status)%>%
  summarise(no_of_bookings=n(),.groups = 'keep') %>% 
  pivot_wider(names_from = c(hotel, reservation_status), values_from = no_of_bookings)
  
booking_hymrs
# A tibble: 26 × 8
# Groups:   arrival_date_year, arrival_date_month [26]
   arrival_date_year arrival_d…¹ City …² City …³ City …⁴ Resor…⁵ Resor…⁶ Resor…⁷
   <fct>             <fct>         <int>   <int>   <int>   <int>   <int>   <int>
 1 2015              August         1217    1248      15     361    1043       5
 2 2015              December        646     986      22     278     961      27
 3 2015              July            918     459      21     314    1058       6
 4 2015              November        277     934      24     170     920      15
 5 2015              October        1311    2065      10     403    1160       8
 6 2015              September      1513    1986      30     543    1034       8
 7 2016              April          1492    2022      47     515    1345       7
 8 2016              August         1214    2131      33     572    1107       6
 9 2016              December       1049    1406      23     316    1056      10
10 2016              February        796    1441     134     396    1113      11
# … with 16 more rows, and abbreviated variable names ¹​arrival_date_month,
#   ²​`City Hotel_Canceled`, ³​`City Hotel_Check-Out`, ⁴​`City Hotel_No-Show`,
#   ⁵​`Resort Hotel_Canceled`, ⁶​`Resort Hotel_Check-Out`,
#   ⁷​`Resort Hotel_No-Show`
# ℹ Use `print(n = ...)` to see more rows
Hotel year wise, month wise, reservation status wise number of cancelled/normal bookings
is_cancelled: 1 = cancelled, 0= normal booking
Code
booking_hymc<-hotel_bookings%>%
  group_by(hotel, arrival_date_year, arrival_date_month)%>%
  summarise(no_of_bookings=sum(!as.numeric(is_canceled)),no_of_cancelled=sum(as.numeric(is_canceled)), .groups = 'keep') %>% 
  pivot_wider(names_from = hotel, values_from = c(no_of_bookings, no_of_cancelled))

booking_hymc
# A tibble: 26 × 6
# Groups:   arrival_date_year, arrival_date_month [26]
   arrival_date_year arrival_date_month no_of_bookings…¹ no_of…² no_of…³ no_of…⁴
   <fct>             <fct>                         <int>   <int>   <dbl>   <dbl>
 1 2015              August                            0       0    3712    1775
 2 2015              December                          0       0    2322    1571
 3 2015              July                              0       0    2337    1698
 4 2015              November                          0       0    1536    1290
 5 2015              October                           0       0    4707    1982
 6 2015              September                         0       0    5072    2136
 7 2016              April                             0       0    5100    2389
 8 2016              August                            0       0    4625    2263
 9 2016              December                          0       0    3550    1708
10 2016              February                          0       0    3301    1927
# … with 16 more rows, and abbreviated variable names
#   ¹​`no_of_bookings_City Hotel`, ²​`no_of_bookings_Resort Hotel`,
#   ³​`no_of_cancelled_City Hotel`, ⁴​`no_of_cancelled_Resort Hotel`
# ℹ Use `print(n = ...)` to see more rows
Hotel wise: Number of cancelled ticket: Canceled and No-Show
Code
booking_can <-hotel_bookings%>%
  filter(is_canceled==1, reservation_status %in% c('No-Show', 'Canceled'))%>%
  group_by(hotel,reservation_status)%>%
  summarise(cancelled_ticket_status=n(), .groups = 'keep') %>% 
  pivot_wider(names_from = hotel, values_from = c(cancelled_ticket_status))

booking_can
# A tibble: 2 × 3
# Groups:   reservation_status [2]
  reservation_status `City Hotel` `Resort Hotel`
  <fct>                     <int>          <int>
1 Canceled                  32186          10831
2 No-Show                     916            291
Hotel year wise, month wise, reservation status wise number of cancelled/normal bookings
Code
booking_cym <-hotel_bookings%>%
  filter(is_canceled==1, reservation_status %in% c('No-Show', 'Canceled'))%>%
  group_by(hotel, arrival_date_year, arrival_date_month, reservation_status)%>%
  summarise(cancelled_ticket_status=n(), .groups = 'keep') %>% 
  pivot_wider(names_from = hotel, values_from = c(cancelled_ticket_status))

booking_cym
# A tibble: 52 × 5
# Groups:   arrival_date_year, arrival_date_month, reservation_status [52]
   arrival_date_year arrival_date_month reservation_status `City Hotel` Resort…¹
   <fct>             <fct>              <fct>                     <int>    <int>
 1 2015              August             Canceled                   1217      361
 2 2015              August             No-Show                      15        5
 3 2015              December           Canceled                    646      278
 4 2015              December           No-Show                      22       27
 5 2015              July               Canceled                    918      314
 6 2015              July               No-Show                      21        6
 7 2015              November           Canceled                    277      170
 8 2015              November           No-Show                      24       15
 9 2015              October            Canceled                   1311      403
10 2015              October            No-Show                      10        8
# … with 42 more rows, and abbreviated variable name ¹​`Resort Hotel`
# ℹ Use `print(n = ...)` to see more rows
Hotel year wise, month wise, country wise reservation status wise number of normal bookings/not-cancelled
To identify and undertand the hotel wise, year wise, month wise, country wise Average Daily Rate for the hotel for the bookings undertaken
Mean, median, standard deviation, minimum, maximum, quantile of adr for hotel > arrival_date_year
To identify the hotel wise, year wise, distribution of Average Daily Rate for the hotel for the bookings undertaken
Code
booking_adr <-hotel_bookings%>%
  filter(is_canceled==0, reservation_status %in% c('No-Show', 'Check-Out'))%>%
  group_by(hotel, arrival_date_year)%>%
  summarise(median_adr=median(adr),avg_adr=mean(adr),sd_adr=sd(adr), min_adr = min(adr),max_adr = max(adr), quantile(adr, c(0.25, 0.5, 0.75)), .groups = 'keep')

booking_adr
# A tibble: 18 × 8
# Groups:   hotel, arrival_date_year [6]
   hotel        arrival_date_year media…¹ avg_adr sd_adr min_adr max_adr quant…²
   <fct>        <fct>               <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
 1 City Hotel   2015                 85.5    87.9   36.1    0       268.    65  
 2 City Hotel   2015                 85.5    87.9   36.1    0       268.    85.5
 3 City Hotel   2015                 85.5    87.9   36.1    0       268.   105. 
 4 City Hotel   2016                 99     104.    37.7    0       452.    80.8
 5 City Hotel   2016                 99     104.    37.7    0       452.    99  
 6 City Hotel   2016                 99     104.    37.7    0       452.   123. 
 7 City Hotel   2017                112     117.    43.1    0       510     89.1
 8 City Hotel   2017                112     117.    43.1    0       510    112  
 9 City Hotel   2017                112     117.    43.1    0       510    140. 
10 Resort Hotel 2015                 75.5    89.8   53.7    0       508     48  
11 Resort Hotel 2015                 75.5    89.8   53.7    0       508     75.5
12 Resort Hotel 2015                 75.5    89.8   53.7    0       508    124. 
13 Resort Hotel 2016                 66.3    83.9   55.9    0       367     48  
14 Resort Hotel 2016                 66.3    83.9   55.9    0       367     66.3
15 Resort Hotel 2016                 66.3    83.9   55.9    0       367    100  
16 Resort Hotel 2017                 80.9   102.    66.0   -6.38    426.    53.0
17 Resort Hotel 2017                 80.9   102.    66.0   -6.38    426.    80.9
18 Resort Hotel 2017                 80.9   102.    66.0   -6.38    426.   135  
# … with abbreviated variable names ¹​median_adr,
#   ²​`quantile(adr, c(0.25, 0.5, 0.75))`
Leadtime for cancelled booking
To understand if higher lead time is causing cancellation of booking
Code
booking_canc_lt<- hotel_bookings%>%
  select(hotel, is_canceled, lead_time)%>%
  filter(is_canceled==1)%>%
  group_by(hotel)%>%
  summarise(max_leadtime_cancelled = max(lead_time), min_leadtime_cancelled= min(lead_time), sd_leadtime_cancelled = sd(lead_time), mean_lead_time_cancelled = mean(lead_time), .groups = 'keep')

booking_canc_lt
# A tibble: 2 × 5
# Groups:   hotel [2]
  hotel        max_leadtime_cancelled min_leadtime_cancelled sd_leadti…¹ mean_…²
  <fct>                         <int>                  <int>       <dbl>   <dbl>
1 City Hotel                      629                      0       124.     150.
2 Resort Hotel                    471                      0        98.8    129.
# … with abbreviated variable names ¹​sd_leadtime_cancelled,
#   ²​mean_lead_time_cancelled
Leadtime for booking undertaken
To understand if the lead time for bookings undertaken and compare the same with cancelled booking as calculated above
Code
booking_ut_lt<- hotel_bookings%>%
  select(hotel, is_canceled, lead_time)%>%
  filter(is_canceled==0)%>%
  group_by(hotel)%>%
  summarise(max_leadtime = max(lead_time), min_leadtime= min(lead_time), sd_leadtime = sd(lead_time), mean_lead_time = mean(lead_time), .groups = 'keep')

booking_ut_lt
# A tibble: 2 × 5
# Groups:   hotel [2]
  hotel        max_leadtime min_leadtime sd_leadtime mean_lead_time
  <fct>               <int>        <int>       <dbl>          <dbl>
1 City Hotel            518            0        89.9           80.7
2 Resort Hotel          737            0        93.1           78.8
Hotel wise, market segment wise repeated guest bookings undertaken
To identify any specific market segment is generating more number of bookings for the repeated guests.
Code
booking_rg<- hotel_bookings%>%
  select(hotel, market_segment, is_repeated_guest, is_canceled)%>%
  filter(is_repeated_guest==1, is_canceled==0)%>%
  group_by(hotel,market_segment)%>%
  summarise(repeat_guest=n(),.groups = 'keep')

booking_rg
# A tibble: 13 × 3
# Groups:   hotel, market_segment [13]
   hotel        market_segment repeat_guest
   <fct>        <fct>                 <int>
 1 City Hotel   Aviation                 53
 2 City Hotel   Complementary           193
 3 City Hotel   Corporate               842
 4 City Hotel   Direct                  158
 5 City Hotel   Groups                    7
 6 City Hotel   Offline TA/TO           142
 7 City Hotel   Online TA               196
 8 Resort Hotel Complementary            18
 9 Resort Hotel Corporate               537
10 Resort Hotel Direct                  581
11 Resort Hotel Groups                   39
12 Resort Hotel Offline TA/TO           142
13 Resort Hotel Online TA               350
Hotel wise, market segment wise repeated guest bookings cancelled
To identify any specific market segment is generating more number of cancelled bookings for the repeated guests.
Code
booking_rg_can<- hotel_bookings%>%
  select(hotel, market_segment, is_repeated_guest, is_canceled)%>%
  filter(is_repeated_guest==1, is_canceled==1)%>%
  group_by(hotel,market_segment)%>%
  summarise(repeat_guest_cancel=n(),.groups = 'keep') 

booking_rg_can
# A tibble: 12 × 3
# Groups:   hotel, market_segment [12]
   hotel        market_segment repeat_guest_cancel
   <fct>        <fct>                        <int>
 1 City Hotel   Aviation                        11
 2 City Hotel   Complementary                   19
 3 City Hotel   Corporate                       71
 4 City Hotel   Direct                          33
 5 City Hotel   Groups                         194
 6 City Hotel   Offline TA/TO                   77
 7 City Hotel   Online TA                       36
 8 Resort Hotel Corporate                       24
 9 Resort Hotel Direct                          24
10 Resort Hotel Groups                          30
11 Resort Hotel Offline TA/TO                   18
12 Resort Hotel Online TA                       15

Explain and Interpret

Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.