Code
library(tidyverse)
library(Hmisc)
library(psych)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Akhilesh Kumar Meghwal
August 21, 2022
Today’s challenge is to
Read in one (or more) of the following data sets, available in the posts/_data
folder, using the correct R package and command.
Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
hotel is_canceled lead_time arrival_date_year arrival_date_month
1 Resort Hotel 0 342 2015 July
2 Resort Hotel 0 737 2015 July
3 Resort Hotel 0 7 2015 July
4 Resort Hotel 0 13 2015 July
5 Resort Hotel 0 14 2015 July
6 Resort Hotel 0 14 2015 July
arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
1 27 1 0
2 27 1 0
3 27 1 0
4 27 1 0
5 27 1 0
6 27 1 0
stays_in_week_nights adults children babies meal country market_segment
1 0 2 0 0 BB PRT Direct
2 0 2 0 0 BB PRT Direct
3 1 1 0 0 BB GBR Direct
4 1 1 0 0 BB GBR Corporate
5 2 2 0 0 BB GBR Online TA
6 2 2 0 0 BB GBR Online TA
distribution_channel is_repeated_guest previous_cancellations
1 Direct 0 0
2 Direct 0 0
3 Direct 0 0
4 Corporate 0 0
5 TA/TO 0 0
6 TA/TO 0 0
previous_bookings_not_canceled reserved_room_type assigned_room_type
1 0 C C
2 0 C C
3 0 A C
4 0 A A
5 0 A A
6 0 A A
booking_changes deposit_type agent company days_in_waiting_list customer_type
1 3 No Deposit NULL NULL 0 Transient
2 4 No Deposit NULL NULL 0 Transient
3 0 No Deposit NULL NULL 0 Transient
4 0 No Deposit 304 NULL 0 Transient
5 0 No Deposit 240 NULL 0 Transient
6 0 No Deposit 240 NULL 0 Transient
adr required_car_parking_spaces total_of_special_requests reservation_status
1 0 0 0 Check-Out
2 0 0 0 Check-Out
3 75 0 0 Check-Out
4 75 0 0 Check-Out
5 98 0 1 Check-Out
6 98 0 1 Check-Out
reservation_status_date
1 2015-07-01
2 2015-07-01
3 2015-07-02
4 2015-07-02
5 2015-07-03
6 2015-07-03
[1] "hotel" "is_canceled"
[3] "lead_time" "arrival_date_year"
[5] "arrival_date_month" "arrival_date_week_number"
[7] "arrival_date_day_of_month" "stays_in_weekend_nights"
[9] "stays_in_week_nights" "adults"
[11] "children" "babies"
[13] "meal" "country"
[15] "market_segment" "distribution_channel"
[17] "is_repeated_guest" "previous_cancellations"
[19] "previous_bookings_not_canceled" "reserved_room_type"
[21] "assigned_room_type" "booking_changes"
[23] "deposit_type" "agent"
[25] "company" "days_in_waiting_list"
[27] "customer_type" "adr"
[29] "required_car_parking_spaces" "total_of_special_requests"
[31] "reservation_status" "reservation_status_date"
vars n mean sd median trimmed
hotel* 1 119390 1.34 0.47 1.00 1.29
is_canceled 2 119390 0.37 0.48 0.00 0.34
lead_time 3 119390 104.01 106.86 69.00 87.24
arrival_date_year 4 119390 2016.16 0.71 2016.00 2016.20
arrival_date_month* 5 119390 6.49 3.54 7.00 6.49
arrival_date_week_number 6 119390 27.17 13.61 28.00 27.16
arrival_date_day_of_month 7 119390 15.80 8.78 16.00 15.80
stays_in_weekend_nights 8 119390 0.93 1.00 1.00 0.84
stays_in_week_nights 9 119390 2.50 1.91 2.00 2.29
adults 10 119390 1.86 0.58 2.00 1.88
children 11 119386 0.10 0.40 0.00 0.00
babies 12 119390 0.01 0.10 0.00 0.00
meal* 13 119390 1.56 1.07 1.00 1.31
country* 14 119390 94.59 45.09 82.00 96.85
market_segment* 15 119390 5.93 1.27 6.00 6.11
distribution_channel* 16 119390 3.59 0.91 4.00 3.80
is_repeated_guest 17 119390 0.03 0.18 0.00 0.00
previous_cancellations 18 119390 0.09 0.84 0.00 0.00
previous_bookings_not_canceled 19 119390 0.14 1.50 0.00 0.00
reserved_room_type* 20 119390 1.99 1.70 1.00 1.64
assigned_room_type* 21 119390 2.33 1.88 1.00 2.02
booking_changes 22 119390 0.22 0.65 0.00 0.06
deposit_type* 23 119390 1.12 0.33 1.00 1.03
agent* 24 119390 211.82 122.75 296.00 221.86
company* 25 119390 342.22 50.06 353.00 353.00
days_in_waiting_list 26 119390 2.32 17.59 0.00 0.00
customer_type* 27 119390 3.14 0.58 3.00 3.14
adr 28 119390 101.83 50.54 94.58 97.71
required_car_parking_spaces 29 119390 0.06 0.25 0.00 0.00
total_of_special_requests 30 119390 0.57 0.79 0.00 0.43
reservation_status* 31 119390 1.65 0.50 2.00 1.67
reservation_status_date* 32 119390 517.75 227.48 525.00 521.18
mad min max range skew kurtosis se
hotel* 0.00 1.00 2 1.00 0.70 -1.51 0.00
is_canceled 0.00 0.00 1 1.00 0.54 -1.71 0.00
lead_time 88.96 0.00 737 737.00 1.35 1.70 0.31
arrival_date_year 1.48 2015.00 2017 2.00 -0.23 -0.99 0.00
arrival_date_month* 4.45 1.00 12 11.00 -0.04 -1.23 0.01
arrival_date_week_number 16.31 1.00 53 52.00 -0.01 -0.99 0.04
arrival_date_day_of_month 11.86 1.00 31 30.00 0.00 -1.19 0.03
stays_in_weekend_nights 1.48 0.00 19 19.00 1.38 7.17 0.00
stays_in_week_nights 1.48 0.00 50 50.00 2.86 24.28 0.01
adults 0.00 0.00 55 55.00 18.32 1352.04 0.00
children 0.00 0.00 10 10.00 4.11 18.67 0.00
babies 0.00 0.00 10 10.00 24.65 1633.85 0.00
meal* 0.00 1.00 5 4.00 1.59 0.96 0.00
country* 81.54 1.00 178 177.00 -0.16 -1.53 0.13
market_segment* 1.48 1.00 8 7.00 -1.00 0.15 0.00
distribution_channel* 0.00 1.00 5 4.00 -1.87 1.86 0.00
is_repeated_guest 0.00 0.00 1 1.00 5.33 26.37 0.00
previous_cancellations 0.00 0.00 26 26.00 24.46 674.03 0.00
previous_bookings_not_canceled 0.00 0.00 72 72.00 23.54 767.20 0.00
reserved_room_type* 0.00 1.00 10 9.00 1.44 0.89 0.00
assigned_room_type* 0.00 1.00 12 11.00 1.15 0.53 0.01
booking_changes 0.00 0.00 21 21.00 6.00 79.39 0.00
deposit_type* 0.00 1.00 3 2.00 2.38 4.07 0.00
agent* 56.34 1.00 334 333.00 -0.37 -1.53 0.36
company* 0.00 1.00 353 352.00 -4.93 23.77 0.14
days_in_waiting_list 0.00 0.00 391 391.00 11.94 186.78 0.05
customer_type* 0.00 1.00 4 3.00 -1.08 4.55 0.00
adr 41.25 -6.38 5400 5406.38 10.53 1013.13 0.15
required_car_parking_spaces 0.00 0.00 8 8.00 4.16 30.00 0.00
total_of_special_requests 0.00 0.00 5 5.00 1.35 1.49 0.00
reservation_status* 0.00 1.00 3 2.00 -0.38 -1.25 0.00
reservation_status_date* 275.76 1.00 926 925.00 -0.12 -1.02 0.66
hotel is_canceled lead_time arrival_date_year
City Hotel :79330 Min. :0.0000 Min. : 0 Min. :2015
Resort Hotel:40060 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
Median :0.0000 Median : 69 Median :2016
Mean :0.3704 Mean :104 Mean :2016
3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
Max. :1.0000 Max. :737 Max. :2017
arrival_date_month arrival_date_week_number arrival_date_day_of_month
August :13877 Min. : 1.00 Min. : 1.0
July :12661 1st Qu.:16.00 1st Qu.: 8.0
May :11791 Median :28.00 Median :16.0
October:11160 Mean :27.17 Mean :15.8
April :11089 3rd Qu.:38.00 3rd Qu.:23.0
June :10939 Max. :53.00 Max. :31.0
(Other):47873
stays_in_weekend_nights stays_in_week_nights adults
Min. : 0.0000 Min. : 0.0 Min. : 0.000
1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
Median : 1.0000 Median : 2.0 Median : 2.000
Mean : 0.9276 Mean : 2.5 Mean : 1.856
3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
Max. :19.0000 Max. :50.0 Max. :55.000
children babies meal country
Min. : 0.0000 Min. : 0.000000 BB :92310 PRT :48590
1st Qu.: 0.0000 1st Qu.: 0.000000 FB : 798 GBR :12129
Median : 0.0000 Median : 0.000000 HB :14463 FRA :10415
Mean : 0.1039 Mean : 0.007949 SC :10650 ESP : 8568
3rd Qu.: 0.0000 3rd Qu.: 0.000000 Undefined: 1169 DEU : 7287
Max. :10.0000 Max. :10.000000 ITA : 3766
NA's :4 (Other):28635
market_segment distribution_channel is_repeated_guest
Online TA :56477 Corporate: 6677 Min. :0.00000
Offline TA/TO:24219 Direct :14645 1st Qu.:0.00000
Groups :19811 GDS : 193 Median :0.00000
Direct :12606 TA/TO :97870 Mean :0.03191
Corporate : 5295 Undefined: 5 3rd Qu.:0.00000
Complementary: 743 Max. :1.00000
(Other) : 239
previous_cancellations previous_bookings_not_canceled reserved_room_type
Min. : 0.00000 Min. : 0.0000 A :85994
1st Qu.: 0.00000 1st Qu.: 0.0000 D :19201
Median : 0.00000 Median : 0.0000 E : 6535
Mean : 0.08712 Mean : 0.1371 F : 2897
3rd Qu.: 0.00000 3rd Qu.: 0.0000 G : 2094
Max. :26.00000 Max. :72.0000 B : 1118
(Other): 1551
assigned_room_type booking_changes deposit_type agent
A :74053 Min. : 0.0000 No Deposit:104641 9 :31961
D :25322 1st Qu.: 0.0000 Non Refund: 14587 NULL :16340
E : 7806 Median : 0.0000 Refundable: 162 240 :13922
F : 3751 Mean : 0.2211 1 : 7191
G : 2553 3rd Qu.: 0.0000 14 : 3640
C : 2375 Max. :21.0000 7 : 3539
(Other): 3530 (Other):42797
company days_in_waiting_list customer_type
NULL :112593 Min. : 0.000 Contract : 4076
40 : 927 1st Qu.: 0.000 Group : 577
223 : 784 Median : 0.000 Transient :89613
67 : 267 Mean : 2.321 Transient-Party:25124
45 : 250 3rd Qu.: 0.000
153 : 215 Max. :391.000
(Other): 4354
adr required_car_parking_spaces total_of_special_requests
Min. : -6.38 Min. :0.00000 Min. :0.0000
1st Qu.: 69.29 1st Qu.:0.00000 1st Qu.:0.0000
Median : 94.58 Median :0.00000 Median :0.0000
Mean : 101.83 Mean :0.06252 Mean :0.5714
3rd Qu.: 126.00 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :5400.00 Max. :8.00000 Max. :5.0000
reservation_status reservation_status_date
Canceled :43017 2015-10-21: 1461
Check-Out:75166 2015-07-06: 805
No-Show : 1207 2016-11-25: 790
2015-01-01: 763
2016-01-18: 625
2015-07-02: 469
(Other) :114477
hotel is_canceled lead_time arrival_date_year
City Hotel :79330 Min. :0.0000 Min. : 0 Min. :2015
Resort Hotel:40060 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
Median :0.0000 Median : 69 Median :2016
Mean :0.3704 Mean :104 Mean :2016
3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
Max. :1.0000 Max. :737 Max. :2017
arrival_date_month arrival_date_week_number arrival_date_day_of_month
August :13877 Min. : 1.00 Min. : 1.0
July :12661 1st Qu.:16.00 1st Qu.: 8.0
May :11791 Median :28.00 Median :16.0
October:11160 Mean :27.17 Mean :15.8
April :11089 3rd Qu.:38.00 3rd Qu.:23.0
June :10939 Max. :53.00 Max. :31.0
(Other):47873
stays_in_weekend_nights stays_in_week_nights adults
Min. : 0.0000 Min. : 0.0 Min. : 0.000
1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
Median : 1.0000 Median : 2.0 Median : 2.000
Mean : 0.9276 Mean : 2.5 Mean : 1.856
3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
Max. :19.0000 Max. :50.0 Max. :55.000
children babies meal country
Min. : 0.0000 Min. : 0.000000 BB :92310 PRT :48590
1st Qu.: 0.0000 1st Qu.: 0.000000 FB : 798 GBR :12129
Median : 0.0000 Median : 0.000000 HB :14463 FRA :10415
Mean : 0.1039 Mean : 0.007949 SC :10650 ESP : 8568
3rd Qu.: 0.0000 3rd Qu.: 0.000000 Undefined: 1169 DEU : 7287
Max. :10.0000 Max. :10.000000 ITA : 3766
NA's :4 (Other):28635
market_segment distribution_channel is_repeated_guest
Online TA :56477 Corporate: 6677 Min. :0.00000
Offline TA/TO:24219 Direct :14645 1st Qu.:0.00000
Groups :19811 GDS : 193 Median :0.00000
Direct :12606 TA/TO :97870 Mean :0.03191
Corporate : 5295 Undefined: 5 3rd Qu.:0.00000
Complementary: 743 Max. :1.00000
(Other) : 239
previous_cancellations previous_bookings_not_canceled reserved_room_type
Min. : 0.00000 Min. : 0.0000 A :85994
1st Qu.: 0.00000 1st Qu.: 0.0000 D :19201
Median : 0.00000 Median : 0.0000 E : 6535
Mean : 0.08712 Mean : 0.1371 F : 2897
3rd Qu.: 0.00000 3rd Qu.: 0.0000 G : 2094
Max. :26.00000 Max. :72.0000 B : 1118
(Other): 1551
assigned_room_type booking_changes deposit_type agent
A :74053 Min. : 0.0000 No Deposit:104641 9 :31961
D :25322 1st Qu.: 0.0000 Non Refund: 14587 NULL :16340
E : 7806 Median : 0.0000 Refundable: 162 240 :13922
F : 3751 Mean : 0.2211 1 : 7191
G : 2553 3rd Qu.: 0.0000 14 : 3640
C : 2375 Max. :21.0000 7 : 3539
(Other): 3530 (Other):42797
company days_in_waiting_list customer_type
NULL :112593 Min. : 0.000 Contract : 4076
40 : 927 1st Qu.: 0.000 Group : 577
223 : 784 Median : 0.000 Transient :89613
67 : 267 Mean : 2.321 Transient-Party:25124
45 : 250 3rd Qu.: 0.000
153 : 215 Max. :391.000
(Other): 4354
adr required_car_parking_spaces total_of_special_requests
Min. : -6.38 Min. :0.00000 Min. :0.0000
1st Qu.: 69.29 1st Qu.:0.00000 1st Qu.:0.0000
Median : 94.58 Median :0.00000 Median :0.0000
Mean : 101.83 Mean :0.06252 Mean :0.5714
3rd Qu.: 126.00 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :5400.00 Max. :8.00000 Max. :5.0000
reservation_status reservation_status_date
Canceled :43017 2015-10-21: 1461
Check-Out:75166 2015-07-06: 805
No-Show : 1207 2016-11-25: 790
2015-01-01: 763
2016-01-18: 625
2015-07-02: 469
(Other) :114477
vars n mean sd median trimmed
hotel* 1 119390 1.34 0.47 1.00 1.29
is_canceled 2 119390 0.37 0.48 0.00 0.34
lead_time 3 119390 104.01 106.86 69.00 87.24
arrival_date_year 4 119390 2016.16 0.71 2016.00 2016.20
arrival_date_month* 5 119390 6.49 3.54 7.00 6.49
arrival_date_week_number 6 119390 27.17 13.61 28.00 27.16
arrival_date_day_of_month 7 119390 15.80 8.78 16.00 15.80
stays_in_weekend_nights 8 119390 0.93 1.00 1.00 0.84
stays_in_week_nights 9 119390 2.50 1.91 2.00 2.29
adults 10 119390 1.86 0.58 2.00 1.88
children 11 119386 0.10 0.40 0.00 0.00
babies 12 119390 0.01 0.10 0.00 0.00
meal* 13 119390 1.56 1.07 1.00 1.31
country* 14 119390 94.59 45.09 82.00 96.85
market_segment* 15 119390 5.93 1.27 6.00 6.11
distribution_channel* 16 119390 3.59 0.91 4.00 3.80
is_repeated_guest 17 119390 0.03 0.18 0.00 0.00
previous_cancellations 18 119390 0.09 0.84 0.00 0.00
previous_bookings_not_canceled 19 119390 0.14 1.50 0.00 0.00
reserved_room_type* 20 119390 1.99 1.70 1.00 1.64
assigned_room_type* 21 119390 2.33 1.88 1.00 2.02
booking_changes 22 119390 0.22 0.65 0.00 0.06
deposit_type* 23 119390 1.12 0.33 1.00 1.03
agent* 24 119390 211.82 122.75 296.00 221.86
company* 25 119390 342.22 50.06 353.00 353.00
days_in_waiting_list 26 119390 2.32 17.59 0.00 0.00
customer_type* 27 119390 3.14 0.58 3.00 3.14
adr 28 119390 101.83 50.54 94.58 97.71
required_car_parking_spaces 29 119390 0.06 0.25 0.00 0.00
total_of_special_requests 30 119390 0.57 0.79 0.00 0.43
reservation_status* 31 119390 1.65 0.50 2.00 1.67
reservation_status_date* 32 119390 517.75 227.48 525.00 521.18
"hotel"* 33 119390 1.00 0.00 1.00 1.00
mad min max range skew kurtosis se
hotel* 0.00 1.00 2 1.00 0.70 -1.51 0.00
is_canceled 0.00 0.00 1 1.00 0.54 -1.71 0.00
lead_time 88.96 0.00 737 737.00 1.35 1.70 0.31
arrival_date_year 1.48 2015.00 2017 2.00 -0.23 -0.99 0.00
arrival_date_month* 4.45 1.00 12 11.00 -0.04 -1.23 0.01
arrival_date_week_number 16.31 1.00 53 52.00 -0.01 -0.99 0.04
arrival_date_day_of_month 11.86 1.00 31 30.00 0.00 -1.19 0.03
stays_in_weekend_nights 1.48 0.00 19 19.00 1.38 7.17 0.00
stays_in_week_nights 1.48 0.00 50 50.00 2.86 24.28 0.01
adults 0.00 0.00 55 55.00 18.32 1352.04 0.00
children 0.00 0.00 10 10.00 4.11 18.67 0.00
babies 0.00 0.00 10 10.00 24.65 1633.85 0.00
meal* 0.00 1.00 5 4.00 1.59 0.96 0.00
country* 81.54 1.00 178 177.00 -0.16 -1.53 0.13
market_segment* 1.48 1.00 8 7.00 -1.00 0.15 0.00
distribution_channel* 0.00 1.00 5 4.00 -1.87 1.86 0.00
is_repeated_guest 0.00 0.00 1 1.00 5.33 26.37 0.00
previous_cancellations 0.00 0.00 26 26.00 24.46 674.03 0.00
previous_bookings_not_canceled 0.00 0.00 72 72.00 23.54 767.20 0.00
reserved_room_type* 0.00 1.00 10 9.00 1.44 0.89 0.00
assigned_room_type* 0.00 1.00 12 11.00 1.15 0.53 0.01
booking_changes 0.00 0.00 21 21.00 6.00 79.39 0.00
deposit_type* 0.00 1.00 3 2.00 2.38 4.07 0.00
agent* 56.34 1.00 334 333.00 -0.37 -1.53 0.36
company* 0.00 1.00 353 352.00 -4.93 23.77 0.14
days_in_waiting_list 0.00 0.00 391 391.00 11.94 186.78 0.05
customer_type* 0.00 1.00 4 3.00 -1.08 4.55 0.00
adr 41.25 -6.38 5400 5406.38 10.53 1013.13 0.15
required_car_parking_spaces 0.00 0.00 8 8.00 4.16 30.00 0.00
total_of_special_requests 0.00 0.00 5 5.00 1.35 1.49 0.00
reservation_status* 0.00 1.00 3 2.00 -0.38 -1.25 0.00
reservation_status_date* 275.76 1.00 926 925.00 -0.12 -1.02 0.66
"hotel"* 0.00 1.00 1 0.00 NaN NaN 0.00
hotel is_canceled lead_time arrival_date_year
City Hotel :79330 Min. :0.0000 Min. : 0 Min. :2015
Resort Hotel:40060 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
Median :0.0000 Median : 69 Median :2016
Mean :0.3704 Mean :104 Mean :2016
3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
Max. :1.0000 Max. :737 Max. :2017
arrival_date_month arrival_date_week_number arrival_date_day_of_month
August :13877 Min. : 1.00 Min. : 1.0
July :12661 1st Qu.:16.00 1st Qu.: 8.0
May :11791 Median :28.00 Median :16.0
October:11160 Mean :27.17 Mean :15.8
April :11089 3rd Qu.:38.00 3rd Qu.:23.0
June :10939 Max. :53.00 Max. :31.0
(Other):47873
stays_in_weekend_nights stays_in_week_nights adults
Min. : 0.0000 Min. : 0.0 Min. : 0.000
1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
Median : 1.0000 Median : 2.0 Median : 2.000
Mean : 0.9276 Mean : 2.5 Mean : 1.856
3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
Max. :19.0000 Max. :50.0 Max. :55.000
children babies meal country
Min. : 0.0000 Min. : 0.000000 BB :92310 PRT :48590
1st Qu.: 0.0000 1st Qu.: 0.000000 FB : 798 GBR :12129
Median : 0.0000 Median : 0.000000 HB :14463 FRA :10415
Mean : 0.1039 Mean : 0.007949 SC :10650 ESP : 8568
3rd Qu.: 0.0000 3rd Qu.: 0.000000 Undefined: 1169 DEU : 7287
Max. :10.0000 Max. :10.000000 ITA : 3766
NA's :4 (Other):28635
market_segment distribution_channel is_repeated_guest
Online TA :56477 Corporate: 6677 Min. :0.00000
Offline TA/TO:24219 Direct :14645 1st Qu.:0.00000
Groups :19811 GDS : 193 Median :0.00000
Direct :12606 TA/TO :97870 Mean :0.03191
Corporate : 5295 Undefined: 5 3rd Qu.:0.00000
Complementary: 743 Max. :1.00000
(Other) : 239
previous_cancellations previous_bookings_not_canceled reserved_room_type
Min. : 0.00000 Min. : 0.0000 A :85994
1st Qu.: 0.00000 1st Qu.: 0.0000 D :19201
Median : 0.00000 Median : 0.0000 E : 6535
Mean : 0.08712 Mean : 0.1371 F : 2897
3rd Qu.: 0.00000 3rd Qu.: 0.0000 G : 2094
Max. :26.00000 Max. :72.0000 B : 1118
(Other): 1551
assigned_room_type booking_changes deposit_type agent
A :74053 Min. : 0.0000 No Deposit:104641 9 :31961
D :25322 1st Qu.: 0.0000 Non Refund: 14587 NULL :16340
E : 7806 Median : 0.0000 Refundable: 162 240 :13922
F : 3751 Mean : 0.2211 1 : 7191
G : 2553 3rd Qu.: 0.0000 14 : 3640
C : 2375 Max. :21.0000 7 : 3539
(Other): 3530 (Other):42797
company days_in_waiting_list customer_type
NULL :112593 Min. : 0.000 Contract : 4076
40 : 927 1st Qu.: 0.000 Group : 577
223 : 784 Median : 0.000 Transient :89613
67 : 267 Mean : 2.321 Transient-Party:25124
45 : 250 3rd Qu.: 0.000
153 : 215 Max. :391.000
(Other): 4354
adr required_car_parking_spaces total_of_special_requests
Min. : -6.38 Min. :0.00000 Min. :0.0000
1st Qu.: 69.29 1st Qu.:0.00000 1st Qu.:0.0000
Median : 94.58 Median :0.00000 Median :0.0000
Mean : 101.83 Mean :0.06252 Mean :0.5714
3rd Qu.: 126.00 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :5400.00 Max. :8.00000 Max. :5.0000
reservation_status reservation_status_date "hotel"
Canceled :43017 2015-10-21: 1461 Length:119390
Check-Out:75166 2015-07-06: 805 Class :character
No-Show : 1207 2016-11-25: 790 Mode :character
2015-01-01: 763
2016-01-18: 625
2015-07-02: 469
(Other) :114477
Variable | Stats / Values | Freqs (% of Valid) | Graph | Missing | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
hotel [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
is_canceled [integer] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
lead_time [integer] |
|
479 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
arrival_date_year [integer] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
arrival_date_month [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
arrival_date_week_number [integer] |
|
53 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
arrival_date_day_of_month [integer] |
|
31 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
stays_in_weekend_nights [integer] |
|
17 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
stays_in_week_nights [integer] |
|
35 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
adults [integer] |
|
14 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
children [integer] |
|
|
4 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
babies [integer] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
meal [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
country [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
market_segment [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
distribution_channel [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
is_repeated_guest [integer] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
previous_cancellations [integer] |
|
15 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
previous_bookings_not_canceled [integer] |
|
73 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
reserved_room_type [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
assigned_room_type [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
booking_changes [integer] |
|
21 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
deposit_type [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
agent [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
company [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
days_in_waiting_list [integer] |
|
128 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
customer_type [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
adr [numeric] |
|
8879 distinct values | 0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
required_car_parking_spaces [integer] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
total_of_special_requests [integer] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
reservation_status [factor] |
|
|
0 (0.0%) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
reservation_status_date [factor] |
|
|
0 (0.0%) |
Generated by summarytools 1.0.1 (R version 4.2.1)
2022-09-04
Conduct some exploratory data analysis, using dplyr commands such as group_by()
, select()
, filter()
, and summarise()
. Find the central tendency (mean, median, mode) and dispersion (standard deviation, mix/max/quantile) for different subgroups within the data set.
#select columns for factorization
cols_factor <- c("is_canceled", 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month', 'is_repeated_guest' )
# converting class of columns: "is_canceled", 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month' to factor 'is_repeated_guest'
hotel_bookings <-hotel_bookings%>%
mutate_at(vars(is_canceled, arrival_date_year, arrival_date_month, arrival_date_week_number, arrival_date_day_of_month, is_repeated_guest), funs(factor))
# verify class of hotel_bookings:
sapply(hotel_bookings, class)
hotel is_canceled
"factor" "factor"
lead_time arrival_date_year
"integer" "factor"
arrival_date_month arrival_date_week_number
"factor" "factor"
arrival_date_day_of_month stays_in_weekend_nights
"factor" "integer"
stays_in_week_nights adults
"integer" "integer"
children babies
"integer" "integer"
meal country
"factor" "factor"
market_segment distribution_channel
"factor" "factor"
is_repeated_guest previous_cancellations
"factor" "integer"
previous_bookings_not_canceled reserved_room_type
"integer" "factor"
assigned_room_type booking_changes
"factor" "integer"
deposit_type agent
"factor" "factor"
company days_in_waiting_list
"factor" "integer"
customer_type adr
"factor" "numeric"
required_car_parking_spaces total_of_special_requests
"integer" "integer"
reservation_status reservation_status_date
"factor" "factor"
# A tibble: 26 × 8
# Groups: arrival_date_year, arrival_date_month [26]
arrival_date_year arrival_d…¹ City …² City …³ City …⁴ Resor…⁵ Resor…⁶ Resor…⁷
<fct> <fct> <int> <int> <int> <int> <int> <int>
1 2015 August 1217 1248 15 361 1043 5
2 2015 December 646 986 22 278 961 27
3 2015 July 918 459 21 314 1058 6
4 2015 November 277 934 24 170 920 15
5 2015 October 1311 2065 10 403 1160 8
6 2015 September 1513 1986 30 543 1034 8
7 2016 April 1492 2022 47 515 1345 7
8 2016 August 1214 2131 33 572 1107 6
9 2016 December 1049 1406 23 316 1056 10
10 2016 February 796 1441 134 396 1113 11
# … with 16 more rows, and abbreviated variable names ¹arrival_date_month,
# ²`City Hotel_Canceled`, ³`City Hotel_Check-Out`, ⁴`City Hotel_No-Show`,
# ⁵`Resort Hotel_Canceled`, ⁶`Resort Hotel_Check-Out`,
# ⁷`Resort Hotel_No-Show`
# ℹ Use `print(n = ...)` to see more rows
booking_hymc<-hotel_bookings%>%
group_by(hotel, arrival_date_year, arrival_date_month)%>%
summarise(no_of_bookings=sum(!as.numeric(is_canceled)),no_of_cancelled=sum(as.numeric(is_canceled)), .groups = 'keep') %>%
pivot_wider(names_from = hotel, values_from = c(no_of_bookings, no_of_cancelled))
booking_hymc
# A tibble: 26 × 6
# Groups: arrival_date_year, arrival_date_month [26]
arrival_date_year arrival_date_month no_of_bookings…¹ no_of…² no_of…³ no_of…⁴
<fct> <fct> <int> <int> <dbl> <dbl>
1 2015 August 0 0 3712 1775
2 2015 December 0 0 2322 1571
3 2015 July 0 0 2337 1698
4 2015 November 0 0 1536 1290
5 2015 October 0 0 4707 1982
6 2015 September 0 0 5072 2136
7 2016 April 0 0 5100 2389
8 2016 August 0 0 4625 2263
9 2016 December 0 0 3550 1708
10 2016 February 0 0 3301 1927
# … with 16 more rows, and abbreviated variable names
# ¹`no_of_bookings_City Hotel`, ²`no_of_bookings_Resort Hotel`,
# ³`no_of_cancelled_City Hotel`, ⁴`no_of_cancelled_Resort Hotel`
# ℹ Use `print(n = ...)` to see more rows
# A tibble: 2 × 3
# Groups: reservation_status [2]
reservation_status `City Hotel` `Resort Hotel`
<fct> <int> <int>
1 Canceled 32186 10831
2 No-Show 916 291
booking_cym <-hotel_bookings%>%
filter(is_canceled==1, reservation_status %in% c('No-Show', 'Canceled'))%>%
group_by(hotel, arrival_date_year, arrival_date_month, reservation_status)%>%
summarise(cancelled_ticket_status=n(), .groups = 'keep') %>%
pivot_wider(names_from = hotel, values_from = c(cancelled_ticket_status))
booking_cym
# A tibble: 52 × 5
# Groups: arrival_date_year, arrival_date_month, reservation_status [52]
arrival_date_year arrival_date_month reservation_status `City Hotel` Resort…¹
<fct> <fct> <fct> <int> <int>
1 2015 August Canceled 1217 361
2 2015 August No-Show 15 5
3 2015 December Canceled 646 278
4 2015 December No-Show 22 27
5 2015 July Canceled 918 314
6 2015 July No-Show 21 6
7 2015 November Canceled 277 170
8 2015 November No-Show 24 15
9 2015 October Canceled 1311 403
10 2015 October No-Show 10 8
# … with 42 more rows, and abbreviated variable name ¹`Resort Hotel`
# ℹ Use `print(n = ...)` to see more rows
booking_adr <-hotel_bookings%>%
filter(is_canceled==0, reservation_status %in% c('No-Show', 'Check-Out'))%>%
group_by(hotel, arrival_date_year)%>%
summarise(median_adr=median(adr),avg_adr=mean(adr),sd_adr=sd(adr), min_adr = min(adr),max_adr = max(adr), quantile(adr, c(0.25, 0.5, 0.75)), .groups = 'keep')
booking_adr
# A tibble: 18 × 8
# Groups: hotel, arrival_date_year [6]
hotel arrival_date_year media…¹ avg_adr sd_adr min_adr max_adr quant…²
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 City Hotel 2015 85.5 87.9 36.1 0 268. 65
2 City Hotel 2015 85.5 87.9 36.1 0 268. 85.5
3 City Hotel 2015 85.5 87.9 36.1 0 268. 105.
4 City Hotel 2016 99 104. 37.7 0 452. 80.8
5 City Hotel 2016 99 104. 37.7 0 452. 99
6 City Hotel 2016 99 104. 37.7 0 452. 123.
7 City Hotel 2017 112 117. 43.1 0 510 89.1
8 City Hotel 2017 112 117. 43.1 0 510 112
9 City Hotel 2017 112 117. 43.1 0 510 140.
10 Resort Hotel 2015 75.5 89.8 53.7 0 508 48
11 Resort Hotel 2015 75.5 89.8 53.7 0 508 75.5
12 Resort Hotel 2015 75.5 89.8 53.7 0 508 124.
13 Resort Hotel 2016 66.3 83.9 55.9 0 367 48
14 Resort Hotel 2016 66.3 83.9 55.9 0 367 66.3
15 Resort Hotel 2016 66.3 83.9 55.9 0 367 100
16 Resort Hotel 2017 80.9 102. 66.0 -6.38 426. 53.0
17 Resort Hotel 2017 80.9 102. 66.0 -6.38 426. 80.9
18 Resort Hotel 2017 80.9 102. 66.0 -6.38 426. 135
# … with abbreviated variable names ¹median_adr,
# ²`quantile(adr, c(0.25, 0.5, 0.75))`
booking_canc_lt<- hotel_bookings%>%
select(hotel, is_canceled, lead_time)%>%
filter(is_canceled==1)%>%
group_by(hotel)%>%
summarise(max_leadtime_cancelled = max(lead_time), min_leadtime_cancelled= min(lead_time), sd_leadtime_cancelled = sd(lead_time), mean_lead_time_cancelled = mean(lead_time), .groups = 'keep')
booking_canc_lt
# A tibble: 2 × 5
# Groups: hotel [2]
hotel max_leadtime_cancelled min_leadtime_cancelled sd_leadti…¹ mean_…²
<fct> <int> <int> <dbl> <dbl>
1 City Hotel 629 0 124. 150.
2 Resort Hotel 471 0 98.8 129.
# … with abbreviated variable names ¹sd_leadtime_cancelled,
# ²mean_lead_time_cancelled
# A tibble: 2 × 5
# Groups: hotel [2]
hotel max_leadtime min_leadtime sd_leadtime mean_lead_time
<fct> <int> <int> <dbl> <dbl>
1 City Hotel 518 0 89.9 80.7
2 Resort Hotel 737 0 93.1 78.8
# A tibble: 13 × 3
# Groups: hotel, market_segment [13]
hotel market_segment repeat_guest
<fct> <fct> <int>
1 City Hotel Aviation 53
2 City Hotel Complementary 193
3 City Hotel Corporate 842
4 City Hotel Direct 158
5 City Hotel Groups 7
6 City Hotel Offline TA/TO 142
7 City Hotel Online TA 196
8 Resort Hotel Complementary 18
9 Resort Hotel Corporate 537
10 Resort Hotel Direct 581
11 Resort Hotel Groups 39
12 Resort Hotel Offline TA/TO 142
13 Resort Hotel Online TA 350
# A tibble: 12 × 3
# Groups: hotel, market_segment [12]
hotel market_segment repeat_guest_cancel
<fct> <fct> <int>
1 City Hotel Aviation 11
2 City Hotel Complementary 19
3 City Hotel Corporate 71
4 City Hotel Direct 33
5 City Hotel Groups 194
6 City Hotel Offline TA/TO 77
7 City Hotel Online TA 36
8 Resort Hotel Corporate 24
9 Resort Hotel Direct 24
10 Resort Hotel Groups 30
11 Resort Hotel Offline TA/TO 18
12 Resort Hotel Online TA 15
Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.
---
title: "Challenge 2"
author: "Akhilesh Kumar Meghwal"
desription: "Data wrangling: using group() and summarise("
date: "08/21/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_2
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(Hmisc)
library(psych)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2) provide summary statistics for different interesting groups within the data, and interpret those statistics
## Read in the Data
Read in one (or more) of the following data sets, available in the `posts/_data` folder, using the correct R package and command.
- railroad\*.csv or StateCounty2012.xlsx ⭐
- hotel_bookings ⭐⭐⭐
- FAOstat\*.csv ⭐⭐⭐⭐⭐ (join FAOSTAT_country_groups)
```{r}
#### Read dataset 'hotel_bookings.csv', available in the `posts/_data` folder, using the read_.csv command.
hotel_bookings <- read.csv('_data/hotel_bookings.csv', stringsAsFactors = TRUE, header = TRUE)
```
Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
##### Check first 6 rows in the dataset, to get primary understanding of the dataframe structure
```{r}
#| label: summary
head(hotel_bookings)
```
###### View Dataframe in table view
```{r}
View(hotel_bookings)
```
##### Row and column dimentions of dataframe
```{r}
dim(hotel_bookings)
```
##### Column names of the dataframe
```{r}
colnames(hotel_bookings)
```
##### Column classes of the dataframe
```{r}
col_classes = data.frame(t(data.frame(lapply(hotel_bookings,class))))
```
##### Describe, Dataframe
```{r}
describe(hotel_bookings)
summary(hotel_bookings)
```
##### Summary of dataframe
```{r}
summary(hotel_bookings)
```
##### group_by summary hotel wise
```{r}
a <- hotel_bookings%>%
group_by('hotel')
describe(a)
summary(a)
```
##### Dataframe summary description using summarytools::dfSummary
```{r}
print(summarytools::dfSummary(hotel_bookings,
varnumbers = FALSE,
plain.ascii = FALSE,
style = "grid",
graph.magnif = 0.70,
valid.col = FALSE),
method = 'render',
table.classes = 'table-condensed')
```
##### Booking details for two hotels, namely "City Hotel & "Resort Hotel" is given in the dataset hotel_bookings
##### Hotel Bookings can be classified on following columns:
```{r}
table(hotel_bookings$arrival_date_month)
```
##### Hotel_bookings year wise
```{r}
table(hotel_bookings$arrival_date_year)
```
## Provide Grouped Summary Statistics
Conduct some exploratory data analysis, using dplyr commands such as `group_by()`, `select()`, `filter()`, and `summarise()`. Find the central tendency (mean, median, mode) and dispersion (standard deviation, mix/max/quantile) for different subgroups within the data set.
##### Change class of certain columns to factors to represent categorical data
```{r}
#select columns for factorization
cols_factor <- c("is_canceled", 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month', 'is_repeated_guest' )
# converting class of columns: "is_canceled", 'arrival_date_year', 'arrival_date_month', 'arrival_date_week_number', 'arrival_date_day_of_month' to factor 'is_repeated_guest'
hotel_bookings <-hotel_bookings%>%
mutate_at(vars(is_canceled, arrival_date_year, arrival_date_month, arrival_date_week_number, arrival_date_day_of_month, is_repeated_guest), funs(factor))
# verify class of hotel_bookings:
sapply(hotel_bookings, class)
```
##### Hotel wise, year wise, month wise, reservation status wise number of bookings
```{r}
booking_hymrs<-hotel_bookings%>%
group_by(hotel, arrival_date_year, arrival_date_month, reservation_status)%>%
summarise(no_of_bookings=n(),.groups = 'keep') %>%
pivot_wider(names_from = c(hotel, reservation_status), values_from = no_of_bookings)
booking_hymrs
```
###### Hotel year wise, month wise, reservation status wise number of cancelled/normal bookings
###### is_cancelled: 1 = cancelled, 0= normal booking
```{r}
booking_hymc<-hotel_bookings%>%
group_by(hotel, arrival_date_year, arrival_date_month)%>%
summarise(no_of_bookings=sum(!as.numeric(is_canceled)),no_of_cancelled=sum(as.numeric(is_canceled)), .groups = 'keep') %>%
pivot_wider(names_from = hotel, values_from = c(no_of_bookings, no_of_cancelled))
booking_hymc
```
##### Hotel wise: Number of cancelled ticket: Canceled and No-Show
```{r}
booking_can <-hotel_bookings%>%
filter(is_canceled==1, reservation_status %in% c('No-Show', 'Canceled'))%>%
group_by(hotel,reservation_status)%>%
summarise(cancelled_ticket_status=n(), .groups = 'keep') %>%
pivot_wider(names_from = hotel, values_from = c(cancelled_ticket_status))
booking_can
```
##### Hotel year wise, month wise, reservation status wise number of cancelled/normal bookings
```{r}
booking_cym <-hotel_bookings%>%
filter(is_canceled==1, reservation_status %in% c('No-Show', 'Canceled'))%>%
group_by(hotel, arrival_date_year, arrival_date_month, reservation_status)%>%
summarise(cancelled_ticket_status=n(), .groups = 'keep') %>%
pivot_wider(names_from = hotel, values_from = c(cancelled_ticket_status))
booking_cym
```
##### Hotel year wise, month wise, country wise reservation status wise number of normal bookings/not-cancelled
##### To identify and undertand the hotel wise, year wise, month wise, country wise Average Daily Rate for the hotel for the bookings undertaken
##### Mean, median, standard deviation, minimum, maximum, quantile of adr for hotel > arrival_date_year
##### To identify the hotel wise, year wise, distribution of Average Daily Rate for the hotel for the bookings undertaken
```{r}
booking_adr <-hotel_bookings%>%
filter(is_canceled==0, reservation_status %in% c('No-Show', 'Check-Out'))%>%
group_by(hotel, arrival_date_year)%>%
summarise(median_adr=median(adr),avg_adr=mean(adr),sd_adr=sd(adr), min_adr = min(adr),max_adr = max(adr), quantile(adr, c(0.25, 0.5, 0.75)), .groups = 'keep')
booking_adr
```
##### Leadtime for cancelled booking
##### To understand if higher lead time is causing cancellation of booking
```{r}
booking_canc_lt<- hotel_bookings%>%
select(hotel, is_canceled, lead_time)%>%
filter(is_canceled==1)%>%
group_by(hotel)%>%
summarise(max_leadtime_cancelled = max(lead_time), min_leadtime_cancelled= min(lead_time), sd_leadtime_cancelled = sd(lead_time), mean_lead_time_cancelled = mean(lead_time), .groups = 'keep')
booking_canc_lt
```
##### Leadtime for booking undertaken
##### To understand if the lead time for bookings undertaken and compare the same with cancelled booking as calculated above
```{r}
booking_ut_lt<- hotel_bookings%>%
select(hotel, is_canceled, lead_time)%>%
filter(is_canceled==0)%>%
group_by(hotel)%>%
summarise(max_leadtime = max(lead_time), min_leadtime= min(lead_time), sd_leadtime = sd(lead_time), mean_lead_time = mean(lead_time), .groups = 'keep')
booking_ut_lt
```
##### Hotel wise, market segment wise repeated guest bookings undertaken
##### To identify any specific market segment is generating more number of bookings for the repeated guests.
```{r}
booking_rg<- hotel_bookings%>%
select(hotel, market_segment, is_repeated_guest, is_canceled)%>%
filter(is_repeated_guest==1, is_canceled==0)%>%
group_by(hotel,market_segment)%>%
summarise(repeat_guest=n(),.groups = 'keep')
booking_rg
```
##### Hotel wise, market segment wise repeated guest bookings cancelled
##### To identify any specific market segment is generating more number of cancelled bookings for the repeated guests.
```{r}
booking_rg_can<- hotel_bookings%>%
select(hotel, market_segment, is_repeated_guest, is_canceled)%>%
filter(is_repeated_guest==1, is_canceled==1)%>%
group_by(hotel,market_segment)%>%
summarise(repeat_guest_cancel=n(),.groups = 'keep')
booking_rg_can
```
### Explain and Interpret
Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.