Reading in Data
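The code that read the data is not shown on this page. The sketch below is one way it might have looked, assuming readr was used: the `<chr>` and `<dbl>` column types in the later output are consistent with `read_csv()`, but that is an inference. The file path, the object name `hotel_bookings`, and the three rows are hypothetical stand-ins so that the example runs without the real file.

```r
library(readr)

# Hypothetical stand-in for the hotel bookings CSV: a three-row file written
# to a temporary location so the example runs without the real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "hotel,is_canceled,lead_time,arrival_date_year",
    "Resort Hotel,0,120,2015",
    "Resort Hotel,1,85,2015",
    "City Hotel,0,60,2016"
  ),
  csv_path
)

# read_csv() returns a tibble; numeric columns are parsed as doubles.
hotel_bookings <- read_csv(csv_path, show_col_types = FALSE)
print(hotel_bookings)
```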
Getting the summary of the dataset
hotel              is_canceled      lead_time     arrival_date_year
Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015
Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016
Mode  :character   Median :0.0000   Median : 69   Median :2016
                   Mean   :0.3704   Mean   :104   Mean   :2016
                   3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017
                   Max.   :1.0000   Max.   :737   Max.   :2017
arrival_date_month   arrival_date_week_number
Length:119390        Min.   : 1.00
Class :character     1st Qu.:16.00
Mode  :character     Median :28.00
                     Mean   :27.17
                     3rd Qu.:38.00
                     Max.   :53.00
arrival_date_day_of_month   stays_in_weekend_nights
Min.   : 1.0                Min.   : 0.0000
1st Qu.: 8.0                1st Qu.: 0.0000
Median :16.0                Median : 1.0000
Mean   :15.8                Mean   : 0.9276
3rd Qu.:23.0                3rd Qu.: 2.0000
Max.   :31.0                Max.   :19.0000
stays_in_week_nights   adults           children
Min.   : 0.0           Min.   : 0.000   Min.   : 0.0000
1st Qu.: 1.0           1st Qu.: 2.000   1st Qu.: 0.0000
Median : 2.0           Median : 2.000   Median : 0.0000
Mean   : 2.5           Mean   : 1.856   Mean   : 0.1039
3rd Qu.: 3.0           3rd Qu.: 2.000   3rd Qu.: 0.0000
Max.   :50.0           Max.   :55.000   Max.   :10.0000
                                        NA's   :4
babies              meal               country
Min.   : 0.000000   Length:119390      Length:119390
1st Qu.: 0.000000   Class :character   Class :character
Median : 0.000000   Mode  :character   Mode  :character
Mean   : 0.007949
3rd Qu.: 0.000000
Max.   :10.000000
market_segment     distribution_channel   is_repeated_guest
Length:119390      Length:119390          Min.   :0.00000
Class :character   Class :character       1st Qu.:0.00000
Mode  :character   Mode  :character       Median :0.00000
                                          Mean   :0.03191
                                          3rd Qu.:0.00000
                                          Max.   :1.00000
previous_cancellations   previous_bookings_not_canceled
Min.   : 0.00000         Min.   : 0.0000
1st Qu.: 0.00000         1st Qu.: 0.0000
Median : 0.00000         Median : 0.0000
Mean   : 0.08712         Mean   : 0.1371
3rd Qu.: 0.00000         3rd Qu.: 0.0000
Max.   :26.00000         Max.   :72.0000
reserved_room_type   assigned_room_type   booking_changes
Length:119390        Length:119390        Min.   : 0.0000
Class :character     Class :character     1st Qu.: 0.0000
Mode  :character     Mode  :character     Median : 0.0000
                                          Mean   : 0.2211
                                          3rd Qu.: 0.0000
                                          Max.   :21.0000
deposit_type       agent              company
Length:119390      Length:119390      Length:119390
Class :character   Class :character   Class :character
Mode  :character   Mode  :character   Mode  :character
days_in_waiting_list   customer_type      adr
Min.   :  0.000        Length:119390      Min.   :  -6.38
1st Qu.:  0.000        Class :character   1st Qu.:  69.29
Median :  0.000        Mode  :character   Median :  94.58
Mean   :  2.321                           Mean   : 101.83
3rd Qu.:  0.000                           3rd Qu.: 126.00
Max.   :391.000                           Max.   :5400.00
required_car_parking_spaces   total_of_special_requests
Min.   :0.00000               Min.   :0.0000
1st Qu.:0.00000               1st Qu.:0.0000
Median :0.00000               Median :0.0000
Mean   :0.06252               Mean   :0.5714
3rd Qu.:0.00000               3rd Qu.:1.0000
Max.   :8.00000               Max.   :5.0000
reservation_status   reservation_status_date
Length:119390        Length:119390
Class :character     Class :character
Mode  :character     Mode  :character
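Output of this shape is what base R's `summary()` prints for a data frame: a six-number summary for each numeric column, and length, class, and mode for each character column. A minimal sketch on a made-up four-row tibble follows; the object names and values are illustrative, not taken from the real data.

```r
library(dplyr)

# Illustrative stand-in for the bookings data (values are made up).
hotel_bookings <- tibble(
  hotel             = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled       = c(0, 1, 0, 1),
  lead_time         = c(120, 85, 60, 60),
  arrival_date_year = c(2015, 2015, 2016, 2017)
)

# summary() dispatches to the data-frame method, which summarises each column.
bookings_summary <- summary(hotel_bookings)
print(bookings_summary)
```

Because `is_canceled` is coded 0/1, its mean is the share of canceled bookings, which is how the 0.3704 in the output above can be read.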
Selecting the first, second, and fourth columns from the dataset
# A tibble: 119,390 x 3
hotel is_canceled arrival_date_year
<chr> <dbl> <dbl>
1 Resort Hotel 0 2015
2 Resort Hotel 0 2015
3 Resort Hotel 0 2015
4 Resort Hotel 0 2015
5 Resort Hotel 0 2015
6 Resort Hotel 0 2015
7 Resort Hotel 0 2015
8 Resort Hotel 0 2015
9 Resort Hotel 1 2015
10 Resort Hotel 1 2015
# ... with 119,380 more rows
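A three-column tibble like the one above can be produced with dplyr's `select()`, which accepts column positions as well as names. The sketch below assumes dplyr was used and runs on a made-up tibble whose first four columns follow the same order as the real data; object names and values are illustrative.

```r
library(dplyr)

# Illustrative stand-in for the bookings data (values are made up).
hotel_bookings <- tibble(
  hotel             = c("Resort Hotel", "Resort Hotel", "City Hotel"),
  is_canceled       = c(0, 1, 0),
  lead_time         = c(120, 85, 60),
  arrival_date_year = c(2015, 2015, 2016)
)

# Select by position: columns 1, 2 and 4.
by_position <- select(hotel_bookings, 1, 2, 4)

# Selecting the same columns by name is usually more robust to column reordering.
by_name <- select(hotel_bookings, hotel, is_canceled, arrival_date_year)

print(by_position)
```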
Filtering the dataset to canceled bookings (is_canceled equals 1)
# A tibble: 44,224 x 32
hotel is_canceled lead_time arrival_date_ye~ arrival_date_mo~
<chr> <dbl> <dbl> <dbl> <chr>
1 Resort Hot~ 1 85 2015 July
2 Resort Hot~ 1 75 2015 July
3 Resort Hot~ 1 23 2015 July
4 Resort Hot~ 1 60 2015 July
5 Resort Hot~ 1 96 2015 July
6 Resort Hot~ 1 45 2015 July
7 Resort Hot~ 1 40 2015 July
8 Resort Hot~ 1 43 2015 July
9 Resort Hot~ 1 45 2015 July
10 Resort Hot~ 1 47 2015 July
# ... with 44,214 more rows, and 27 more variables:
# arrival_date_week_number <dbl>, arrival_date_day_of_month <dbl>,
# stays_in_weekend_nights <dbl>, stays_in_week_nights <dbl>,
# adults <dbl>, children <dbl>, babies <dbl>, meal <chr>,
# country <chr>, market_segment <chr>, distribution_channel <chr>,
# is_repeated_guest <dbl>, previous_cancellations <dbl>,
# previous_bookings_not_canceled <dbl>, ...
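The page does not show the filter condition. Every row printed above has `is_canceled` equal to 1, and the 44,224 rows are consistent with the 0.3704 mean of `is_canceled` over 119,390 bookings, so the condition was presumably `is_canceled == 1`. A minimal sketch under that assumption, on a made-up tibble with illustrative names and values:

```r
library(dplyr)

# Illustrative stand-in for the bookings data (values are made up).
hotel_bookings <- tibble(
  hotel       = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled = c(0, 1, 0, 1),
  lead_time   = c(120, 85, 60, 60)
)

# Keep only the rows where the booking was canceled.
canceled <- filter(hotel_bookings, is_canceled == 1)
print(canceled)
```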
Filtering the dataset to rows where lead_time equals 60
# A tibble: 436 x 32
hotel is_canceled lead_time arrival_date_ye~ arrival_date_mo~
<chr> <dbl> <dbl> <dbl> <chr>
1 Resort Hot~ 1 60 2015 July
2 Resort Hot~ 0 60 2015 July
3 Resort Hot~ 1 60 2015 July
4 Resort Hot~ 1 60 2015 July
5 Resort Hot~ 1 60 2015 July
6 Resort Hot~ 1 60 2015 July
7 Resort Hot~ 0 60 2015 July
8 Resort Hot~ 0 60 2015 August
9 Resort Hot~ 1 60 2015 August
10 Resort Hot~ 1 60 2015 August
# ... with 426 more rows, and 27 more variables:
# arrival_date_week_number <dbl>, arrival_date_day_of_month <dbl>,
# stays_in_weekend_nights <dbl>, stays_in_week_nights <dbl>,
# adults <dbl>, children <dbl>, babies <dbl>, meal <chr>,
# country <chr>, market_segment <chr>, distribution_channel <chr>,
# is_repeated_guest <dbl>, previous_cancellations <dbl>,
# previous_bookings_not_canceled <dbl>, ...
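A filter on a single value uses the equality operator `==` inside `filter()`. The sketch below assumes dplyr was used and runs on a made-up tibble; object names and values are illustrative.

```r
library(dplyr)

# Illustrative stand-in for the bookings data (values are made up).
hotel_bookings <- tibble(
  hotel       = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled = c(0, 1, 0, 1),
  lead_time   = c(120, 60, 60, 85)
)

# Keep only bookings made exactly 60 days before arrival.
# Note the double == for comparison; a single = would be treated as a
# named argument and filter() would raise an error.
lead_60 <- filter(hotel_bookings, lead_time == 60)
print(lead_60)
```

The result keeps both canceled and non-canceled bookings, matching the mix of 0 and 1 in the `is_canceled` column of the output above.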
Arranging the data in descending order of arrival_date_year
# A tibble: 119,390 x 32
hotel is_canceled lead_time arrival_date_ye~ arrival_date_mo~
<chr> <dbl> <dbl> <dbl> <chr>
1 Resort Hot~ 1 74 2017 January
2 Resort Hot~ 1 62 2017 January
3 Resort Hot~ 1 62 2017 January
4 Resort Hot~ 1 62 2017 January
5 Resort Hot~ 1 71 2017 January
6 Resort Hot~ 1 88 2017 January
7 Resort Hot~ 1 172 2017 January
8 Resort Hot~ 1 168 2017 January
9 Resort Hot~ 1 52 2017 January
10 Resort Hot~ 1 25 2017 January
# ... with 119,380 more rows, and 27 more variables:
# arrival_date_week_number <dbl>, arrival_date_day_of_month <dbl>,
# stays_in_weekend_nights <dbl>, stays_in_week_nights <dbl>,
# adults <dbl>, children <dbl>, babies <dbl>, meal <chr>,
# country <chr>, market_segment <chr>, distribution_channel <chr>,
# is_repeated_guest <dbl>, previous_cancellations <dbl>,
# previous_bookings_not_canceled <dbl>, ...
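Sorting in descending order is typically done by wrapping the column in `desc()` inside `arrange()`. The sketch below assumes dplyr was used and runs on a made-up tibble; object names and values are illustrative.

```r
library(dplyr)

# Illustrative stand-in for the bookings data (values are made up).
hotel_bookings <- tibble(
  hotel             = c("Resort Hotel", "City Hotel", "City Hotel"),
  lead_time         = c(120, 60, 85),
  arrival_date_year = c(2015, 2017, 2016)
)

# arrange() sorts ascending by default; desc() reverses the order.
newest_first <- arrange(hotel_bookings, desc(arrival_date_year))
print(newest_first)
```

All rows are kept and only their order changes, which is why the output above still reports 119,390 rows.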
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
GOGULA (2022, March 2). Data Analytics and Computational Social Science: HW_2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommanikanta870802/
BibTeX citation
@misc{gogula2022hw_2,
  author = {GOGULA, MANI KANTA},
  title = {Data Analytics and Computational Social Science: HW_2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommanikanta870802/},
  year = {2022}
}