sathvik_thogaru_homework4

hotel_bookings Dataset

sathvik_thogaru
08-18-2021

Importing data

This data set consists of a single file containing booking information for two types of hotel, a resort hotel and a city hotel.

Importing the data and displaying the first 6 rows:

# A tibble: 6 × 32
  hotel        is_canceled lead_time arrival_date_ye… arrival_date_mo…
  <chr>              <dbl>     <dbl>            <dbl> <chr>           
1 Resort Hotel           0       342             2015 July            
2 Resort Hotel           0       737             2015 July            
3 Resort Hotel           0         7             2015 July            
4 Resort Hotel           0        13             2015 July            
5 Resort Hotel           0        14             2015 July            
6 Resort Hotel           0        14             2015 July            
# … with 27 more variables: arrival_date_week_number <dbl>,
#   arrival_date_day_of_month <dbl>, stays_in_weekend_nights <dbl>,
#   stays_in_week_nights <dbl>, adults <dbl>, children <dbl>,
#   babies <dbl>, meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>,
#   previous_bookings_not_canceled <dbl>, reserved_room_type <chr>, …
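
The import call itself is not shown in the output above. The following is only a minimal sketch of how such a step could look, assuming the data is a CSV file read with readr::read_csv(). The file name hotel_bookings.csv is hypothetical, and a tiny stand-in file is written first so that the snippet runs on its own.

```r
library(readr)

# Hypothetical file name; the post does not show the actual path.
csv_path <- file.path(tempdir(), "hotel_bookings.csv")

# Write a tiny stand-in file so the sketch runs without the real data.
writeLines(c(
  "hotel,is_canceled,lead_time,arrival_date_year,arrival_date_month",
  "Resort Hotel,0,342,2015,July",
  "Resort Hotel,0,737,2015,July",
  "Resort Hotel,0,7,2015,July"
), csv_path)

# read_csv() returns a tibble; numeric columns are parsed as doubles.
hotel_bookings <- read_csv(csv_path, show_col_types = FALSE)

# head() returns the first 6 rows by default (fewer if the data is shorter).
head(hotel_bookings)
```

With the real file, only the path passed to read_csv() would change.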

skim() provides summary statistics about the variables in data frames, tibbles, data tables, and vectors. It also works with grouped data frames (source: https://cran.r-project.org/web/packages/skimr/vignettes/skimr.html)

Table 1: Data summary
Name                      hotel_bookings
Number of rows            119390
Number of columns         32
_______________________
Column type frequency:
  character               13
  Date                    1
  numeric                 18
________________________
Group variables           None

Variable type: character

skim_variable         n_missing  complete_rate  min  max  empty  n_unique  whitespace
hotel                         0              1   10   12      0         2           0
arrival_date_month            0              1    3    9      0        12           0
meal                          0              1    2    9      0         5           0
country                       0              1    2    4      0       178           0
market_segment                0              1    6   13      0         8           0
distribution_channel          0              1    3    9      0         5           0
reserved_room_type            0              1    1    1      0        10           0
assigned_room_type            0              1    1    1      0        12           0
deposit_type                  0              1   10   10      0         3           0
agent                         0              1    1    4      0       334           0
company                       0              1    1    4      0       353           0
customer_type                 0              1    5   15      0         4           0
reservation_status            0              1    7    9      0         3           0

Variable type: Date

skim_variable            n_missing  complete_rate  min         max         median      n_unique
reservation_status_date          0              1  2014-10-17  2017-09-14  2016-08-07       926

Variable type: numeric

skim_variable                   n_missing  complete_rate     mean      sd       p0      p25      p50   p75  p100  hist
is_canceled                             0              1     0.37    0.48     0.00     0.00     0.00     1     1  ▇▁▁▁▅
lead_time                               0              1   104.01  106.86     0.00    18.00    69.00   160   737  ▇▂▁▁▁
arrival_date_year                       0              1  2016.16    0.71  2015.00  2016.00  2016.00  2017  2017  ▃▁▇▁▆
arrival_date_week_number                0              1    27.17   13.61     1.00    16.00    28.00    38    53  ▅▇▇▇▅
arrival_date_day_of_month               0              1    15.80    8.78     1.00     8.00    16.00    23    31  ▇▇▇▇▆
stays_in_weekend_nights                 0              1     0.93    1.00     0.00     0.00     1.00     2    19  ▇▁▁▁▁
stays_in_week_nights                    0              1     2.50    1.91     0.00     1.00     2.00     3    50  ▇▁▁▁▁
adults                                  0              1     1.86    0.58     0.00     2.00     2.00     2    55  ▇▁▁▁▁
children                                4              1     0.10    0.40     0.00     0.00     0.00     0    10  ▇▁▁▁▁
babies                                  0              1     0.01    0.10     0.00     0.00     0.00     0    10  ▇▁▁▁▁
is_repeated_guest                       0              1     0.03    0.18     0.00     0.00     0.00     0     1  ▇▁▁▁▁
previous_cancellations                  0              1     0.09    0.84     0.00     0.00     0.00     0    26  ▇▁▁▁▁
previous_bookings_not_canceled          0              1     0.14    1.50     0.00     0.00     0.00     0    72  ▇▁▁▁▁
booking_changes                         0              1     0.22    0.65     0.00     0.00     0.00     0    21  ▇▁▁▁▁
days_in_waiting_list                    0              1     2.32   17.59     0.00     0.00     0.00     0   391  ▇▁▁▁▁
adr                                     0              1   101.83   50.54    -6.38    69.29    94.58   126  5400  ▇▁▁▁▁
required_car_parking_spaces             0              1     0.06    0.25     0.00     0.00     0.00     0     8  ▇▁▁▁▁
total_of_special_requests               0              1     0.57    0.79     0.00     0.00     0.00     1     5  ▇▁▁▁▁
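
A summary of this kind comes from a single skim() call on the data frame. The sketch below runs skim() on a small made-up data frame rather than the real data. The name bookings_sample and its values are illustrative assumptions, not part of the original analysis.

```r
library(skimr)

# Small illustrative data frame (made-up values, not the real bookings).
bookings_sample <- data.frame(
  hotel = c("Resort Hotel", "City Hotel", "City Hotel"),
  lead_time = c(342, 7, 14),
  children = c(0, NA, 1),
  stringsAsFactors = FALSE
)

# skim() returns one row per variable, grouped by variable type when printed.
summary_tbl <- skim(bookings_sample)
summary_tbl
```

The n_missing and complete_rate columns in the result are where missing values, such as the NA in children here, show up.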

From the summary statistics above, the hotel_bookings dataset has 119390 rows and 32 columns: 13 character variables, 18 numeric variables, and 1 date variable. The only missing values are 4 in the children variable. For the analysis that follows, I will use hotel, market_segment, stays_in_weekend_nights, and stays_in_week_nights.
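
The dimensions, type counts, and missing-value counts read off the summary can also be cross-checked with base R. This is a minimal sketch on a made-up data frame (bookings_sample is a hypothetical stand-in for hotel_bookings), not the code used in the original post.

```r
# Small illustrative data frame (made-up values, not the real bookings).
bookings_sample <- data.frame(
  hotel = c("Resort Hotel", "City Hotel", "City Hotel", "City Hotel"),
  adults = c(2, 2, 1, 2),
  children = c(0, NA, 1, NA),
  stringsAsFactors = FALSE
)

# Number of rows and columns.
dim(bookings_sample)

# Number of missing values in each column.
na_counts <- colSums(is.na(bookings_sample))
na_counts

# Number of columns of each class (character, numeric, ...).
type_counts <- table(vapply(bookings_sample, function(col) class(col)[1], character(1)))
type_counts
```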

Variable Description

hotel: type of hotel booked
market_segment: market segment designation. In the categories, “TA” means “Travel Agents” and “TO” means “Tour Operators”
stays_in_weekend_nights: number of weekend nights the guest stayed at the hotel
stays_in_week_nights: number of week nights the guest stayed at the hotel

I am using select() from the dplyr package, which is part of the tidyverse, together with the pipe operator to select these columns.

# A tibble: 119,390 × 4
   hotel        stays_in_weekend_ni… stays_in_week_nig… market_segment
   <chr>                       <dbl>              <dbl> <chr>         
 1 Resort Hotel                    0                  0 Direct        
 2 Resort Hotel                    0                  0 Direct        
 3 Resort Hotel                    0                  1 Direct        
 4 Resort Hotel                    0                  1 Corporate     
 5 Resort Hotel                    0                  2 Online TA     
 6 Resort Hotel                    0                  2 Online TA     
 7 Resort Hotel                    0                  2 Direct        
 8 Resort Hotel                    0                  2 Direct        
 9 Resort Hotel                    0                  3 Online TA     
10 Resort Hotel                    0                  3 Offline TA/TO 
# … with 119,380 more rows
[1] "Resort Hotel" "City Hotel"  
[1] "Direct"        "Corporate"     "Online TA"     "Offline TA/TO"
[5] "Complementary" "Groups"        "Undefined"     "Aviation"     
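
The selection step, and the two vectors of distinct values printed after it, could be produced along the following lines. This is a sketch under assumptions: the data frame bookings_sample and its values are made up, and the use of unique() for the distinct hotel and market segment values is inferred from the shape of the output rather than shown in the post.

```r
library(dplyr)

# Small illustrative tibble (made-up values, not the real bookings).
bookings_sample <- tibble(
  hotel = c("Resort Hotel", "Resort Hotel", "City Hotel"),
  is_canceled = c(0, 0, 1),
  stays_in_weekend_nights = c(0, 0, 2),
  stays_in_week_nights = c(1, 2, 3),
  market_segment = c("Direct", "Online TA", "Direct")
)

# Keep only the four columns used in the analysis, in this order.
stays <- bookings_sample %>%
  select(hotel, stays_in_weekend_nights, stays_in_week_nights, market_segment)

# Distinct values, in order of first appearance.
hotel_types <- unique(stays$hotel)
segments <- unique(stays$market_segment)
```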

Bookings in different market segments

City hotel

# A tibble: 8 × 2
  market_segment     n
  <chr>          <int>
1 Aviation         237
2 Complementary    542
3 Corporate       2986
4 Direct          6093
5 Groups         13975
6 Offline TA/TO  16747
7 Online TA      38748
8 Undefined          2

Resort hotel

# A tibble: 6 × 2
  market_segment     n
  <chr>          <int>
1 Complementary    201
2 Corporate       2309
3 Direct          6513
4 Groups          5836
5 Offline TA/TO   7472
6 Online TA      17729
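
Per-hotel counts like the two tables above can be obtained by filtering to one hotel and then counting rows per market segment. The sketch below uses a made-up tibble (bookings_sample is hypothetical) and assumes dplyr's filter() and count(). The original code is not shown in the post.

```r
library(dplyr)

# Small illustrative tibble (made-up values, not the real bookings).
bookings_sample <- tibble(
  hotel = c("City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"),
  market_segment = c("Online TA", "Direct", "Online TA", "Groups", "Online TA")
)

# Bookings per market segment for the city hotel only.
city_counts <- bookings_sample %>%
  filter(hotel == "City Hotel") %>%
  count(market_segment)

# Bookings per market segment for the resort hotel only.
resort_counts <- bookings_sample %>%
  filter(hotel == "Resort Hotel") %>%
  count(market_segment)
```

Segments with no bookings for a hotel simply do not appear in its table, which is why the resort hotel table has fewer rows than the city hotel table.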

# A tibble: 14 × 3
# Groups:   hotel, market_segment [14]
   hotel        market_segment     n
   <chr>        <chr>          <int>
 1 City Hotel   Aviation         237
 2 City Hotel   Complementary    542
 3 City Hotel   Corporate       2986
 4 City Hotel   Direct          6093
 5 City Hotel   Groups         13975
 6 City Hotel   Offline TA/TO  16747
 7 City Hotel   Online TA      38748
 8 City Hotel   Undefined          2
 9 Resort Hotel Complementary    201
10 Resort Hotel Corporate       2309
11 Resort Hotel Direct          6513
12 Resort Hotel Groups          5836
13 Resort Hotel Offline TA/TO   7472
14 Resort Hotel Online TA      17729
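
The combined table above, with its "Groups: hotel, market_segment" header, is consistent with grouping by both variables before counting. A minimal sketch of that approach on a made-up tibble (bookings_sample is hypothetical) might look like this:

```r
library(dplyr)

# Small illustrative tibble (made-up values, not the real bookings).
bookings_sample <- tibble(
  hotel = c("City Hotel", "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"),
  market_segment = c("Online TA", "Direct", "Online TA", "Groups", "Online TA")
)

# One row per hotel / market segment combination; the result stays grouped.
by_hotel <- bookings_sample %>%
  group_by(hotel, market_segment) %>%
  count()

by_hotel
```

Calling count(hotel, market_segment) directly on the ungrouped data gives the same counts without the grouping attribute.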

How many days do people stay in the hotel?

Resort hotel
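
The output for this section is not included in the post. One way to approach the question, sketched here under the assumption that the length of a stay is the sum of stays_in_weekend_nights and stays_in_week_nights, is to add a total column and count bookings per total. The column name total_nights and the data in bookings_sample are hypothetical.

```r
library(dplyr)

# Small illustrative tibble (made-up values, not the real bookings).
bookings_sample <- tibble(
  hotel = c("Resort Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  stays_in_weekend_nights = c(0, 2, 1, 2),
  stays_in_week_nights = c(2, 5, 1, 3)
)

# Assumption: total stay = weekend nights + week nights.
resort_stays <- bookings_sample %>%
  filter(hotel == "Resort Hotel") %>%
  mutate(total_nights = stays_in_weekend_nights + stays_in_week_nights) %>%
  count(total_nights)

resort_stays
```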

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

sathvik_thogaru (2021, Aug. 18). DACSS 601 August 2021: sathvik_thogaru_homework4. Retrieved from https://mrolfe.github.io/DACSS601August2021/posts/2021-08-18-sathvikthogaruhomework4/

BibTeX citation

@misc{sathvik_thogaru2021sathvik_thogaru_homework4,
  author = {sathvik_thogaru, },
  title = {DACSS 601 August 2021: sathvik_thogaru_homework4},
  url = {https://mrolfe.github.io/DACSS601August2021/posts/2021-08-18-sathvikthogaruhomework4/},
  year = {2021}
}