Erin Liu HW3

HW3

Erin Liu
2022-01-03

Identify the dataset for the final project

I will be using the advanced dataset “hotel_bookings.csv” from the “Sample Datasets” section on Google Classroom for my final project.

Identify the variables in the dataset

HW3_data <- read.csv('/Users/erinliu/Downloads/hotel_bookings.csv', header = TRUE, sep = ',')
dim(HW3_data)
[1] 119390     32
str(HW3_data)
'data.frame':   119390 obs. of  32 variables:
 $ hotel                         : chr  "Resort Hotel" "Resort Hotel" "Resort Hotel" "Resort Hotel" ...
 $ is_canceled                   : int  0 0 0 0 0 0 0 0 1 1 ...
 $ lead_time                     : int  342 737 7 13 14 14 0 9 85 75 ...
 $ arrival_date_year             : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
 $ arrival_date_month            : chr  "July" "July" "July" "July" ...
 $ arrival_date_week_number      : int  27 27 27 27 27 27 27 27 27 27 ...
 $ arrival_date_day_of_month     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ stays_in_weekend_nights       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ stays_in_week_nights          : int  0 0 1 1 2 2 2 2 3 3 ...
 $ adults                        : int  2 2 1 1 2 2 2 2 2 2 ...
 $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ meal                          : chr  "BB" "BB" "BB" "BB" ...
 $ country                       : chr  "PRT" "PRT" "GBR" "GBR" ...
 $ market_segment                : chr  "Direct" "Direct" "Direct" "Corporate" ...
 $ distribution_channel          : chr  "Direct" "Direct" "Direct" "Corporate" ...
 $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
 $ reserved_room_type            : chr  "C" "C" "A" "A" ...
 $ assigned_room_type            : chr  "C" "C" "C" "A" ...
 $ booking_changes               : int  3 4 0 0 0 0 0 0 0 0 ...
 $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
 $ agent                         : chr  "NULL" "NULL" "NULL" "304" ...
 $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
 $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
 $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
 $ adr                           : num  0 0 75 75 98 ...
 $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ total_of_special_requests     : int  0 0 0 0 1 1 0 1 1 0 ...
 $ reservation_status            : chr  "Check-Out" "Check-Out" "Check-Out" "Check-Out" ...
 $ reservation_status_date       : chr  "2015-07-01" "2015-07-01" "2015-07-02" "2015-07-02" ...
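
The str() output above shows agent and company stored as character columns holding the literal text "NULL", and reservation_status_date stored as character. The following is a minimal sketch of one way to handle this at read time, assuming "NULL" should be treated as missing and the date column should be a Date (neither is stated in the original). It uses a small inline toy CSV rather than the real file, and the names csv_text and toy are hypothetical.

```r
# Toy CSV standing in for hotel_bookings.csv (values are illustrative only)
csv_text <- "agent,company,reservation_status_date
NULL,NULL,2015-07-01
304,NULL,2015-07-02"

# na.strings = "NULL" turns the literal text NULL into NA while reading
toy <- read.csv(text = csv_text, na.strings = "NULL")

# Parse the ISO-formatted date column into Date objects
toy$reservation_status_date <- as.Date(toy$reservation_status_date)

str(toy)
```

With the real file, the same na.strings argument could be added to the read.csv() call, which would also let agent be read as a number instead of text.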

Read in/clean the dataset

Since I will be studying how different factors influence ADR, I first clean the dataset by keeping only the rows with a positive ADR. This reduces the data from 119,390 rows to 117,430. I also want a separate table for each hotel type: resort_hotel_data has 39,308 rows and city_hotel_data has 78,122 rows.

library(dplyr)  # provides filter()
# Keep only rows with a positive ADR, split by hotel type
resort_hotel_data <- filter(HW3_data, adr > 0, hotel == 'Resort Hotel')
city_hotel_data <- filter(HW3_data, adr > 0, hotel == 'City Hotel')
head(city_hotel_data)
       hotel is_canceled lead_time arrival_date_year
1 City Hotel           1        88              2015
2 City Hotel           1        65              2015
3 City Hotel           1        92              2015
4 City Hotel           1       100              2015
5 City Hotel           1        79              2015
6 City Hotel           0         3              2015
  arrival_date_month arrival_date_week_number
1               July                       27
2               July                       27
3               July                       27
4               July                       27
5               July                       27
6               July                       27
  arrival_date_day_of_month stays_in_weekend_nights
1                         1                       0
2                         1                       0
3                         1                       2
4                         2                       0
5                         2                       0
6                         2                       0
  stays_in_week_nights adults children babies meal country
1                    4      2        0      0   BB     PRT
2                    4      1        0      0   BB     PRT
3                    4      2        0      0   BB     PRT
4                    2      2        0      0   BB     PRT
5                    3      2        0      0   BB     PRT
6                    3      1        0      0   HB     PRT
  market_segment distribution_channel is_repeated_guest
1      Online TA                TA/TO                 0
2      Online TA                TA/TO                 0
3      Online TA                TA/TO                 0
4      Online TA                TA/TO                 0
5      Online TA                TA/TO                 0
6         Groups                TA/TO                 0
  previous_cancellations previous_bookings_not_canceled
1                      0                              0
2                      0                              0
3                      0                              0
4                      0                              0
5                      0                              0
6                      0                              0
  reserved_room_type assigned_room_type booking_changes deposit_type
1                  A                  A               0   No Deposit
2                  A                  A               0   No Deposit
3                  A                  A               0   No Deposit
4                  A                  A               0   No Deposit
5                  A                  A               0   No Deposit
6                  A                  A               1   No Deposit
  agent company days_in_waiting_list   customer_type   adr
1     9    NULL                    0       Transient 76.50
2     9    NULL                    0       Transient 68.00
3     9    NULL                    0       Transient 76.50
4     9    NULL                    0       Transient 76.50
5     9    NULL                    0       Transient 76.50
6     1    NULL                    0 Transient-Party 58.67
  required_car_parking_spaces total_of_special_requests
1                           0                         1
2                           0                         1
3                           0                         2
4                           0                         1
5                           0                         1
6                           0                         0
  reservation_status reservation_status_date
1           Canceled              2015-07-01
2           Canceled              2015-04-30
3           Canceled              2015-06-23
4           Canceled              2015-04-02
5           Canceled              2015-06-25
6          Check-Out              2015-07-05
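
The row counts quoted in this section can be checked with nrow(). The following is a minimal sketch of that check on a toy data frame, using base R subset() as a stand-in for the dplyr filter() calls. The data frame toy and its values are made up for illustration and are not taken from hotel_bookings.csv.

```r
# Toy data standing in for HW3_data (values are illustrative only)
toy <- data.frame(
  hotel = c("Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel", "City Hotel"),
  adr   = c(0, 76.5, 68, 75, -5)
)

# Keep positive-ADR rows, then split by hotel type
positive <- subset(toy, adr > 0)
resort   <- subset(positive, hotel == "Resort Hotel")
city     <- subset(positive, hotel == "City Hotel")

# The two subsets should together account for every positive-ADR row
stopifnot(nrow(resort) + nrow(city) == nrow(positive))

# Row counts before and after cleaning
c(all = nrow(toy), positive = nrow(positive),
  resort = nrow(resort), city = nrow(city))
```

Applied to the real tables, the same check would be nrow(resort_hotel_data) + nrow(city_hotel_data), which should match the 117,430 rows reported above.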

Identify potential research questions that your dataset can help answer.

This dataset contains hotel booking information, including the ADR (average daily rate) and other factors, for two types of hotels: resort hotels and city hotels. It can be used to estimate how much different factors influence the ADR.

For example, how does lead_time (the number of days between the date the booking was entered into the PMS and the arrival date) affect the daily rate, and how far ahead is the best time to book a hotel? Which date in a month is the cheapest to book, and how much does the rate vary from month to month?
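
As a first look at these questions, ADR could be summarized by lead-time bin and by arrival month. The following is a rough sketch under stated assumptions, not the analysis itself: the toy data frame, its values, and the bin boundaries (30 and 90 days) are arbitrary choices made for illustration. Only the column names lead_time, arrival_date_month, and adr come from the dataset.

```r
# Toy data standing in for one of the cleaned tables (values are illustrative only)
toy <- data.frame(
  lead_time          = c(3, 10, 45, 60, 120, 200),
  arrival_date_month = c("July", "July", "August", "August", "July", "August"),
  adr                = c(100, 90, 80, 85, 70, 60)
)

# Group lead_time into bins; the break points are an arbitrary assumption
toy$lead_bin <- cut(toy$lead_time, breaks = c(-Inf, 30, 90, Inf),
                    labels = c("0-30", "31-90", "91+"))

# Mean ADR per lead-time bin and per arrival month
adr_by_lead  <- aggregate(adr ~ lead_bin, data = toy, FUN = mean)
adr_by_month <- aggregate(adr ~ arrival_date_month, data = toy, FUN = mean)

# Rank correlation as a rough summary of the lead_time / ADR relationship
rho <- cor(toy$lead_time, toy$adr, method = "spearman")

print(adr_by_lead)
print(adr_by_month)
print(rho)
```

Binning keeps the summary readable when lead_time spans a wide range, and a rank correlation avoids assuming the relationship is linear. Replacing arrival_date_month with arrival_date_day_of_month in the second aggregate() call would give the day-of-month comparison.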

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Liu (2022, Jan. 3). Data Analytics and Computational Social Science: Erin Liu HW3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomerinliuhw3/

BibTeX citation

@misc{liu2022erin,
  author = {Liu, Erin},
  title = {Data Analytics and Computational Social Science: Erin Liu HW3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomerinliuhw3/},
  year = {2022}
}