HW_2

Reading in Data

MANI KANTA GOGULA
2022-02-28

Getting the summary of the dataset

    hotel            is_canceled       lead_time   arrival_date_year
 Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
 Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
 Mode  :character   Median :0.0000   Median : 69   Median :2016     
                    Mean   :0.3704   Mean   :104   Mean   :2016     
                    3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                    Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                    
 arrival_date_month arrival_date_week_number
 Length:119390      Min.   : 1.00           
 Class :character   1st Qu.:16.00           
 Mode  :character   Median :28.00           
                    Mean   :27.17           
                    3rd Qu.:38.00           
                    Max.   :53.00           
                                            
 arrival_date_day_of_month stays_in_weekend_nights
 Min.   : 1.0              Min.   : 0.0000        
 1st Qu.: 8.0              1st Qu.: 0.0000        
 Median :16.0              Median : 1.0000        
 Mean   :15.8              Mean   : 0.9276        
 3rd Qu.:23.0              3rd Qu.: 2.0000        
 Max.   :31.0              Max.   :19.0000        
                                                  
 stays_in_week_nights     adults          children      
 Min.   : 0.0         Min.   : 0.000   Min.   : 0.0000  
 1st Qu.: 1.0         1st Qu.: 2.000   1st Qu.: 0.0000  
 Median : 2.0         Median : 2.000   Median : 0.0000  
 Mean   : 2.5         Mean   : 1.856   Mean   : 0.1039  
 3rd Qu.: 3.0         3rd Qu.: 2.000   3rd Qu.: 0.0000  
 Max.   :50.0         Max.   :55.000   Max.   :10.0000  
                                       NA's   :4        
     babies              meal             country         
 Min.   : 0.000000   Length:119390      Length:119390     
 1st Qu.: 0.000000   Class :character   Class :character  
 Median : 0.000000   Mode  :character   Mode  :character  
 Mean   : 0.007949                                        
 3rd Qu.: 0.000000                                        
 Max.   :10.000000                                        
                                                          
 market_segment     distribution_channel is_repeated_guest
 Length:119390      Length:119390        Min.   :0.00000  
 Class :character   Class :character     1st Qu.:0.00000  
 Mode  :character   Mode  :character     Median :0.00000  
                                         Mean   :0.03191  
                                         3rd Qu.:0.00000  
                                         Max.   :1.00000  
                                                          
 previous_cancellations previous_bookings_not_canceled
 Min.   : 0.00000       Min.   : 0.0000               
 1st Qu.: 0.00000       1st Qu.: 0.0000               
 Median : 0.00000       Median : 0.0000               
 Mean   : 0.08712       Mean   : 0.1371               
 3rd Qu.: 0.00000       3rd Qu.: 0.0000               
 Max.   :26.00000       Max.   :72.0000               
                                                      
 reserved_room_type assigned_room_type booking_changes  
 Length:119390      Length:119390      Min.   : 0.0000  
 Class :character   Class :character   1st Qu.: 0.0000  
 Mode  :character   Mode  :character   Median : 0.0000  
                                       Mean   : 0.2211  
                                       3rd Qu.: 0.0000  
                                       Max.   :21.0000  
                                                        
 deposit_type          agent             company         
 Length:119390      Length:119390      Length:119390     
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
                                                         
 days_in_waiting_list customer_type           adr         
 Min.   :  0.000      Length:119390      Min.   :  -6.38  
 1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
 Median :  0.000      Mode  :character   Median :  94.58  
 Mean   :  2.321                         Mean   : 101.83  
 3rd Qu.:  0.000                         3rd Qu.: 126.00  
 Max.   :391.000                         Max.   :5400.00  
                                                          
 required_car_parking_spaces total_of_special_requests
 Min.   :0.00000             Min.   :0.0000           
 1st Qu.:0.00000             1st Qu.:0.0000           
 Median :0.00000             Median :0.0000           
 Mean   :0.06252             Mean   :0.5714           
 3rd Qu.:0.00000             3rd Qu.:1.0000           
 Max.   :8.00000             Max.   :5.0000           
                                                      
 reservation_status reservation_status_date
 Length:119390      Length:119390          
 Class :character   Class :character       
 Mode  :character   Mode  :character       
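
The call that produced this output is not shown in the post. As a minimal sketch, assuming the data is held in a data frame, base R's `summary()` gives this kind of per-column report. The `bookings` object below is a small invented stand-in, not the real 119,390-row file.

```r
# Toy stand-in for the hotel bookings data; all values are invented.
bookings <- data.frame(
  hotel = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled = c(0, 1, 1, 0),
  lead_time = c(60, 85, 60, 12),
  children = c(0, NA, 1, 0),
  stringsAsFactors = FALSE
)

# summary() reports quartiles and the mean for numeric columns,
# length/class/mode for character columns, and an NA's row where
# a numeric column has missing values.
bookings_summary <- summary(bookings)
print(bookings_summary)

# Missing values can also be counted directly, column by column.
na_counts <- colSums(is.na(bookings))
print(na_counts)
```

Because `is_canceled` is coded 0/1, its mean is the share of canceled bookings: the 0.3704 in the output above corresponds to roughly 37% of bookings. The `NA's :4` entry under `children` flags four missing values in that column.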
                                           
                                           
                                           
                                           

Selecting the first, second, and fourth columns from the dataset

# A tibble: 119,390 x 3
   hotel        is_canceled arrival_date_year
   <chr>              <dbl>             <dbl>
 1 Resort Hotel           0              2015
 2 Resort Hotel           0              2015
 3 Resort Hotel           0              2015
 4 Resort Hotel           0              2015
 5 Resort Hotel           0              2015
 6 Resort Hotel           0              2015
 7 Resort Hotel           0              2015
 8 Resort Hotel           0              2015
 9 Resort Hotel           1              2015
10 Resort Hotel           1              2015
# ... with 119,380 more rows
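
The selection code is not shown. As a sketch, assuming dplyr was used (the tibble output suggests it), columns can be picked by position or by name. The `bookings` tibble below is a hypothetical four-column stand-in with invented values.

```r
library(dplyr)

# Toy stand-in; column order mirrors the first four columns of the real data.
bookings <- tibble(
  hotel = c("Resort Hotel", "City Hotel"),
  is_canceled = c(0, 1),
  lead_time = c(342, 85),
  arrival_date_year = c(2015, 2016)
)

# Select the first, second, and fourth columns by position.
by_position <- select(bookings, 1, 2, 4)

# The same selection by name, which keeps working if columns are reordered.
by_name <- select(bookings, hotel, is_canceled, arrival_date_year)

print(by_position)
```

Selecting by name is usually the safer choice, since a positional index silently points at a different column if the input layout changes.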

Filtering the dataset to canceled bookings (is_canceled equal to 1)

# A tibble: 44,224 x 32
   hotel       is_canceled lead_time arrival_date_ye~ arrival_date_mo~
   <chr>             <dbl>     <dbl>            <dbl> <chr>           
 1 Resort Hot~           1        85             2015 July            
 2 Resort Hot~           1        75             2015 July            
 3 Resort Hot~           1        23             2015 July            
 4 Resort Hot~           1        60             2015 July            
 5 Resort Hot~           1        96             2015 July            
 6 Resort Hot~           1        45             2015 July            
 7 Resort Hot~           1        40             2015 July            
 8 Resort Hot~           1        43             2015 July            
 9 Resort Hot~           1        45             2015 July            
10 Resort Hot~           1        47             2015 July            
# ... with 44,214 more rows, and 27 more variables:
#   arrival_date_week_number <dbl>, arrival_date_day_of_month <dbl>,
#   stays_in_weekend_nights <dbl>, stays_in_week_nights <dbl>,
#   adults <dbl>, children <dbl>, babies <dbl>, meal <chr>,
#   country <chr>, market_segment <chr>, distribution_channel <chr>,
#   is_repeated_guest <dbl>, previous_cancellations <dbl>,
#   previous_bookings_not_canceled <dbl>, ...
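
The filter condition is not stated in the post. Every visible row has `is_canceled` equal to 1, so the sketch below assumes the filter kept canceled bookings. It uses dplyr's `filter()` on a small invented tibble.

```r
library(dplyr)

# Toy stand-in for the bookings data; values are invented.
bookings <- tibble(
  hotel = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled = c(0, 1, 1, 0),
  lead_time = c(342, 85, 60, 12)
)

# Keep only rows where the booking was canceled (assumed condition).
canceled <- filter(bookings, is_canceled == 1)

# Share of bookings that were canceled.
share_canceled <- nrow(canceled) / nrow(bookings)

print(canceled)
print(share_canceled)
```

In the real output, 44,224 of 119,390 rows remain, about 37%, which agrees with the mean of `is_canceled` (0.3704) reported in the summary.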

Filtering the dataset to rows where lead_time equals 60

# A tibble: 436 x 32
   hotel       is_canceled lead_time arrival_date_ye~ arrival_date_mo~
   <chr>             <dbl>     <dbl>            <dbl> <chr>           
 1 Resort Hot~           1        60             2015 July            
 2 Resort Hot~           0        60             2015 July            
 3 Resort Hot~           1        60             2015 July            
 4 Resort Hot~           1        60             2015 July            
 5 Resort Hot~           1        60             2015 July            
 6 Resort Hot~           1        60             2015 July            
 7 Resort Hot~           0        60             2015 July            
 8 Resort Hot~           0        60             2015 August          
 9 Resort Hot~           1        60             2015 August          
10 Resort Hot~           1        60             2015 August          
# ... with 426 more rows, and 27 more variables:
#   arrival_date_week_number <dbl>, arrival_date_day_of_month <dbl>,
#   stays_in_weekend_nights <dbl>, stays_in_week_nights <dbl>,
#   adults <dbl>, children <dbl>, babies <dbl>, meal <chr>,
#   country <chr>, market_segment <chr>, distribution_channel <chr>,
#   is_repeated_guest <dbl>, previous_cancellations <dbl>,
#   previous_bookings_not_canceled <dbl>, ...
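
A minimal sketch of this step, assuming dplyr's `filter()` and a small invented `bookings` tibble in place of the real data:

```r
library(dplyr)

# Toy stand-in for the bookings data; values are invented.
bookings <- tibble(
  hotel = c("Resort Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  is_canceled = c(1, 0, 1, 0),
  lead_time = c(60, 60, 85, 12)
)

# Keep rows whose lead time is exactly 60 days; note == for comparison.
lead_60 <- filter(bookings, lead_time == 60)

# Conditions separated by commas are combined with AND.
lead_60_canceled <- filter(bookings, lead_time == 60, is_canceled == 1)

print(lead_60)
print(lead_60_canceled)
```

As the output above shows, filtering on `lead_time` alone keeps both canceled and non-canceled bookings. The second call is only an illustration of how a further condition could narrow the result.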

Arranging the data in descending order of arrival_date_year

# A tibble: 119,390 x 32
   hotel       is_canceled lead_time arrival_date_ye~ arrival_date_mo~
   <chr>             <dbl>     <dbl>            <dbl> <chr>           
 1 Resort Hot~           1        74             2017 January         
 2 Resort Hot~           1        62             2017 January         
 3 Resort Hot~           1        62             2017 January         
 4 Resort Hot~           1        62             2017 January         
 5 Resort Hot~           1        71             2017 January         
 6 Resort Hot~           1        88             2017 January         
 7 Resort Hot~           1       172             2017 January         
 8 Resort Hot~           1       168             2017 January         
 9 Resort Hot~           1        52             2017 January         
10 Resort Hot~           1        25             2017 January         
# ... with 119,380 more rows, and 27 more variables:
#   arrival_date_week_number <dbl>, arrival_date_day_of_month <dbl>,
#   stays_in_weekend_nights <dbl>, stays_in_week_nights <dbl>,
#   adults <dbl>, children <dbl>, babies <dbl>, meal <chr>,
#   country <chr>, market_segment <chr>, distribution_channel <chr>,
#   is_repeated_guest <dbl>, previous_cancellations <dbl>,
#   previous_bookings_not_canceled <dbl>, ...
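
A sketch of the sort, assuming dplyr's `arrange()` with `desc()`. The `bookings` tibble is an invented stand-in for the real data.

```r
library(dplyr)

# Toy stand-in for the bookings data; values are invented.
bookings <- tibble(
  hotel = c("Resort Hotel", "City Hotel", "Resort Hotel", "City Hotel"),
  arrival_date_year = c(2015, 2017, 2016, 2017),
  arrival_date_month = c("July", "January", "August", "March")
)

# Sort rows from the most recent arrival year to the oldest.
by_year_desc <- arrange(bookings, desc(arrival_date_year))

print(by_year_desc)
```

Only the year is used as a sort key here. `arrival_date_month` is a character column, so adding it as a second key would order months alphabetically rather than by calendar position unless it is first converted to an ordered factor.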

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

GOGULA (2022, March 2). Data Analytics and Computational Social Science: HW_2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommanikanta870802/

BibTeX citation

@misc{gogula2022hw_2,
  author = {GOGULA, MANI KANTA},
  title = {Data Analytics and Computational Social Science: HW_2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommanikanta870802/},
  year = {2022}
}