DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 2

challenge_2
railroads
faostat
hotel_bookings
Author

Saisrinivas Ambatipudi

Published

October 16, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
  2. provide summary statistics for different interesting groups within the data, and interpret those statistics

Read in the Data

Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.

  • railroad*.csv or StateCounty2012.xls ⭐
  • FAOstat*.csv or birds.csv ⭐⭐⭐
  • hotel_bookings.csv ⭐⭐⭐⭐
Code
hotel1 <- read_csv("_data/hotel_bookings.csv")
hotel1
# A tibble: 119,390 × 32
   hotel  is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
   <chr>    <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
 1 Resor…       0     342    2015 July         27       1       0       0      2
 2 Resor…       0     737    2015 July         27       1       0       0      2
 3 Resor…       0       7    2015 July         27       1       0       1      1
 4 Resor…       0      13    2015 July         27       1       0       1      1
 5 Resor…       0      14    2015 July         27       1       0       2      2
 6 Resor…       0      14    2015 July         27       1       0       2      2
 7 Resor…       0       0    2015 July         27       1       0       2      2
 8 Resor…       0       9    2015 July         27       1       0       2      2
 9 Resor…       1      85    2015 July         27       1       0       3      2
10 Resor…       1      75    2015 July         27       1       0       3      2
# … with 119,380 more rows, 22 more variables: children <dbl>, babies <dbl>,
#   meal <chr>, country <chr>, market_segment <chr>,
#   distribution_channel <chr>, is_repeated_guest <dbl>,
#   previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
#   reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
#   deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
#   customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …

Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

read_csv() shows that the data has 119,390 rows and 32 columns, one row per booking. The summary() output below shows that:

  • Arrival dates span 2015 to 2017.
  • The largest booking lists 55 adults (the maximum of the adults column).
  • The maximum of both the children column and the babies column is 10.
  • One booking spent 391 days on the waiting list, the longest wait in the data!
  • The most special requests made on a single booking is 5.
  • The most parking spaces requested on a single booking is 8. This may be the same booking that lists 55 adults, but summary() alone cannot confirm it.

Code
summary(hotel1)
    hotel            is_canceled       lead_time   arrival_date_year
 Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
 Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
 Mode  :character   Median :0.0000   Median : 69   Median :2016     
                    Mean   :0.3704   Mean   :104   Mean   :2016     
                    3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
                    Max.   :1.0000   Max.   :737   Max.   :2017     
                                                                    
 arrival_date_month arrival_date_week_number arrival_date_day_of_month
 Length:119390      Min.   : 1.00            Min.   : 1.0             
 Class :character   1st Qu.:16.00            1st Qu.: 8.0             
 Mode  :character   Median :28.00            Median :16.0             
                    Mean   :27.17            Mean   :15.8             
                    3rd Qu.:38.00            3rd Qu.:23.0             
                    Max.   :53.00            Max.   :31.0             
                                                                      
 stays_in_weekend_nights stays_in_week_nights     adults      
 Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
 1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
 Median : 1.0000         Median : 2.0         Median : 2.000  
 Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
 3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
 Max.   :19.0000         Max.   :50.0         Max.   :55.000  
                                                              
    children           babies              meal             country         
 Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
 1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
 Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
 Mean   : 0.1039   Mean   : 0.007949                                        
 3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
 Max.   :10.0000   Max.   :10.000000                                        
 NA's   :4                                                                  
 market_segment     distribution_channel is_repeated_guest
 Length:119390      Length:119390        Min.   :0.00000  
 Class :character   Class :character     1st Qu.:0.00000  
 Mode  :character   Mode  :character     Median :0.00000  
                                         Mean   :0.03191  
                                         3rd Qu.:0.00000  
                                         Max.   :1.00000  
                                                          
 previous_cancellations previous_bookings_not_canceled reserved_room_type
 Min.   : 0.00000       Min.   : 0.0000                Length:119390     
 1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
 Median : 0.00000       Median : 0.0000                Mode  :character  
 Mean   : 0.08712       Mean   : 0.1371                                  
 3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
 Max.   :26.00000       Max.   :72.0000                                  
                                                                         
 assigned_room_type booking_changes   deposit_type          agent          
 Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
 Class :character   1st Qu.: 0.0000   Class :character   Class :character  
 Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
                    Mean   : 0.2211                                        
                    3rd Qu.: 0.0000                                        
                    Max.   :21.0000                                        
                                                                           
   company          days_in_waiting_list customer_type           adr         
 Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
 Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
 Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
                    Mean   :  2.321                         Mean   : 101.83  
                    3rd Qu.:  0.000                         3rd Qu.: 126.00  
                    Max.   :391.000                         Max.   :5400.00  
                                                                             
 required_car_parking_spaces total_of_special_requests reservation_status
 Min.   :0.00000             Min.   :0.0000            Length:119390     
 1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
 Median :0.00000             Median :0.0000            Mode  :character  
 Mean   :0.06252             Mean   :0.5714                              
 3rd Qu.:0.00000             3rd Qu.:1.0000                              
 Max.   :8.00000             Max.   :5.0000                              
                                                                         
 reservation_status_date
 Min.   :2014-10-17     
 1st Qu.:2016-02-01     
 Median :2016-08-07     
 Mean   :2016-07-30     
 3rd Qu.:2017-02-08     
 Max.   :2017-09-14     
                        

Provide Grouped Summary Statistics

Conduct some exploratory data analysis, using dplyr commands such as group_by(), select(), filter(), and summarise(). Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.

Code
hotel2 <- hotel1 %>% group_by(hotel, arrival_date_month, arrival_date_year)
hotel3 <- hotel2 %>% summarise(Babies = sum(babies), Children = sum(children, na.rm = TRUE), Adults = sum(adults))
hotel3
# A tibble: 52 × 6
# Groups:   hotel, arrival_date_month [24]
   hotel      arrival_date_month arrival_date_year Babies Children Adults
   <chr>      <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
 1 City Hotel April                           2016     11      339   6671
 2 City Hotel April                           2017     13      448   7619
 3 City Hotel August                          2015     16      165   4697
 4 City Hotel August                          2016     26      730   6904
 5 City Hotel August                          2017     19      604   6240
 6 City Hotel December                        2015     23      120   2907
 7 City Hotel December                        2016     15      378   4702
 8 City Hotel February                        2016     13      194   4124
 9 City Hotel February                        2017     18      305   4722
10 City Hotel January                         2016     16       69   2200
# … with 42 more rows
Code
x <- filter(hotel3, arrival_date_year == 2015)
y <- filter(hotel3, arrival_date_year == 2016)
z <- filter(hotel3, arrival_date_year == 2017)


x1 <- filter(x, hotel == "City Hotel")
x1
# A tibble: 6 × 6
# Groups:   hotel, arrival_date_month [6]
  hotel      arrival_date_month arrival_date_year Babies Children Adults
  <chr>      <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
1 City Hotel August                          2015     16      165   4697
2 City Hotel December                        2015     23      120   2907
3 City Hotel July                            2015      2       14   2671
4 City Hotel November                        2015      8       25   1918
5 City Hotel October                         2015     21       98   5910
6 City Hotel September                       2015     14       90   6282
Code
y1 <- filter(y, hotel == "City Hotel")
y1
# A tibble: 12 × 6
# Groups:   hotel, arrival_date_month [12]
   hotel      arrival_date_month arrival_date_year Babies Children Adults
   <chr>      <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
 1 City Hotel April                           2016     11      339   6671
 2 City Hotel August                          2016     26      730   6904
 3 City Hotel December                        2016     15      378   4702
 4 City Hotel February                        2016     13      194   4124
 5 City Hotel January                         2016     16       69   2200
 6 City Hotel July                            2016     21      605   6231
 7 City Hotel June                            2016     17      222   7012
 8 City Hotel March                           2016     15      279   5599
 9 City Hotel May                             2016      7      254   6671
10 City Hotel November                        2016     14      147   5412
11 City Hotel October                         2016     13      336   7755
12 City Hotel September                       2016     27      261   7287
Code
z1 <- filter(z, hotel == "City Hotel")
z1
# A tibble: 8 × 6
# Groups:   hotel, arrival_date_month [8]
  hotel      arrival_date_month arrival_date_year Babies Children Adults
  <chr>      <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
1 City Hotel April                           2017     13      448   7619
2 City Hotel August                          2017     19      604   6240
3 City Hotel February                        2017     18      305   4722
4 City Hotel January                         2017     14      244   4178
5 City Hotel July                            2017     12      574   7073
6 City Hotel June                            2017     11      347   7507
7 City Hotel March                           2017     16      165   6259
8 City Hotel May                             2017     10      235   8287
Code
x2 <- filter(x, hotel == "Resort Hotel")
x2
# A tibble: 6 × 6
# Groups:   hotel, arrival_date_month [6]
  hotel        arrival_date_month arrival_date_year Babies Children Adults
  <chr>        <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
1 Resort Hotel August                          2015     51      263   2848
2 Resort Hotel December                        2015     23       96   2301
3 Resort Hotel July                            2015     24      252   2752
4 Resort Hotel November                        2015     13       47   1796
5 Resort Hotel October                         2015     11       94   2950
6 Resort Hotel September                       2015     17       93   3230
Code
y2 <- filter(y, hotel == "Resort Hotel")
y2
# A tibble: 12 × 6
# Groups:   hotel, arrival_date_month [12]
   hotel        arrival_date_month arrival_date_year Babies Children Adults
   <chr>        <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
 1 Resort Hotel April                           2016     13      114   3323
 2 Resort Hotel August                          2016     55      438   3447
 3 Resort Hotel December                        2016     17      142   2472
 4 Resort Hotel February                        2016     14      128   2742
 5 Resort Hotel January                         2016      7       47   1496
 6 Resort Hotel July                            2016     38      319   2887
 7 Resort Hotel June                            2016     24      201   2625
 8 Resort Hotel March                           2016     19      164   3185
 9 Resort Hotel May                             2016     25      152   3311
10 Resort Hotel November                        2016     10       60   2362
11 Resort Hotel October                         2016     17      175   3664
12 Resort Hotel September                       2016     29      154   2882
Code
z2 <- filter(z, hotel == "Resort Hotel")
z2
# A tibble: 8 × 6
# Groups:   hotel, arrival_date_month [8]
  hotel        arrival_date_month arrival_date_year Babies Children Adults
  <chr>        <chr>                          <dbl>  <dbl>    <dbl>  <dbl>
1 Resort Hotel April                           2017     16      240   3193
2 Resort Hotel August                          2017     29      580   3659
3 Resort Hotel February                        2017     14      163   2862
4 Resort Hotel January                         2017     11       92   2150
5 Resort Hotel July                            2017     24      558   3550
6 Resort Hotel June                            2017     19      287   3209
7 Resort Hotel March                           2017      7       92   2632
8 Resort Hotel May                             2017     30      204   3270
Code
summary(x)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:12          Length:12          Min.   :2015      Min.   : 2.00  
 Class :character   Class :character   1st Qu.:2015      1st Qu.:12.50  
 Mode  :character   Mode  :character   Median :2015      Median :16.50  
                                       Mean   :2015      Mean   :18.58  
                                       3rd Qu.:2015      3rd Qu.:23.00  
                                       Max.   :2015      Max.   :51.00  
    Children          Adults    
 Min.   : 14.00   Min.   :1796  
 1st Qu.: 79.25   1st Qu.:2578  
 Median : 95.00   Median :2878  
 Mean   :113.08   Mean   :3355  
 3rd Qu.:131.25   3rd Qu.:3597  
 Max.   :263.00   Max.   :6282  
Code
summary(y)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:24          Length:24          Min.   :2016      Min.   : 7.00  
 Class :character   Class :character   1st Qu.:2016      1st Qu.:13.00  
 Mode  :character   Mode  :character   Median :2016      Median :16.50  
                                       Mean   :2016      Mean   :19.29  
                                       3rd Qu.:2016      3rd Qu.:24.25  
                                       Max.   :2016      Max.   :55.00  
    Children         Adults    
 Min.   : 47.0   Min.   :1496  
 1st Qu.:145.8   1st Qu.:2847  
 Median :197.5   Median :3556  
 Mean   :246.2   Mean   :4374  
 3rd Qu.:323.2   3rd Qu.:6341  
 Max.   :730.0   Max.   :7755  
Code
summary(z)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:16          Length:16          Min.   :2017      Min.   : 7.00  
 Class :character   Class :character   1st Qu.:2017      1st Qu.:11.75  
 Mode  :character   Mode  :character   Median :2017      Median :15.00  
                                       Mean   :2017      Mean   :16.44  
                                       3rd Qu.:2017      3rd Qu.:19.00  
                                       Max.   :2017      Max.   :30.00  
    Children         Adults    
 Min.   : 92.0   Min.   :2150  
 1st Qu.:194.2   1st Qu.:3205  
 Median :265.5   Median :3918  
 Mean   :321.1   Mean   :4776  
 3rd Qu.:475.5   3rd Qu.:6462  
 Max.   :604.0   Max.   :8287  
Code
summary(x1)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:6           Length:6           Min.   :2015      Min.   : 2.00  
 Class :character   Class :character   1st Qu.:2015      1st Qu.: 9.50  
 Mode  :character   Mode  :character   Median :2015      Median :15.00  
                                       Mean   :2015      Mean   :14.00  
                                       3rd Qu.:2015      3rd Qu.:19.75  
                                       Max.   :2015      Max.   :23.00  
    Children          Adults    
 Min.   : 14.00   Min.   :1918  
 1st Qu.: 41.25   1st Qu.:2730  
 Median : 94.00   Median :3802  
 Mean   : 85.33   Mean   :4064  
 3rd Qu.:114.50   3rd Qu.:5607  
 Max.   :165.00   Max.   :6282  
Code
summary(y1)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:12          Length:12          Min.   :2016      Min.   : 7.00  
 Class :character   Class :character   1st Qu.:2016      1st Qu.:13.00  
 Mode  :character   Mode  :character   Median :2016      Median :15.00  
                                       Mean   :2016      Mean   :16.25  
                                       3rd Qu.:2016      3rd Qu.:18.00  
                                       Max.   :2016      Max.   :27.00  
    Children         Adults    
 Min.   : 69.0   Min.   :2200  
 1st Qu.:215.0   1st Qu.:5234  
 Median :270.0   Median :6451  
 Mean   :317.8   Mean   :5881  
 3rd Qu.:348.8   3rd Qu.:6931  
 Max.   :730.0   Max.   :7755  
Code
summary(z1)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:8           Length:8           Min.   :2017      Min.   :10.00  
 Class :character   Class :character   1st Qu.:2017      1st Qu.:11.75  
 Mode  :character   Mode  :character   Median :2017      Median :13.50  
                                       Mean   :2017      Mean   :14.12  
                                       3rd Qu.:2017      3rd Qu.:16.50  
                                       Max.   :2017      Max.   :19.00  
    Children         Adults    
 Min.   :165.0   Min.   :4178  
 1st Qu.:241.8   1st Qu.:5860  
 Median :326.0   Median :6666  
 Mean   :365.2   Mean   :6486  
 3rd Qu.:479.5   3rd Qu.:7535  
 Max.   :604.0   Max.   :8287  
Code
summary(x2)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:6           Length:6           Min.   :2015      Min.   :11.00  
 Class :character   Class :character   1st Qu.:2015      1st Qu.:14.00  
 Mode  :character   Mode  :character   Median :2015      Median :20.00  
                                       Mean   :2015      Mean   :23.17  
                                       3rd Qu.:2015      3rd Qu.:23.75  
                                       Max.   :2015      Max.   :51.00  
    Children          Adults    
 Min.   : 47.00   Min.   :1796  
 1st Qu.: 93.25   1st Qu.:2414  
 Median : 95.00   Median :2800  
 Mean   :140.83   Mean   :2646  
 3rd Qu.:213.00   3rd Qu.:2924  
 Max.   :263.00   Max.   :3230  
Code
summary(y2)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:12          Length:12          Min.   :2016      Min.   : 7.00  
 Class :character   Class :character   1st Qu.:2016      1st Qu.:13.75  
 Mode  :character   Mode  :character   Median :2016      Median :18.00  
                                       Mean   :2016      Mean   :22.33  
                                       3rd Qu.:2016      3rd Qu.:26.00  
                                       Max.   :2016      Max.   :55.00  
    Children         Adults    
 Min.   : 47.0   Min.   :1496  
 1st Qu.:124.5   1st Qu.:2587  
 Median :153.0   Median :2884  
 Mean   :174.5   Mean   :2866  
 3rd Qu.:181.5   3rd Qu.:3314  
 Max.   :438.0   Max.   :3664  
Code
summary(z2)
    hotel           arrival_date_month arrival_date_year     Babies     
 Length:8           Length:8           Min.   :2017      Min.   : 7.00  
 Class :character   Class :character   1st Qu.:2017      1st Qu.:13.25  
 Mode  :character   Mode  :character   Median :2017      Median :17.50  
                                       Mean   :2017      Mean   :18.75  
                                       3rd Qu.:2017      3rd Qu.:25.25  
                                       Max.   :2017      Max.   :30.00  
    Children         Adults    
 Min.   : 92.0   Min.   :2150  
 1st Qu.:145.2   1st Qu.:2804  
 Median :222.0   Median :3201  
 Mean   :277.0   Mean   :3066  
 3rd Qu.:354.8   3rd Qu.:3340  
 Max.   :580.0   Max.   :3659  
Code
sum(hotel3$Babies, na.rm=TRUE)
[1] 949
Code
sum(hotel3$Children, na.rm=TRUE)
[1] 12403
Code
sum(hotel3$Adults, na.rm=TRUE)
[1] 221636
Code
sum(x1$Babies, na.rm = TRUE)
[1] 84
Code
sum(x1$Children, na.rm = TRUE)
[1] 512
Code
sum(x1$Adults, na.rm = TRUE)
[1] 24385
Code
sum(y1$Babies, na.rm = TRUE)
[1] 195
Code
sum(y1$Children, na.rm = TRUE)
[1] 3814
Code
sum(y1$Adults, na.rm = TRUE)
[1] 70568
Code
sum(z1$Babies, na.rm = TRUE)
[1] 113
Code
sum(z1$Children, na.rm = TRUE)
[1] 2922
Code
sum(z1$Adults, na.rm = TRUE)
[1] 51885
Code
sum(x2$Babies, na.rm = TRUE)
[1] 139
Code
sum(x2$Children, na.rm = TRUE)
[1] 845
Code
sum(x2$Adults, na.rm = TRUE)
[1] 15877
Code
sum(y2$Babies, na.rm = TRUE)
[1] 268
Code
sum(y2$Children, na.rm = TRUE)
[1] 2094
Code
sum(y2$Adults, na.rm = TRUE)
[1] 34396
Code
sum(z2$Babies, na.rm = TRUE)
[1] 150
Code
sum(z2$Children, na.rm = TRUE)
[1] 2216
Code
sum(z2$Adults, na.rm = TRUE)
[1] 24525

Explain and Interpret

Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.

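This analysis counts how many babies, children and adults were listed on bookings at each hotel (City Hotel and Resort Hotel) for each arrival year (2015, 2016 and 2017). The counts include canceled bookings, because the code does not filter on is_canceled. Only 2016 is a full year in the data: 2015 covers July to December and 2017 covers January to August, so the yearly totals are not directly comparable.

For the City Hotel, the yearly totals are: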

Babies: 2015 ==> 84, 2016 ==> 195, 2017 ==> 113
Children: 2015 ==> 512, 2016 ==> 3814, 2017 ==> 2922
Adults: 2015 ==> 24385, 2016 ==> 70568, 2017 ==> 51885

For the Resort Hotel, the yearly totals are:

Babies: 2015 ==> 139, 2016 ==> 268, 2017 ==> 150
Children: 2015 ==> 845, 2016 ==> 2094, 2017 ==> 2216
Adults: 2015 ==> 15877, 2016 ==> 34396, 2017 ==> 24525

When we take the analysis for both hotels together, the total number of babies, children and adults are 949, 12403 and 221636 respectively.

The minimum and maximum monthly totals of babies at each hotel, by year, are:

City Hotel:
Max: 2015 ==> 23, 2016 ==> 27, 2017 ==> 19
Min: 2015 ==> 2, 2016 ==> 7, 2017 ==> 10

Resort Hotel:
Max: 2015 ==> 51, 2016 ==> 55, 2017 ==> 30
Min: 2015 ==> 11, 2016 ==> 7, 2017 ==> 7

The minimum and maximum monthly totals of children at each hotel, by year, are:

City Hotel:
Max: 2015 ==> 165, 2016 ==> 730, 2017 ==> 604
Min: 2015 ==> 14, 2016 ==> 69, 2017 ==> 165

Resort Hotel:
Max: 2015 ==> 263, 2016 ==> 438, 2017 ==> 580
Min: 2015 ==> 47, 2016 ==> 47, 2017 ==> 92

The minimum and maximum monthly totals of adults at each hotel, by year, are:

City Hotel:
Max: 2015 ==> 6282, 2016 ==> 7755, 2017 ==> 8287
Min: 2015 ==> 1918, 2016 ==> 2200, 2017 ==> 4178

Resort Hotel:
Max: 2015 ==> 3230, 2016 ==> 3664, 2017 ==> 3659
Min: 2015 ==> 1796, 2016 ==> 1496, 2017 ==> 2150

Source Code
---
title: "Challenge 2"
author: "Saisrinivas Ambatipudi"
description: "Data wrangling: using group_by() and summarise()"
date: "10/16/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_2
  - railroads
  - faostat
  - hotel_bookings
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to

1)  read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2)  provide summary statistics for different interesting groups within the data, and interpret those statistics

## Read in the Data

Read in one (or more) of the following data sets, available in the `posts/_data` folder, using the correct R package and command.

-   railroad\*.csv or StateCounty2012.xls ⭐
-   FAOstat\*.csv or birds.csv ⭐⭐⭐
-   hotel_bookings.csv ⭐⭐⭐⭐

```{r}
hotel1 <- read_csv("_data/hotel_bookings.csv")
hotel1
```

Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.

## Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

`read_csv()` shows that the data has 119,390 rows and 32 columns, one row per booking. The `summary()` output below shows that:

-   Arrival dates span 2015 to 2017.
-   The largest booking lists 55 adults (the maximum of the `adults` column).
-   The maximum of both the `children` column and the `babies` column is 10.
-   One booking spent 391 days on the waiting list, the longest wait in the data!
-   The most special requests made on a single booking is 5.
-   The most parking spaces requested on a single booking is 8. This may be the same booking that lists 55 adults, but `summary()` alone cannot confirm it.

```{r}
summary(hotel1)
```
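
Whether the 8-parking-space booking is the same one that lists 55 adults can be checked directly instead of being inferred from two separate maxima. A minimal sketch, assuming a small made-up tibble (`toy`) in place of `hotel1`. Only the column names `adults` and `required_car_parking_spaces` come from the data set; `booking_id` and all values are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for hotel1; every value here is invented.
toy <- tibble(
  booking_id = 1:5,  # illustrative id, not a column in hotel_bookings.csv
  adults = c(2, 55, 1, 2, 3),
  required_car_parking_spaces = c(0, 0, 1, 8, 0)
)

# Row(s) with the most parking spaces requested.
most_parking <- toy %>% slice_max(required_car_parking_spaces, n = 1)

# Row(s) with the most adults.
most_adults <- toy %>% slice_max(adults, n = 1)

# TRUE only if both maxima come from the same booking.
same_booking <- identical(most_parking$booking_id, most_adults$booking_id)
same_booking
```

On the real data, the same two `slice_max()` calls on `hotel1` would show the full rows, so the two bookings can be compared side by side.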

## Provide Grouped Summary Statistics

Conduct some exploratory data analysis, using dplyr commands such as `group_by()`, `select()`, `filter()`, and `summarise()`. Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.

```{r}
hotel2 <- hotel1 %>% group_by(hotel, arrival_date_month, arrival_date_year)
hotel3 <- hotel2 %>% summarise(Babies = sum(babies), Children = sum(children, na.rm = TRUE), Adults = sum(adults))
hotel3



x <- filter(hotel3, arrival_date_year == 2015)
y <- filter(hotel3, arrival_date_year == 2016)
z <- filter(hotel3, arrival_date_year == 2017)


x1 <- filter(x, hotel == "City Hotel")
x1
y1 <- filter(y, hotel == "City Hotel")
y1
z1 <- filter(z, hotel == "City Hotel")
z1

x2 <- filter(x, hotel == "Resort Hotel")
x2
y2 <- filter(y, hotel == "Resort Hotel")
y2
z2 <- filter(z, hotel == "Resort Hotel")
z2


summary(x)
summary(y)
summary(z)
summary(x1)
summary(y1)
summary(z1)
summary(x2)
summary(y2)
summary(z2)

sum(hotel3$Babies, na.rm=TRUE)
sum(hotel3$Children, na.rm=TRUE)
sum(hotel3$Adults, na.rm=TRUE)
sum(x1$Babies, na.rm = TRUE)
sum(x1$Children, na.rm = TRUE)
sum(x1$Adults, na.rm = TRUE)
sum(y1$Babies, na.rm = TRUE)
sum(y1$Children, na.rm = TRUE)
sum(y1$Adults, na.rm = TRUE)
sum(z1$Babies, na.rm = TRUE)
sum(z1$Children, na.rm = TRUE)
sum(z1$Adults, na.rm = TRUE)
sum(x2$Babies, na.rm = TRUE)
sum(x2$Children, na.rm = TRUE)
sum(x2$Adults, na.rm = TRUE)
sum(y2$Babies, na.rm = TRUE)
sum(y2$Children, na.rm = TRUE)
sum(y2$Adults, na.rm = TRUE)
sum(z2$Babies, na.rm = TRUE)
sum(z2$Children, na.rm = TRUE)
sum(z2$Adults, na.rm = TRUE)


```
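
`arrival_date_month` is read in as character, so the grouped tables from the chunk above list months alphabetically (April, August, December, ...). One way to get calendar order, sketched here on a made-up tibble, is to turn the column into a factor whose levels are base R's `month.name`. The tibble `toy_months` and its numbers are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical monthly totals; the numbers are invented.
toy_months <- tibble(
  arrival_date_month = c("August", "April", "January", "December"),
  Adults = c(40, 10, 5, 20)
)

# month.name is a built-in vector: "January", "February", ..., "December".
ordered_months <- toy_months %>%
  mutate(arrival_date_month = factor(arrival_date_month, levels = month.name)) %>%
  arrange(arrival_date_month)

ordered_months
```

Applying the same `mutate()` to `hotel1` before `group_by()` should make the summarised rows come out in calendar order.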

### Explain and Interpret

Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.

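This analysis counts how many babies, children and adults were listed on bookings at each hotel (City Hotel and Resort Hotel) for each arrival year (2015, 2016 and 2017). The counts include canceled bookings, because the code does not filter on `is_canceled`. Only 2016 is a full year in the data: 2015 covers July to December and 2017 covers January to August, so the yearly totals are not directly comparable.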

For the City Hotel, the yearly totals are:

-   Babies: 2015 ==> 84, 2016 ==> 195, 2017 ==> 113
-   Children: 2015 ==> 512, 2016 ==> 3814, 2017 ==> 2922
-   Adults: 2015 ==> 24385, 2016 ==> 70568, 2017 ==> 51885

For the Resort Hotel, the yearly totals are:

-   Babies: 2015 ==> 139, 2016 ==> 268, 2017 ==> 150
-   Children: 2015 ==> 845, 2016 ==> 2094, 2017 ==> 2216
-   Adults: 2015 ==> 15877, 2016 ==> 34396, 2017 ==> 24525


When we take the analysis for both hotels together, the total number of babies, children and adults are 949, 12403 and 221636 respectively.
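
The yearly totals above were produced with one `filter()` and one `sum()` call per hotel, year and age group. The same table can be built in a single pipeline, which also avoids copying numbers by hand. A minimal sketch on invented data: `toy_bookings` stands in for `hotel1`, and only the column names are taken from the data set.

```r
library(dplyr)

# Hypothetical bookings; values are invented, including one missing
# children count to mirror the NA's in the real column.
toy_bookings <- tibble(
  hotel = c("City Hotel", "City Hotel", "City Hotel",
            "Resort Hotel", "Resort Hotel"),
  arrival_date_year = c(2015, 2015, 2016, 2015, 2016),
  adults = c(2, 1, 2, 2, 3),
  children = c(0, NA, 1, 2, 0),
  babies = c(0, 0, 1, 0, 0)
)

# One row per hotel and year, one total per age group.
yearly_totals <- toy_bookings %>%
  group_by(hotel, arrival_date_year) %>%
  summarise(
    across(c(babies, children, adults), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

yearly_totals
```

Because every total sits in one table, values can be read off a single row per hotel and year.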

The minimum and maximum monthly totals of babies at each hotel, by year, are:

City Hotel:

-   Max: 2015 ==> 23, 2016 ==> 27, 2017 ==> 19
-   Min: 2015 ==> 2, 2016 ==> 7, 2017 ==> 10

Resort Hotel:

-   Max: 2015 ==> 51, 2016 ==> 55, 2017 ==> 30
-   Min: 2015 ==> 11, 2016 ==> 7, 2017 ==> 7

The minimum and maximum monthly totals of children at each hotel, by year, are:

City Hotel:

-   Max: 2015 ==> 165, 2016 ==> 730, 2017 ==> 604
-   Min: 2015 ==> 14, 2016 ==> 69, 2017 ==> 165

Resort Hotel:

-   Max: 2015 ==> 263, 2016 ==> 438, 2017 ==> 580
-   Min: 2015 ==> 47, 2016 ==> 47, 2017 ==> 92

The minimum and maximum monthly totals of adults at each hotel, by year, are:

City Hotel:

-   Max: 2015 ==> 6282, 2016 ==> 7755, 2017 ==> 8287
-   Min: 2015 ==> 1918, 2016 ==> 2200, 2017 ==> 4178

Resort Hotel:

-   Max: 2015 ==> 3230, 2016 ==> 3664, 2017 ==> 3659
-   Min: 2015 ==> 1796, 2016 ==> 1496, 2017 ==> 2150
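
The challenge also asks for central tendency and dispersion, and `summary()` does not report a standard deviation. The sketch below shows one way the monthly minima and maxima listed above, plus mean, median and standard deviation, could be computed per hotel and year in a single step. `toy_monthly` is a made-up stand-in for `hotel3`, and its values are invented.

```r
library(dplyr)

# Hypothetical monthly totals shaped like hotel3; numbers are invented.
toy_monthly <- tibble(
  hotel = rep(c("City Hotel", "Resort Hotel"), each = 3),
  arrival_date_year = 2015,
  arrival_date_month = rep(c("July", "August", "September"), times = 2),
  Babies = c(1, 4, 7, 10, 20, 30)
)

# One row per hotel and year with range, centre and spread of monthly totals.
babies_stats <- toy_monthly %>%
  group_by(hotel, arrival_date_year) %>%
  summarise(
    min_babies = min(Babies),
    max_babies = max(Babies),
    mean_babies = mean(Babies),
    median_babies = median(Babies),
    sd_babies = sd(Babies),
    .groups = "drop"
  )

babies_stats
```

Running the same `summarise()` on `hotel3` (grouped by `hotel` and `arrival_date_year`) would give all hotel and year combinations in one table rather than nine separate `summary()` printouts.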