Harsha Kanaka Eswar Gudipudi
Visualizing Time and Relationships


May 16, 2023


library(tidyverse)  # provides %>%, dplyr verbs, and ggplot2 used below
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

df <- read.csv("_data/hotel_bookings.csv")
Briefly describe the data

The dataset contains hotel booking records. Each row describes one reservation: whether it was canceled, lead time, arrival date, stays in weekend and week nights, number of adults, children, and babies, market segment, distribution channel, whether the guest was a repeat visitor, previous cancellations and bookings, reserved and assigned room types, booking changes, deposit type, agent and company IDs, days in waiting list, customer type, average daily rate, required parking spaces, total special requests, and reservation status. The data covers the years 2015 to 2017.

# Get unique values for the market_segment column
unique_market_segment <- unique(df$market_segment)
cat(paste(unique_market_segment, collapse = ", "))
Direct, Corporate, Online TA, Offline TA/TO, Complementary, Groups, Undefined, Aviation

Tidy Data (as needed)

I would like to create a plot that displays the trend of reservations by market segment (online, direct, etc.) for each month in a particular year. To achieve this, we must first build a single "date" column from the arrival year, month, and day, and then add up the adults, children, and babies on each reservation into a single "bookings" column.

df_tidy <- df %>%
  mutate(date = paste(arrival_date_year, arrival_date_month, arrival_date_day_of_month, sep = "-"),
         date = as.Date(date, format = "%Y-%B-%d"))
# Sum adults, children, and babies per reservation; NA values count as 0
df_tidy$bookings <- rowSums(df_tidy[, c("adults", "children", "babies")], na.rm = TRUE)

head(df_tidy[, c("date", "bookings")], n = 5)
        date bookings
1 2015-07-01        2
2 2015-07-01        2
3 2015-07-01        1
4 2015-07-01        1
5 2015-07-01        2
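
One caveat with the date construction above: the "%B" format matches full month names in the current session locale, so on a non-English locale the parse can return NA. A minimal sketch of a locale-independent alternative follows, using the built-in month.name constant. The sample_rows data frame is hypothetical and only mimics the three arrival_date_* columns.

```r
# Hypothetical sample rows mimicking the arrival_date_* columns
sample_rows <- data.frame(
  arrival_date_year = c(2015, 2016, 2017),
  arrival_date_month = c("July", "February", "August"),
  arrival_date_day_of_month = c(1, 29, 31)
)

# Map English month names to 1..12 with the built-in month.name constant,
# so the result does not depend on the session locale
month_num <- match(sample_rows$arrival_date_month, month.name)

# Build ISO-style strings (YYYY-MM-DD), which as.Date parses by default
sample_rows$date <- as.Date(
  sprintf("%d-%02d-%02d",
          as.integer(sample_rows$arrival_date_year),
          month_num,
          as.integer(sample_rows$arrival_date_day_of_month))
)

# Count rows that failed to parse; NA dates would be dropped silently later
sum(is.na(sample_rows$date))
```

Running the same NA count on df_tidy$date is a quick way to confirm that every arrival date parsed before filtering by year.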

Time Dependent Visualization

# Keep only arrivals in 2016; format() returns a string, so compare against "2016"
bookings_filtered <- df_tidy %>% filter(format(date, "%Y") == "2016")

# Group the data by month and market segment and calculate the total number of bookings in each group
bookings_grouped <- bookings_filtered %>%
  group_by(format(date, "%m"), market_segment) %>%
  summarise(total_bookings = sum(bookings)) %>%
  # Keep only the two segments being compared
  filter(market_segment %in% c("Direct", "Online TA"))

bookings_grouped
# A tibble: 24 × 3
# Groups:   format(date, "%m") [12]
   `format(date, "%m")` market_segment total_bookings
   <chr>                <chr>                   <dbl>
 1 01                   Direct                    655
 2 01                   Online TA                1778
 3 02                   Direct                    977
 4 02                   Online TA                3043
 5 03                   Direct                    933
 6 03                   Online TA                4761
 7 04                   Direct                    903
 8 04                   Online TA                5125
 9 05                   Direct                    864
10 05                   Online TA                4771
# ℹ 14 more rows
# Side-by-side bars: one bar per market segment within each month
ggplot(bookings_grouped, aes(x = `format(date, "%m")`, y = total_bookings, fill = market_segment)) + 
  geom_bar(stat = "identity", position = "dodge") + 
  labs(x = "Month of Booking", y = "Number of Bookings", title = "Total Bookings by Market Segment in 2016")

Visualizing Part-Whole Relationships

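# b1 is assumed to hold the same monthly summary as bookings_grouped
b1 <- bookings_grouped
# Stacked bars: each month's bar is split by market segment
ggplot(b1, aes(x = `format(date, "%m")`, y = total_bookings, fill = market_segment)) +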
  geom_col() +
  labs(title = "Part-Whole Relationships for Bookings by Market Segment and Month",
       y = "Total Bookings",
       x = "Month") +
  theme(legend.position = "bottom")
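
The stacked bars above show absolute totals, so months with fewer guests have shorter bars and the segment split is harder to compare across months. As a possible complement, here is a minimal sketch that converts the totals to within-month shares. It uses only the first two months of the summary printed earlier; the column names month and share are illustrative and do not appear in the original code.

```r
library(dplyr)
library(ggplot2)

# January and February 2016 totals, copied from the summary printed earlier
monthly <- tibble(
  month = c("01", "01", "02", "02"),
  market_segment = c("Direct", "Online TA", "Direct", "Online TA"),
  total_bookings = c(655, 1778, 977, 3043)
)

# Share of each segment within its month; shares sum to 1 per month
monthly_share <- monthly %>%
  group_by(month) %>%
  mutate(share = total_bookings / sum(total_bookings)) %>%
  ungroup()

# Every bar now has the same height, so segments compare as proportions
p <- ggplot(monthly_share, aes(x = month, y = share, fill = market_segment)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Month", y = "Share of Bookings", fill = "Market Segment") +
  theme(legend.position = "bottom")
p
```

The same group_by and mutate step should work on the full twelve-month summary, provided the month column is given a plain name first.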