library(tidyverse)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 9 Solution
Challenge Overview
Today’s challenge is simple. Create a function, and use it to perform a data analysis / cleaning / visualization task:
Examples of such functions are:
1) A function that reads in and cleans a dataset.
2) A function that computes summary statistics (e.g., computes the z score for a variable).
3) A function that plots a histogram.
That’s it!
I will be using the Hotel Bookings dataset for this challenge. I imported it with the read_csv() function and used head() to preview its columns. At a high level, it contains the hotel type along with customer booking data: arrival and departure information, number of guests, booking details, payment type, and reservation details. The data has 119,390 rows and 32 columns and appears to cover bookings at two hotels (a resort hotel and a city hotel) with arrivals between 2015 and 2017.
Many columns contain coded values that are not easy to understand. I will therefore create a function that makes the data more readable by replacing the coded values in a column with more descriptive labels.
hotel_bookings <- read_csv("_data/hotel_bookings.csv")
head(hotel_bookings)
# A tibble: 6 × 32
hotel is_canceled lead_time arrival_date_year arrival_date_month
<chr> <dbl> <dbl> <dbl> <chr>
1 Resort Hotel 0 342 2015 July
2 Resort Hotel 0 737 2015 July
3 Resort Hotel 0 7 2015 July
4 Resort Hotel 0 13 2015 July
5 Resort Hotel 0 14 2015 July
6 Resort Hotel 0 14 2015 July
# ℹ 27 more variables: arrival_date_week_number <dbl>,
# arrival_date_day_of_month <dbl>, stays_in_weekend_nights <dbl>,
# stays_in_week_nights <dbl>, adults <dbl>, children <dbl>, babies <dbl>,
# meal <chr>, country <chr>, market_segment <chr>,
# distribution_channel <chr>, is_repeated_guest <dbl>,
# previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
# reserved_room_type <chr>, assigned_room_type <chr>, …
unique(hotel_bookings$meal)
[1] "BB" "FB" "HB" "SC" "Undefined"
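# Replace every occurrence of old_value in data[[column_name]] with new_value
replace_values <- function(data, column_name, old_value, new_value) {
  # Logical index of the rows holding the old code; assign the new label there
  data[[column_name]][data[[column_name]] == old_value] <- new_value
  return(data)
}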
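The function above replaces one code per call. As a sketch of an alternative design (the name replace_values_map() and its mapping argument are hypothetical, not part of the original solution), a named character vector can map every code to its label in a single pass using base R's match(). Values without an entry in the mapping, including NA, are left unchanged. The small data frame is made up for illustration.

```r
# Hypothetical helper: names(mapping) are the old codes, values are new labels
replace_values_map <- function(data, column_name, mapping) {
  col <- data[[column_name]]
  # Position of each value in names(mapping), or NA when there is no entry
  idx <- match(col, names(mapping))
  hit <- !is.na(idx)
  # Rewrite only the values that have a mapping; unname() keeps col unnamed
  col[hit] <- unname(mapping)[idx[hit]]
  data[[column_name]] <- col
  data
}

meal_labels <- c(
  BB = "Bed and Breakfast",
  HB = "Half Board",
  FB = "Full Board",
  SC = "Self Catering"
)

# Made-up data standing in for hotel_bookings
toy <- data.frame(meal = c("BB", "SC", "Undefined", "HB", NA),
                  stringsAsFactors = FALSE)
toy <- replace_values_map(toy, "meal", meal_labels)
toy$meal
```

A single pass also sidesteps a subtle issue with chained calls: if a new label happened to equal another old code, a later call would rewrite it again.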
hotel_bookings <- replace_values(hotel_bookings,"meal","BB", "Bed and Breakfast")
hotel_bookings <- replace_values(hotel_bookings,"meal","HB", "Half Board")
hotel_bookings <- replace_values(hotel_bookings,"meal","FB", "Full Board")
hotel_bookings <- replace_values(hotel_bookings,"meal","SC", "Self Catering")
hotel_bookings
unique(hotel_bookings$meal)
[1] "Bed and Breakfast" "Full Board" "Half Board"
[4] "Self Catering" "Undefined"
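The four replace_values() calls above differ only in their old and new values, so they can also be driven from two parallel vectors. This is a minimal sketch: replace_values() is repeated so the block runs on its own, and the toy data frame only stands in for hotel_bookings.

```r
# replace_values() as defined in the solution, repeated for self-containment
replace_values <- function(data, column_name, old_value, new_value) {
  data[[column_name]][data[[column_name]] == old_value] <- new_value
  data
}

old_codes  <- c("BB", "HB", "FB", "SC")
new_labels <- c("Bed and Breakfast", "Half Board", "Full Board", "Self Catering")

# Toy data standing in for hotel_bookings
toy <- data.frame(meal = c("FB", "BB", "Undefined", "SC"),
                  stringsAsFactors = FALSE)

# Apply each old -> new pair in turn, equivalent to the four separate calls
for (i in seq_along(old_codes)) {
  toy <- replace_values(toy, "meal", old_codes[i], new_labels[i])
}
toy$meal
```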
unique(hotel_bookings$market_segment)
[1] "Direct" "Corporate" "Online TA" "Offline TA/TO"
[5] "Complementary" "Groups" "Undefined" "Aviation"
hotel_bookings <- replace_values(hotel_bookings,"market_segment","Online TA", "Online Travel Agent")
hotel_bookings <- replace_values(hotel_bookings,"market_segment","Offline TA/TO", "Offline Travel Agent/Tour Operator")
hotel_bookings
unique(hotel_bookings$market_segment)
[1] "Direct" "Corporate"
[3] "Online Travel Agent" "Offline Travel Agent/Tour Operator"
[5] "Complementary" "Groups"
[7] "Undefined" "Aviation"
Thus, I have replaced the meal codes and the abbreviated market segment names with their full forms to make the data more readable. The same approach works for other coded columns using the replace_values() function.
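To extend this to several columns at once, one option is to collect the mappings in a named list keyed by column name. The helper apply_relabel_plan() and the relabel_plan structure are hypothetical illustrations, the toy data only mimics two hotel_bookings columns, and replace_values() is repeated from the solution so the block is self-contained.

```r
# replace_values() as defined in the solution
replace_values <- function(data, column_name, old_value, new_value) {
  data[[column_name]][data[[column_name]] == old_value] <- new_value
  data
}

# Hypothetical plan: for each column, a named vector of old code -> new label
relabel_plan <- list(
  meal = c(BB = "Bed and Breakfast", SC = "Self Catering"),
  market_segment = c("Online TA" = "Online Travel Agent",
                     "Offline TA/TO" = "Offline Travel Agent/Tour Operator")
)

# Hypothetical helper: walk every column and every pair in the plan
apply_relabel_plan <- function(data, plan) {
  for (column_name in names(plan)) {
    mapping <- plan[[column_name]]
    for (old_value in names(mapping)) {
      data <- replace_values(data, column_name, old_value, mapping[[old_value]])
    }
  }
  data
}

# Toy data mimicking two hotel_bookings columns
toy <- data.frame(
  meal = c("BB", "SC", "Undefined"),
  market_segment = c("Online TA", "Direct", "Offline TA/TO"),
  stringsAsFactors = FALSE
)
toy <- apply_relabel_plan(toy, relabel_plan)
toy
```

Keeping the mappings in one list makes the relabeling rules easy to review in a single place, separate from the code that applies them.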