Challenge 10 Solution
Challenge Overview
The purrr package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace for loops with a more readable (and often faster) simple function call.
For example, we can draw a random sample of size n from each of 10 different normal distributions using a vector of 10 means.
library(tidyverse) # loads purrr, dplyr, readr, and the %>% pipe

n <- 100 # sample size
m <- seq(1, 10) # means
samps <- map(m, rnorm, n = n) # each element of m is passed to rnorm as the mean
We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.
samps %>%
  map_dbl(mean)
[1] 1.052554 1.905349 3.034037 4.038838 5.151351 6.094199 7.032789 7.970365
[9] 8.977754 9.949969
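To make the "replace a for loop" claim concrete, here is a minimal sketch that computes the same per-sample means twice, once with an explicit loop and once with map_dbl. The seed and the names means_loop and means_map are assumptions added for illustration; they are not part of the original post.

```r
library(purrr)

set.seed(1)                     # assumed seed, only so the sketch is reproducible
n <- 100                        # sample size
m <- seq(1, 10)                 # means
samps <- map(m, rnorm, n = n)   # list of 10 samples, one per mean

# for-loop version: preallocate the result, then fill it element by element
means_loop <- numeric(length(samps))
for (i in seq_along(samps)) {
  means_loop[i] <- mean(samps[[i]])
}

# purrr version: a single call that must return a double vector
means_map <- map_dbl(samps, mean)

# both approaches should agree
all.equal(means_loop, means_map)
```

The loop needs three pieces of bookkeeping (preallocation, an index, and an assignment), while map_dbl states only the input and the function to apply.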
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. The choice of task is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
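As one illustration of the "running a random process multiple times" option, the sketch below maps a small simulation function over several sample sizes. The helper prop_heads, the seed, and the chosen sizes are all hypothetical and exist only for this example.

```r
library(purrr)

# Hypothetical helper: proportion of heads in n_flips fair coin tosses
prop_heads <- function(n_flips) {
  mean(rbinom(n_flips, size = 1, prob = 0.5))
}

set.seed(42)                          # assumed seed for reproducibility
flips <- c(10, 100, 1000, 10000)      # number of tosses per run

# run the simulation once per entry in flips and collect a double vector
results <- map_dbl(flips, prop_heads)
names(results) <- flips
results
```

Each element of results is a proportion between 0 and 1, and the values should settle closer to 0.5 as the number of tosses grows.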
I am using map_chr() from purrr along with the function I created in challenge 9, as suggested. It replaces several abbreviated LOVs (list-of-values codes) in the market_segment and meal columns with labels that make the dataset more readable.
hotel_bookings <- read_csv("_data/hotel_bookings.csv")
head(hotel_bookings)
# A tibble: 6 × 32
  hotel        is_canceled lead_time arrival_date_year arrival_date_month
  <chr>              <dbl>     <dbl>             <dbl> <chr>
1 Resort Hotel           0       342              2015 July
2 Resort Hotel           0       737              2015 July
3 Resort Hotel           0         7              2015 July
4 Resort Hotel           0        13              2015 July
5 Resort Hotel           0        14              2015 July
6 Resort Hotel           0        14              2015 July
# ℹ 27 more variables: arrival_date_week_number <dbl>,
# arrival_date_day_of_month <dbl>, stays_in_weekend_nights <dbl>,
# stays_in_week_nights <dbl>, adults <dbl>, children <dbl>, babies <dbl>,
# meal <chr>, country <chr>, market_segment <chr>,
# distribution_channel <chr>, is_repeated_guest <dbl>,
# previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
# reserved_room_type <chr>, assigned_room_type <chr>, …
unique(hotel_bookings$meal)
[1] "BB" "FB" "HB" "SC" "Undefined"
unique(hotel_bookings$market_segment)
[1] "Direct" "Corporate" "Online TA" "Offline TA/TO"
[5] "Complementary" "Groups" "Undefined" "Aviation"
# Define the replacement mappings
meal_replacements <- c("BB" = "Bed and Breakfast",
                       "HB" = "Half Board",
                       "FB" = "Full Board",
                       "SC" = "Self Catering")
market_segment_replacements <- c("Online TA" = "Online Travel Agent",
                                 "Offline TA/TO" = "Offline Travel Agent/Tour Operator")
# Function to replace values
replace_values <- function(data, column_name, replacements) {
  data %>%
    mutate({{column_name}} := map_chr(
      {{column_name}},
      # use the replacement label when one exists; otherwise keep the value
      ~ ifelse(.x %in% names(replacements), replacements[.x], .x)
    ))
}
# Replace meal values
hotel_bookings <- replace_values(hotel_bookings, meal, meal_replacements)
unique(hotel_bookings$meal)
[1] "Bed and Breakfast" "Full Board" "Half Board"
[4] "Self Catering" "Undefined"
# Replace market_segment values
hotel_bookings <- replace_values(hotel_bookings, market_segment, market_segment_replacements)
unique(hotel_bookings$market_segment)
[1] "Direct" "Corporate"
[3] "Online Travel Agent" "Offline Travel Agent/Tour Operator"
[5] "Complementary" "Groups"
[7] "Undefined" "Aviation"
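The two replace_values() calls above follow the same pattern, so they could also be driven by a single purrr call. The sketch below is one possible variant, not the method used in this post: it assumes a small toy tibble in place of hotel_bookings, a hypothetical helper recode_column, and a named list whose names match the columns to recode.

```r
library(dplyr)
library(purrr)

# Toy stand-in for hotel_bookings (assumed data, for illustration only)
toy <- tibble(
  meal           = c("BB", "HB", "Undefined"),
  market_segment = c("Online TA", "Direct", "Offline TA/TO")
)

# One lookup vector per column; list names must match the column names
replacements <- list(
  meal           = c("BB" = "Bed and Breakfast", "HB" = "Half Board"),
  market_segment = c("Online TA" = "Online Travel Agent",
                     "Offline TA/TO" = "Offline Travel Agent/Tour Operator")
)

# Hypothetical helper: recode one character vector with one lookup vector
recode_column <- function(values, lookup) {
  map_chr(values, function(v) if (v %in% names(lookup)) lookup[[v]] else v)
}

# map2 walks over the columns and their lookups in parallel
cols <- names(replacements)
toy[cols] <- map2(toy[cols], replacements, recode_column)
toy
```

Values with no entry in the lookup, such as "Undefined" and "Direct", pass through unchanged, which matches the behaviour of replace_values().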