Challenge 10

challenge_10
hotel_bookings
purrr
Author

Sean Conway

Published

February 2, 2023

library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

The purrr package is a powerful tool for functional programming. It lets the user apply a single function to every element of a list or vector. It can replace a for loop with a single, more readable function call.

For example, we can draw a random sample of size n from each of 10 normal distributions, using a vector of 10 means.

n <- 100 # sample size
m <- seq(1, 10) # means
# n is passed by name, so each element of m fills rnorm()'s next argument, mean
samps <- map(m, rnorm, n = n)

We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.

samps %>%
  map_dbl(mean)
 [1]  1.050828  1.896501  2.766129  4.009101  5.064788  5.858311  6.860105
 [8]  7.862218  8.991164 10.106274
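For comparison, here is a minimal sketch of the same sampling step written as a for loop next to the map version. The seed and the names samps_loop and samps_map are assumptions added for illustration; they are not part of the original example.

```r
library(purrr)

n <- 100         # sample size
m <- seq(1, 10)  # means

# for-loop version: pre-allocate a list, then fill it one mean at a time
set.seed(1)      # assumption: seed used only so both versions draw the same numbers
samps_loop <- vector("list", length(m))
for (i in seq_along(m)) {
  samps_loop[[i]] <- rnorm(n, mean = m[i])
}

# purrr version: the same work in a single call
set.seed(1)
samps_map <- map(m, function(mu) rnorm(n, mean = mu))

# both approaches should produce the same list of samples
identical(samps_loop, samps_map)
```

The loop needs explicit pre-allocation and indexing, while map handles both and always returns a list with one element per input.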

purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.

The challenge

Use purrr with a function to perform some data science task. The task is up to you: it could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for Challenge 9.

I will reuse the dataset from Challenge 9 and compute the z-score of several of its columns, as I did there.

Function to compute statistics (z-score)

z-score = (x - mean) / std_dev

z_score <- function(col) {
  output <- (col - mean(col)) / sd(col)
  return(output)
}
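As a quick sanity check, a standardized vector should have mean 0 and standard deviation 1, and should agree with base R's scale(). The sketch below uses a small hypothetical vector x, not values from the hotel data.

```r
z_score <- function(col) {
  (col - mean(col)) / sd(col)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical values for illustration only
z <- z_score(x)

mean(z)  # approximately 0, up to floating-point error
sd(z)    # 1, because sd() is the same estimator used in the denominator

# scale() centers by the mean and divides by sd() by default
all.equal(z, as.numeric(scale(x)))
```

Note that mean() and sd() return NA when the input contains missing values, so a column with NAs would need na.rm = TRUE in both calls; that variant is not shown here.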

Reading the dataset

bookings <- read_csv("_data/hotel_bookings.csv")
# keep only the first 20 rows (not a random sample)
bookings <- head(bookings, 20)

Calculating the z-score for some of the columns: lead_time, stays_in_week_nights, adr

output <- map(
  list(bookings$lead_time,
       bookings$stays_in_week_nights, 
       bookings$adr), 
  z_score)
head(output)
[[1]]
 [1]  1.52612471  3.82740076 -0.42559042 -0.39063433 -0.38480831 -0.38480831
 [7] -0.46637253 -0.41393839  0.02883878 -0.02942138 -0.33237417 -0.26246199
[13] -0.07020348 -0.36150425 -0.25080996 -0.07020348 -0.25080996 -0.39646034
[19] -0.46637253 -0.42559042

[[2]]
 [1] -1.7018641 -1.7018641 -1.0211185 -1.0211185 -0.3403728 -0.3403728
 [7] -0.3403728 -0.3403728  0.3403728  0.3403728  1.0211185  1.0211185
[13]  1.0211185  1.0211185  1.0211185  1.0211185  1.0211185 -1.0211185
[19] -1.0211185  1.0211185

[[3]]
 [1] -2.400562779 -2.400562779 -0.506478213 -0.506478213  0.074374387
 [6]  0.074374387  0.301664535  0.200646692 -0.329696987  0.263782844
[11]  0.705735909  1.261334049  0.049119926  1.508070131 -0.008712789
[16]  0.049119926  0.061747157 -0.173119329  0.312271409  1.463369736
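The result above is an unnamed list, so it is not obvious which element belongs to which column. One possible alternative, sketched here under assumptions, is to map over a subset of the data frame itself: a data frame is a list of columns, so map keeps the column names. The data frame toy and its values are hypothetical stand-ins for bookings.

```r
library(purrr)

z_score <- function(col) {
  (col - mean(col)) / sd(col)
}

# hypothetical stand-in for `bookings`; the values are made up
toy <- data.frame(
  lead_time            = c(10, 20, 30, 40),
  stays_in_week_nights = c(0, 1, 2, 3),
  adr                  = c(50, 75, 100, 125)
)

cols <- c("lead_time", "stays_in_week_nights", "adr")

# map() iterates over the selected columns and carries their names over
z_list <- map(toy[cols], z_score)
names(z_list)

# a named list of equal-length vectors converts back into a table
z_df <- as.data.frame(z_list)
head(z_df)
```

With the real data, replacing toy with bookings should give the same z-scores as the list-based call, with each element labeled by its column name.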