Challenge 10

challenge_10
purrr
Author

Cristhian Barba Garzon

Published

January 25, 2023

library(tidyverse)
library(ggplot2)
library(purrr)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

The purrr package is a powerful tool for functional programming. It lets the user apply a single function to every element of a list or vector, so a for loop can be replaced with a single, more readable (and often faster) function call.

For example, we can draw a random sample of size n from each of 10 normal distributions using a vector of 10 means.

n <- 100 # sample size
m <- seq(1,10) # means 
samps <- map(m,rnorm,n=n) 

We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.

samps %>%
  map_dbl(mean)
 [1]  1.081419  2.033515  2.772844  4.197442  5.004348  6.012677  6.920368
 [8]  8.062435  8.993681 10.047779
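To make the comparison with a for loop concrete, here is a minimal sketch that computes the same ten sample means both ways. The seed value and the names loop_means and map_means are assumptions added for illustration; they are not part of the original challenge.

```r
library(purrr)

set.seed(10)        # arbitrary seed (assumption), so the run is reproducible
n <- 100            # sample size
m <- seq(1, 10)     # means
samps <- map(m, rnorm, n = n)

# for-loop version: pre-allocate a numeric vector, then fill it element by element
loop_means <- numeric(length(samps))
for (i in seq_along(samps)) {
  loop_means[i] <- mean(samps[[i]])
}

# purrr version: one call that always returns a double vector
map_means <- map_dbl(samps, mean)

# check that both approaches agree
identical(loop_means, map_means)
```

The loop needs three lines of bookkeeping (allocation, index, assignment) that map_dbl handles internally, which is where most of the readability gain comes from.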

purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.

Reading in a data set

I read in the hotel bookings csv file.

hotel = read_csv("_data/hotel_bookings.csv")
hotel

purrr

I used map_dbl to apply a summary function to individual columns: it computes the median of the adults, children, and babies columns, with na.rm = TRUE so that missing values are ignored. I then used map2_dbl to add the adults and children columns pairwise, which returns a vector with one sum per reservation. This is useful because it gives the number of adults and children in each reservation. Lastly, I used keep to filter a single column on a condition; here the condition is that the value must be greater than 15, so the result holds the adult counts of the reservations with more than 15 adults.

result_Median = hotel %>%
  select(adults, children, babies) %>%
  map_dbl(median, na.rm = TRUE) #using purrr to recreate finding median from earlier challenges 
result_Median
  adults children   babies 
       2        0        0 
result_Sum =  map2_dbl(hotel$adults, hotel$children, `+`)
heading = head(result_Sum, 100) # first 100 elements of the vector of pairwise sums
heading
  [1] 2 2 1 1 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 3 3 2
 [38] 3 3 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 4 2 2 3 2 2 2 2 2
 [75] 3 2 2 1 2 3 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 3 3 2
result_filter = keep(hotel$adults, function(adults) adults > 15) # keep() retains only the values above 15, i.e. reservations with more than 15 adults
result_filter
 [1] 40 26 50 26 26 27 27 26 26 55 20 20
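The three calls above can be easier to follow on a few rows. The sketch below repeats them on a small made-up tibble; toy and its values are hypothetical stand-ins for the hotel data, chosen only so that each result can be checked by eye.

```r
library(purrr)
library(tibble)

# Hypothetical toy data standing in for three columns of the hotel bookings file
toy <- tibble(
  adults   = c(2, 1, 20, 2),
  children = c(0, 2, NA, 1),
  babies   = c(0, 0, 0, 1)
)

# One median per column; na.rm = TRUE is passed on to median()
toy_median <- map_dbl(toy, median, na.rm = TRUE)

# Pairwise sum of two columns; a missing value in either column gives NA
toy_sum <- map2_dbl(toy$adults, toy$children, `+`)

# keep() returns only the matching values, not the rows they came from
toy_large <- keep(toy$adults, ~ .x > 15)

toy_median
toy_sum
toy_large
```

Two things are worth noticing. The sum for the third row is NA because `+` has no na.rm option, so any missing values in the real columns would carry through in the same way. And because keep works on a bare vector, it drops the rest of each reservation; if the full rows are needed, dplyr's filter(adults > 15) on the data frame is one alternative.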