library(tidyverse)
library(ggplot2)
library(purrr)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 10
Challenge Overview
The purrr package is a powerful tool for functional programming. It lets you apply a single function to every element of a list or vector, so a for loop can be replaced with a single, more readable function call (and sometimes a faster one, although that depends on the task).
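To make the loop-versus-map comparison concrete, here is a minimal sketch. The list `x` and the names `loop_means` and `map_means` are made up for illustration and are not part of the challenge data.

```r
library(purrr)

# Hypothetical input: a named list of numeric vectors
x <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# for-loop version: preallocate the result, then fill it one element at a time
loop_means <- numeric(length(x))
for (i in seq_along(x)) {
  loop_means[i] <- mean(x[[i]])
}
names(loop_means) <- names(x)

# purrr version: one call; map_dbl() returns a double vector and keeps the names
map_means <- map_dbl(x, mean)

all.equal(loop_means, map_means)
```

Both versions compute the same three means; the purrr call simply removes the bookkeeping (preallocation, indexing, naming).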
For example, we can draw n random samples from each of 10 different normal distributions using a vector of 10 means.
n <- 100 # sample size
m <- seq(1, 10) # means
samps <- map(m, rnorm, n = n)
We can then use map_dbl to verify that this worked correctly by computing the mean of each sample.
samps %>%
  map_dbl(mean)
[1] 1.081419 2.033515 2.772844 4.197442 5.004348 6.012677 6.920368
[8] 8.062435 8.993681 10.047779
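It may help to see why a vector of means ends up in the `mean` argument of `rnorm`. In `map(m, rnorm, n = n)`, the extra named argument `n = n` is passed along to `rnorm`, and each element of `m` fills the first argument that is still unmatched, which is `mean`. The sketch below spells this out with an explicit anonymous function; the names `samps_short`, `samps_explicit`, and `mu` are illustrative only.

```r
library(purrr)

n <- 100         # sample size
m <- seq(1, 10)  # means

# Shorthand: extra arguments after the function are passed on to rnorm()
set.seed(1)
samps_short <- map(m, rnorm, n = n)

# Explicit form: the role of each element of m is visible
set.seed(1)
samps_explicit <- map(m, function(mu) rnorm(n = n, mean = mu))

# With the same seed, both calls draw exactly the same samples
identical(samps_short, samps_explicit)
```

The explicit form is a little longer but leaves no doubt about which argument each value is bound to. As far as I know, recent purrr documentation also leans toward this style for passing constant arguments.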
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it's imperative that you complete the purrr and map readings before attempting this challenge.
Reading in a data set
I read in the hotel bookings CSV file with read_csv.
hotel = read_csv("_data/hotel_bookings.csv")
hotel
purrr
I used map_dbl to apply a summary function to individual columns: the median of the adults, children, and babies columns, computed with na.rm = TRUE so that missing values are ignored. The printed results appear in that column order. I then used map2_dbl to add two columns pairwise, so each element of the result is the sum of adults and children for one reservation. This is useful because it gives the number of adults and children in each reservation (babies are not included in this sum). Lastly, I used keep to filter a single column by a condition; here the values must be greater than 15, so only reservations with more than 15 adults are retained.
result_Median = hotel %>%
  select(adults, children, babies) %>%
  map_dbl(median, na.rm = TRUE) # using purrr to recreate finding the median from earlier challenges
result_Median
adults children babies
2 0 0
result_Sum = map2_dbl(hotel$adults, hotel$children, `+`) # returns a vector of the sum of the two columns
heading = head(result_Sum, 100) # first 100 sums only, to keep the printout short
heading
[1] 2 2 1 1 2 2 2 2 2 2 2 2 2 3 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 1 2 2 2 3 3 2
[38] 3 3 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 4 2 2 3 2 2 2 2 2
[75] 3 2 2 1 2 3 2 2 2 2 2 2 2 3 2 2 2 2 2 2 2 2 2 3 3 2
result_filter = keep(hotel$adults, function(adults) adults > 15) # keep() filters the column for values above 15, i.e. more than 15 adults in a reservation
result_filter
[1] 40 26 50 26 26 27 27 26 26 55 20 20
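One detail worth checking when summing columns pairwise is how missing values behave. The sketch below uses small made-up vectors (not values from hotel_bookings.csv) to show that `+` inside map2_dbl propagates NA, and one possible way to treat a missing count as zero instead. The names `raw_sum`, `party_size`, and `big` are hypothetical.

```r
library(purrr)

# Hypothetical toy data, not taken from the hotel bookings file
adults   <- c(2, 1, 20, 2)
children <- c(0, NA, 3, 1)

# `+` propagates NA, so the second reservation has a missing total
raw_sum <- map2_dbl(adults, children, `+`)

# Assumption: a missing count should be treated as zero
party_size <- map2_dbl(adults, children, function(a, k) sum(a, k, na.rm = TRUE))

# keep() retains only the elements for which the predicate is TRUE
big <- keep(adults, function(a) a > 15)
```

Whether a missing count should be dropped, kept as NA, or treated as zero is an analysis decision. The point here is only that map2_dbl will not make that choice for you.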