library(tidyverse)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 10
Challenge Overview
The purrr package is a powerful tool for functional programming. It lets the user apply a single function to each element of a list or vector, so it can often replace a for loop with a single, more readable function call.
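To make the for-loop comparison concrete, here is a small sketch that does the same job both ways. The names samps_loop and samps_map are illustrative only. The seed is reset before each version so both draw the same random numbers, which lets us check that they produce identical results.

```r
library(purrr)

n <- 100          # sample size
m <- seq(1, 10)   # means

# for-loop version: pre-allocate a list, then fill it one element at a time
set.seed(42)
samps_loop <- vector("list", length(m))
for (i in seq_along(m)) {
  samps_loop[[i]] <- rnorm(n = n, mean = m[i])
}

# purrr version: one call applies rnorm() to each mean in m
set.seed(42)
samps_map <- map(m, rnorm, n = n)

identical(samps_loop, samps_map)
#> [1] TRUE
```

The loop has to manage an index and a pre-allocated container. map() handles both of those for us and returns the list directly.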
For example, we can draw a random sample of size n from each of 10 normal distributions, using a vector of 10 means.
n <- 100 # sample size
m <- seq(1, 10) # means
samps <- map(m, rnorm, n = n)
We can then use map_dbl to verify that this worked correctly by computing the mean of each sample.
samps %>%
  map_dbl(mean)
[1] 1.050828 1.896501 2.766129 4.009101 5.064788 5.858311 6.860105
[8] 7.862218 8.991164 10.106274
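Why map_dbl() rather than map()? map() always returns a list, while map_dbl() returns a plain double vector. That makes the result easy to compare against m. The sketch below shows the difference and turns the eyeball check into a programmatic one. The 0.5 tolerance is an arbitrary choice on my part: with n = 100 and sd = 1, each sample mean has a standard error of 0.1, so 0.5 is five standard errors.

```r
library(purrr)

set.seed(1)
n <- 100
m <- seq(1, 10)
samps <- map(m, rnorm, n = n)

# map() gives back a list; map_dbl() gives back a double vector
class(map(samps, mean))        # "list"
sample_means <- map_dbl(samps, mean)
class(sample_means)            # "numeric"

# check that every sample mean lies close to the mean it was drawn from
all(abs(sample_means - m) < 0.5)
```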
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it's imperative that you complete the purrr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. The task is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for Challenge 9.
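As an illustration of one of the suggested tasks, reading in multiple datasets, here is a minimal sketch. The two CSV files (part1.csv, part2.csv) and their contents are hypothetical; the code writes them to a temporary directory so the example runs on its own. Any extra arguments passed to map() are forwarded to read_csv() for every file.

```r
library(purrr)
library(readr)
library(dplyr)

# Hypothetical setup: write two small CSV files to a temporary directory
paths <- file.path(tempdir(), c("part1.csv", "part2.csv"))
write_csv(tibble(id = 1:2, value = c(10, 20)), paths[1])
write_csv(tibble(id = 3:4, value = c(30, 40)), paths[2])

# read each file into its own tibble; show_col_types is passed to read_csv()
tables <- map(paths, read_csv, show_col_types = FALSE)

# stack the list of tibbles into a single data frame
combined <- bind_rows(tables)
combined
```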
I will reuse the hotel bookings dataset from Challenge 9 and, as in that challenge, compute the z-scores of several of its columns.
Function to compute statistics (z-score)
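z-score = (x - mean) / std_dev, i.e. how many standard deviations a value x lies from the column mean. In R, sd() computes the sample standard deviation (denominator n - 1).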
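A quick worked example of the formula on a three-value vector, checked by hand. The vector x is made up purely for illustration. Base R's scale() applies the same centering and scaling, so it serves as a cross-check. It returns a matrix, hence as.vector().

```r
x <- c(2, 4, 6)

mean(x)                  # 4
sd(x)                    # 2, i.e. sqrt(((2 - 4)^2 + 0^2 + (6 - 4)^2) / (3 - 1)) = sqrt(8 / 2)
(x - mean(x)) / sd(x)    # -1  0  1

# cross-check with base R's scale(), which also uses mean() and sd()
as.vector(scale(x))      # -1  0  1
```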
z_score <- function(col) {
  # subtract the column mean, then divide by the sample standard deviation
  output <- (col - mean(col)) / sd(col)
  return(output)
}
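One thing to watch with z_score() is missing values. If a column contains even one NA, mean() and sd() return NA, so every z-score becomes NA. The sketch below shows this, plus a hypothetical NA-tolerant variant, z_score_na(). That variant is my own illustration, not part of the challenge. It ignores missing values when computing the mean and standard deviation and leaves NA in place in the result.

```r
# z_score() as defined above
z_score <- function(col) {
  output <- (col - mean(col)) / sd(col)
  return(output)
}

x <- c(1, 2, 3, NA)
z_score(x)       # NA NA NA NA: one missing value poisons mean() and sd()

# Hypothetical NA-tolerant variant
z_score_na <- function(col) {
  (col - mean(col, na.rm = TRUE)) / sd(col, na.rm = TRUE)
}

z_score_na(x)    # -1  0  1 NA
```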
Reading the dataset
bookings <- read_csv("_data/hotel_bookings.csv")
# keep only the first 20 rows
bookings <- head(bookings, 20)
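Note that head() always returns the first 20 rows in file order, not a random sample. If a random subset is wanted instead, dplyr's slice_sample() draws rows without replacement. The sketch uses a hypothetical toy tibble in place of bookings, and the seed makes the draw reproducible.

```r
library(dplyr)

# Hypothetical toy data standing in for the bookings tibble
df <- tibble(row_id = 1:100, lead_time = seq(0, 297, by = 3))

first_20 <- head(df, 20)               # always rows 1 to 20

set.seed(123)                          # reproducible random draw
random_20 <- slice_sample(df, n = 20)  # 20 rows, sampled without replacement

first_20$row_id
random_20$row_id
```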
Calculating the z-scores of three columns: lead_time, stays_in_week_nights, and adr
output <- map(
  list(bookings$lead_time,
       bookings$stays_in_week_nights,
       bookings$adr),
  z_score)
head(output)
[[1]]
[1] 1.52612471 3.82740076 -0.42559042 -0.39063433 -0.38480831 -0.38480831
[7] -0.46637253 -0.41393839 0.02883878 -0.02942138 -0.33237417 -0.26246199
[13] -0.07020348 -0.36150425 -0.25080996 -0.07020348 -0.25080996 -0.39646034
[19] -0.46637253 -0.42559042
[[2]]
[1] -1.7018641 -1.7018641 -1.0211185 -1.0211185 -0.3403728 -0.3403728
[7] -0.3403728 -0.3403728 0.3403728 0.3403728 1.0211185 1.0211185
[13] 1.0211185 1.0211185 1.0211185 1.0211185 1.0211185 -1.0211185
[19] -1.0211185 1.0211185
[[3]]
[1] -2.400562779 -2.400562779 -0.506478213 -0.506478213 0.074374387
[6] 0.074374387 0.301664535 0.200646692 -0.329696987 0.263782844
[11] 0.705735909 1.261334049 0.049119926 1.508070131 -0.008712789
[16] 0.049119926 0.061747157 -0.173119329 0.312271409 1.463369736
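The output above is an unnamed list, so the reader has to remember which of [[1]], [[2]], [[3]] is which column. A data frame is itself a list of columns, so map() can iterate over the selected columns directly. The result then keeps the column names. The sketch below uses a hypothetical toy tibble with made-up values in place of the hotel bookings data. It then reuses the map_dbl() check from earlier: every z-score column should have a mean of about 0 and a standard deviation of 1.

```r
library(dplyr)
library(purrr)

z_score <- function(col) {
  output <- (col - mean(col)) / sd(col)
  return(output)
}

# Hypothetical toy stand-in for the three hotel bookings columns
toy <- tibble(
  lead_time            = c(10, 20, 30, 40, 50),
  stays_in_week_nights = c(0, 0, 1, 1, 2),
  adr                  = c(0, 50, 75, 75, 100)
)

# iterate over the columns themselves; the result is a named list
z <- toy %>%
  select(lead_time, stays_in_week_nights, adr) %>%
  map(z_score)
names(z)

# verify: means should be ~0 (up to floating-point error), sds should be 1
map_dbl(z, mean)
map_dbl(z, sd)
```

If a data frame of results is preferred over a list, dplyr's mutate(across(c(lead_time, stays_in_week_nights, adr), z_score)) is an alternative that stays inside the tibble.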