library(tidyverse)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 10 Submission
Challenge Overview
The purrr package is a powerful tool for functional programming. It lets the user apply a single function to every element of a list or vector, so many for loops can be replaced with a shorter, more readable function call.
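As a small sketch of what "replacing a for loop" means, the block below computes the same totals twice: once with a preallocated for loop and once with purrr::map_dbl(). The list x is a made-up example, not part of the challenge data.

```r
library(purrr)

# Hypothetical input: a list of three numeric vectors
x <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

# for-loop version: preallocate the output, then fill it one element at a time
totals_loop <- numeric(length(x))
for (i in seq_along(x)) {
  totals_loop[i] <- sum(x[[i]])
}

# purrr version: one call applies sum() to every element and returns a double vector
totals_map <- map_dbl(x, sum)

identical(totals_loop, totals_map)  # TRUE
```

The map_dbl() call also handles the bookkeeping (output length, indexing) that the loop has to do by hand.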
For example, we can draw n random values from each of 10 normal distributions, using a vector of 10 means.
n <- 100                        # sample size
m <- seq(1, 10)                 # means
samps <- map(m, rnorm, n = n)   # one sample of size n for each mean
We can then use map_dbl to verify that this worked correctly by computing the mean of each sample.
samps %>% map_dbl(mean)
[1] 1.045109 2.118810 2.943366 3.907274 4.925509 5.753940 6.998604 8.095275
[9] 8.979529 9.778266
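The sample means above change on every run because no seed is set. A possible extension, sketched below, fixes the seed (the value 42 is my own arbitrary choice) and uses map_dbl() twice to put each sample's mean and standard deviation next to its target mean in a tibble.

```r
library(purrr)
library(tibble)

set.seed(42)  # assumed seed, added only so the draws are reproducible
n <- 100          # sample size
m <- seq(1, 10)   # means
samps <- map(m, rnorm, n = n)

# map_dbl() returns a double vector and errors if any call does not return a single number
sample_means <- map_dbl(samps, mean)
sample_sds   <- map_dbl(samps, sd)

# Target means next to the sample statistics for a quick visual check
check <- tibble(true_mean = m, sample_mean = sample_means, sample_sd = sample_sds)
check
```

With n = 100 and a standard deviation of 1, each sample mean should typically land within about 0.2 of its target.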
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. The task is up to you: it could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
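As one illustration of the "reading in multiple datasets" option, here is a minimal sketch that writes two small, hypothetical CSV files (batch_1.csv and batch_2.csv, invented for this example) to a temporary folder, reads them all with map(), and stacks them with dplyr::bind_rows(), recording which file each row came from.

```r
library(purrr)
library(readr)
library(dplyr)

# Hypothetical example files written to a temporary directory so the sketch runs anywhere
dir <- tempdir()
paths <- file.path(dir, c("batch_1.csv", "batch_2.csv"))
write_csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.0)), paths[1])
write_csv(data.frame(id = 4:5, value = c(1.2, 0.8)), paths[2])

# Name each path by its file name so bind_rows() can record the source of every row
paths <- set_names(paths, basename(paths))

# map() reads every file into a named list; bind_rows() stacks them into one tibble
combined <- map(paths, read_csv, show_col_types = FALSE) %>%
  bind_rows(.id = "source")
combined
```

With real data, paths would usually come from something like list.files() on a data folder instead of being written by hand.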
# Read our dataset, cereal
cereal_data <- read_csv("_data/cereal.csv")
We can use the purrr package in various ways. Below I calculate the ratio of Sodium to Sugar in cereals marketed to adults (Type: A) and to children (Type: B). Note that the per-group step uses do(), which is a dplyr function, not part of purrr.
# Filter rows where the Sodium column is greater than 200
cereal_high_sodium <- cereal_data %>% filter(Sodium > 200)
cereal_high_sodium
# Average sodium and sugar for each cereal Type
aggregate_data <- cereal_data %>%
  group_by(Type) %>%
  summarise(avg_sodium = mean(Sodium), avg_sugar = mean(Sugar))
aggregate_data
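# Mean of the per-cereal Sodium/Sugar ratios, skipping cereals with no sugar
calculate_ratio <- function(data) {
  data %>%
    filter(Sugar > 0) %>%   # avoid dividing by zero
    summarise(ratio = mean(Sodium / Sugar, na.rm = TRUE))
}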
# Apply calculate_ratio() to each Type group with dplyr::do()
# (do() comes from dplyr, not purrr, and is superseded in current dplyr releases)
ratios <- cereal_data %>%
  group_by(Type) %>%
  do(calculate_ratio(.)) %>%
  ungroup() %>%
  pull(ratio)
ratios
[1] 26.44202 41.20354
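# Abandoned attempt (does not work: map_dbl() iterates over the columns of cereal_data,
# not over the Type groups, and calculate_ratio() returns a data frame, not a single number):
#cereal_data$ratio <- cereal_data %>% map_dbl(calculate_ratio)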
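Since do() is not a purrr function, a purrr-only version of the same per-Type ratio might look like the sketch below: split() turns the data into a named list with one data frame per Type, and map_dbl() applies a helper that returns a single number. The table cereal_demo and the helper ratio_value() are hypothetical, and every value in the demo table is invented for illustration; replacing cereal_demo with cereal_data should give the same kind of result for the real file.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for cereal.csv (all values invented for illustration only)
cereal_demo <- tibble(
  Cereal = c("Bran", "Oats", "Frosted", "Choco", "Plain"),
  Type   = c("A", "A", "B", "B", "A"),
  Sodium = c(200, 150, 180, 240, 100),
  Sugar  = c(5, 0, 12, 10, 2)
)

# Same logic as calculate_ratio(), but returning one number so map_dbl() can use it
ratio_value <- function(data) {
  data <- filter(data, Sugar > 0)  # drop zero-sugar rows to avoid dividing by zero
  mean(data$Sodium / data$Sugar, na.rm = TRUE)
}

# split() gives a named list (one data frame per Type); map_dbl() applies ratio_value() to each
ratios_purrr <- cereal_demo %>%
  split(.$Type) %>%
  map_dbl(ratio_value)
ratios_purrr  # returns c(A = 45, B = 19.5)
```

A side benefit of this approach is that map_dbl() keeps the group names, so the result shows which ratio belongs to which Type.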