Challenge 10 Submission

challenge_10

purrr

Author

Tanmay Agrawal

Published

February 1, 2023

library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

The purrr package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace for loops with a more readable (and often faster) simple function call.

For example, we can draw n random samples from 10 different distributions using a vector of 10 means.

n <- 100 # sample size
m <- seq(1,10) # means 
samps <- map(m,rnorm,n=n)

We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.

samps %>%
  map_dbl(mean)

 [1] 1.045109 2.118810 2.943366 3.907274 4.925509 5.753940 6.998604 8.095275
 [9] 8.979529 9.778266

purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purr and map readings before attempting this challenge.

The challenge

Use purrr with a function to perform some data science task. What this task is is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.

# Read our dataset, cereal
cereal_data = read_csv("_data/cereal.csv")

We can use the purrr package in various ways, I will use the purrr::do() to calculate the ratio of Sodium and Sugar in cereals marketed to Adults (Type : A) and to Children (Type : B)

# Filter rows where the Sodium column is greater than 200
cereal_high_sodium <- cereal_data %>% filter(Sodium > 200)

cereal_high_sodium

aggregate_data <- cereal_data %>% group_by(Type) %>% summarise(avg_sodium = mean(Sodium), avg_sugar = mean(Sugar))

aggregate_data

calculate_ratio <- function(data) {
  data %>%
  filter(Sugar > 0) %>%
  summarise(ratio = mean(Sodium / Sugar, na.rm = TRUE))
}

# Use purrr::map_dbl (which has been replaced by purrr::do()) to apply the function to each group of the cereal dataset
ratios <- cereal_data %>%
  group_by(Type) %>%
  do(calculate_ratio(.)) %>%
  ungroup() %>%
  pull(ratio)

ratios

[1] 26.44202 41.20354

#cereal_data$ratio <- cereal_data %>% map_dbl(calculate_ratio)

--- title: "Challenge 10 Submission" author: "Tanmay Agrawal" description: "purrr" date: "2/1/2023" format: html: toc: true code-copy: true code-tools: true df-print: paged categories: - challenge_10 --- ```{r} #| label: setup #| warning: false #| message: false library(tidyverse) library(ggplot2) knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) ``` ## Challenge Overview The [purrr](https://purrr.tidyverse.org/) package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace for loops with a more readable (and often faster) simple function call. For example, we can draw `n` random samples from 10 different distributions using a vector of 10 means. ```{r} n <- 100 # sample size m <- seq(1,10) # means samps <- map(m,rnorm,n=n) ``` We can then use `map_dbl` to verify that this worked correctly by computing the mean for each sample. ```{r} samps %>% map_dbl(mean) ``` `purrr` is tricky to learn (but beyond useful once you get a handle on it). Therefore, it's imperative that you complete the `purr` and `map` readings before attempting this challenge. ## The challenge Use `purrr` with a function to perform *some* data science task. What this task is is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using `purrr` with a function you wrote for challenge 9. ```{r} # Read our dataset, cereal cereal_data = read_csv("_data/cereal.csv") ``` We can use the purrr package in various ways, I will use the `purrr::do()` to calculate the ratio of Sodium and Sugar in cereals marketed to Adults (`Type` : A) and to Children (`Type` : B) ```{r} # Filter rows where the Sodium column is greater than 200 cereal_high_sodium <- cereal_data %>% filter(Sodium > 200) cereal_high_sodium aggregate_data <- cereal_data %>% group_by(Type) %>% summarise(avg_sodium = mean(Sodium), avg_sugar = mean(Sugar)) aggregate_data calculate_ratio <- function(data) { data %>% filter(Sugar > 0) %>% summarise(ratio = mean(Sodium / Sugar, na.rm = TRUE)) } # Use purrr::map_dbl (which has been replaced by purrr::do()) to apply the function to each group of the cereal dataset ratios <- cereal_data %>% group_by(Type) %>% do(calculate_ratio(.)) %>% ungroup() %>% pull(ratio) ratios #cereal_data$ratio <- cereal_data %>% map_dbl(calculate_ratio) ```