n <- 100 # sample size
m <- seq(1,10) # means
samps <- map(m,rnorm,n=n) Challenge 10 Instructions
Challenge Overview
The purrr package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace for loops with a more readable (and often faster) simple function call.
For example, we can draw n random samples from 10 different distributions using a vector of 10 means.
We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.
samps %>%
map_dbl(mean) [1] 1.114790 2.016048 3.056087 3.906096 5.023842 5.968324 6.968416 8.094015
[9] 8.871233 9.912330
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. What this task is is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
we choose - egg_tidy.csv ⭐⭐
calculate_mean <- function(column) {
return(mean(column, na.rm = TRUE))
}
calculate_means <- function(data) {
numeric_cols <- select_if(data, is.numeric)
means <- map_dbl(numeric_cols, calculate_mean)
return(means)
}
data <- read.csv("_data/eggs_tidy.csv")
means <- calculate_means(data)
print(means) year large_half_dozen large_dozen
2008.5000 155.1656 254.1979
extra_large_half_dozen extra_large_dozen
164.2235 266.7958
Analysis
- The calculate_mean function: This function simply takes a column of numbers (like a list of your grades) and finds the average (mean) value. It ignores any missing values, because we tell it to do so with na.rm = TRUE. Imagine you have 5 test scores: 80, 90, 85, NA, 95. The NA means you didn’t take one test. The average would then be (80+90+85+95)/4 = 87.5.
- The calculate_means function: This function uses our earlier calculate_mean function, but it applies it to many columns at once. First, it identifies which columns are numeric (like finding out which of your subjects are graded). Then, it calculates the average grade for each subject.
Now, let’s understand our data:
Year: The mean here (2008.5) isn’t that meaningful because ‘year’ is not a typical numerical variable. It’s more like a category. But technically, this means the middle year in our data is 2008.
Large_half_dozen,Large_dozen,Extra_large_half_dozen,Extra_large_dozen: These are the average prices of different egg packages over all the years and months in the data. For instance, the average price of a half-dozen large eggs was around 155.16 units (maybe cents or dollars, depending on the data).
That’s it! We’ve used our functions to get a quick sense of the average prices in our egg data.