Challenge 10 Instructions
Challenge Overview
The purrr package is a powerful tool for functional programming. It lets you apply a single function across multiple objects, replacing for loops with a more readable (and often more concise) function call.
For example, we can draw n random samples from each of 10 different normal distributions using a vector of 10 means.
library(purrr) # provides map(), map_dbl(), and %>%
n <- 100 # sample size
m <- seq(1, 10) # means
samps <- map(m, rnorm, n = n)
We can then use map_dbl to verify that this worked correctly by computing the mean of each sample.
samps %>% map_dbl(mean)
[1] 1.114790 2.016048 3.056087 3.906096 5.023842 5.968324 6.968416 8.094015
[9] 8.871233 9.912330
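To make the comparison with for loops concrete, here is a minimal sketch that computes the same ten sample means both ways. The seed and the names loop_means and map_means are assumptions added for illustration; they are not part of the original example.

```r
library(purrr)

set.seed(42) # hypothetical seed, used only so the sketch is reproducible
n <- 100              # sample size
m <- seq(1, 10)       # means
samps <- map(m, rnorm, n = n)

# for-loop version: pre-allocate the result, then fill it one element at a time
loop_means <- numeric(length(samps))
for (i in seq_along(samps)) {
  loop_means[i] <- mean(samps[[i]])
}

# purrr version: a single call that always returns a double vector
map_means <- map_dbl(samps, mean)

# both approaches should agree
all.equal(loop_means, map_means)
```

The map_dbl call states the intent (one mean per sample) without the bookkeeping of an index and a pre-allocated vector, which is where most of the readability gain comes from.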
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. The task itself is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
Dataset chosen: eggs_tidy.csv ⭐⭐
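library(dplyr) # select_if()
library(purrr) # map_dbl()
# Mean of a single column, ignoring missing values
calculate_mean <- function(column) {
  return(mean(column, na.rm = TRUE))
}
# Mean of every numeric column in a data frame
calculate_means <- function(data) {
  numeric_cols <- select_if(data, is.numeric)
  means <- map_dbl(numeric_cols, calculate_mean)
  return(means)
}
data <- read.csv("_data/eggs_tidy.csv")
means <- calculate_means(data)
print(means)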
year large_half_dozen large_dozen
2008.5000 155.1656 254.1979
extra_large_half_dozen extra_large_dozen
164.2235 266.7958
Analysis
- The calculate_mean function: This function simply takes a column of numbers (like a list of your grades) and finds the average (mean) value. It ignores any missing values, because we tell it to do so with na.rm = TRUE. Imagine you have 5 test scores: 80, 90, 85, NA, 95. The NA means you didn’t take one test. The average would then be (80+90+85+95)/4 = 87.5.
- The calculate_means function: This function uses our earlier calculate_mean function, but it applies it to many columns at once. First, it identifies which columns are numeric (like finding out which of your subjects are graded). Then, it calculates the average grade for each subject.
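The two steps described above can be checked on a tiny example. This is only a sketch: the scores vector comes from the test-score illustration, while the grades data frame and its column names are made up. It also uses purrr's keep() to pick the numeric columns instead of dplyr's select_if(), so that purrr is the only package needed.

```r
library(purrr)

# The five test scores from the explanation, with one missing value
scores <- c(80, 90, 85, NA, 95)
mean(scores)                # NA, because one score is missing
mean(scores, na.rm = TRUE)  # 87.5, i.e. (80 + 90 + 85 + 95) / 4

# Hypothetical data frame: one text column and two numeric columns
grades <- data.frame(
  student = c("a", "b", "c"),
  math = c(80, NA, 90),
  art = c(70, 75, 80)
)

# Step 1: keep only the numeric columns
numeric_cols <- keep(grades, is.numeric)

# Step 2: compute one mean per remaining column
col_means <- map_dbl(numeric_cols, mean, na.rm = TRUE)
col_means
```

The result is a named numeric vector with one entry per numeric column, which is the same shape as the output printed for the egg data.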
Now, let’s understand our data:
- year: The mean here (2008.5) isn’t very meaningful, because year works more like a label than a measurement. It only tells us that the years in our data are centered on 2008.5.
- large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen: These are the average prices of the different egg packages over all the years and months in the data. For instance, the average price of a half-dozen large eggs was around 155.17 units (cents or dollars, depending on how the data was recorded).
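Since the mean of the year column says little, one option is to drop it before averaging. The sketch below shows the idea on a small made-up data frame; the values are invented for illustration and are not taken from eggs_tidy.csv, and keep() again stands in for select_if().

```r
library(purrr)

# Hypothetical stand-in for the egg data (values are invented)
eggs <- data.frame(
  month = c("January", "February", "January", "February"),
  year = c(2004, 2004, 2005, 2005),
  large_dozen = c(200, 210, 220, 230),
  extra_large_dozen = c(240, 250, 260, 270)
)

# Drop year first, then keep only the numeric (price) columns
price_cols <- keep(eggs[names(eggs) != "year"], is.numeric)

# One mean per price column
price_means <- map_dbl(price_cols, mean, na.rm = TRUE)
price_means
```

Whether to exclude year depends on the question being asked; for a summary of prices alone, leaving it out keeps the output focused on the columns that are actually measurements.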
That’s it! We’ve used our functions to get a quick sense of the average prices in our egg data.