library(tidyverse)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 10
Challenge Overview
The purrr package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace a for loop with a single function call that is usually more readable, though not necessarily faster.
For example, we can draw n random samples from each of 10 different normal distributions using a vector of 10 means.
n <- 100 # sample size
m <- seq(1, 10) # means
samps <- map(m, rnorm, n = n)
We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.
samps %>%
  map_dbl(mean)
[1] 0.8445498 2.0705627 2.7996149 4.0622924 4.9630348 6.0612032
[7] 7.0852765 7.9385544 9.0805295 10.0614436
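For comparison, here is a minimal sketch of the same sampling step written both with map and as a for loop. The seed value and the names samps_map and samps_loop are assumptions added for illustration; they are not part of the original example.

```r
library(purrr)

n <- 100          # sample size
m <- seq(1, 10)   # means

# purrr version: one call that returns a list of 10 numeric vectors
set.seed(10)      # arbitrary seed, used only to make the two runs comparable
samps_map <- map(m, rnorm, n = n)

# for-loop version: pre-allocate the list, then fill it one element at a time
set.seed(10)
samps_loop <- vector("list", length(m))
for (i in seq_along(m)) {
  samps_loop[[i]] <- rnorm(n = n, mean = m[i])
}

# Both approaches should produce the same samples when given the same seed
all.equal(samps_map, samps_loop)
```

The loop needs explicit bookkeeping (pre-allocation and an index variable), while map expresses the same idea in a single line.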
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. What this task is is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
library(purrr)
get_summary_statistics <- function(x) {
  # The _val suffix avoids shadowing the base functions mean() and median()
  mean_val <- mean(x)
  median_val <- median(x)
  std_dev <- sd(x)
  min_val <- min(x)
  max_val <- max(x)
  result <- c(mean_val, median_val, std_dev, min_val, max_val)
  names(result) <- c("Mean", "Median", "Standard Deviation", "Minimum", "Maximum")
  return(result)
}

test_scores <- c(90, 85, 77, 69, 41, 71, 80, 65, 83, 29, 53, 95, 94, 90, 87, 71, 70, 65, 58, 59, 98, 90, 87, 76)
get_summary_statistics(test_scores)
Mean Median Standard Deviation Minimum
74.29167 76.50000 17.48162 29.00000
Maximum
98.00000
remove_outliers <- function(x) {
  # Keep only the scores of 50 or above
  result <- keep(x, ~ .x >= 50)
  return(result)
}

updated_scores <- remove_outliers(test_scores)
get_summary_statistics(updated_scores)
Mean Median Standard Deviation Minimum
77.86364 78.50000 13.07231 53.00000
Maximum
98.00000
Above, we use purrr and one of its key functions, keep, to manipulate the same kind of data object we worked with in challenge 9, where we had to write a function. We reuse that function to calculate summary statistics for a group of test scores, but this time we filter the test scores vector before running the summary function on it. We want to disregard the lowest scores so that we can assess the test on its merits, without counting the very low scores of students who didn’t study. To see whether the test was fair to those who WERE adequately prepared, we use keep, which retains only the elements of a list or vector that satisfy a condition, to drop every score substantially below the mean: any test score below 50.
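As a small illustration of the filtering described above, the sketch below shows keep next to its complement discard and the equivalent base R subsetting. The scores vector is a hypothetical subset made up for this example, and the variable names are assumptions.

```r
library(purrr)

# Hypothetical scores, chosen only to illustrate the filter
scores <- c(90, 41, 77, 29, 53)

# keep() retains the elements for which the predicate is TRUE
passing <- keep(scores, ~ .x >= 50)

# discard() is the complement: it drops the elements where the predicate is TRUE
failing <- discard(scores, ~ .x >= 50)

# For a plain numeric vector, base R logical subsetting gives the same result
base_passing <- scores[scores >= 50]

identical(passing, base_passing)
```

For a simple vector the base R form is just as short; keep becomes more useful when the elements are themselves lists or data frames and the condition is a function of each element.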
Afterward, we run our summary statistics function on the updated data and see a fairer test: the mean rose from 74.29 to 77.86, the median rose from 76.5 to 78.5, and the standard deviation fell from 17.48 to 13.07.
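The same two functions can also be applied to several groups of scores at once with map, which is where purrr tends to pay off. The sketch below assumes a hypothetical split of scores into three class sections; the sections list and its names are invented for illustration, and the summary function is a condensed version of the one above.

```r
library(purrr)

# Condensed version of the summary function, returning a named numeric vector
get_summary_statistics <- function(x) {
  c(Mean = mean(x), Median = median(x), `Standard Deviation` = sd(x),
    Minimum = min(x), Maximum = max(x))
}

# Drop scores below 50, as in the challenge
remove_outliers <- function(x) keep(x, ~ .x >= 50)

# Hypothetical class sections (assumed data, not from the challenge)
sections <- list(
  section_a = c(90, 85, 77, 69, 41),
  section_b = c(71, 80, 65, 83, 29),
  section_c = c(53, 95, 94, 90, 87)
)

# Apply the filter, then the summary, to every section
filtered  <- map(sections, remove_outliers)
summaries <- map(filtered, get_summary_statistics)

# Pull one statistic per section into a named numeric vector
section_means <- map_dbl(summaries, ~ .x[["Mean"]])
section_means
```

Because map keeps the names of its input, the results stay labeled by section without any extra bookkeeping.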