Challenge 10 - Wild Bird
Challenge Overview
The purrr package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace for loops with a more readable (and often faster) simple function call.
For example, we can draw n random samples from 10 different distributions using a vector of 10 means.
library(tidyverse)
n <- 100 # sample size
m <- seq(1,10) # means
samps <- map(m,rnorm,n=n)
We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.
samps %>% map_dbl(mean)
[1] 0.9622548 2.0678934 2.8672047 3.9022962 4.9635645 6.0042105 7.1404022
[8] 8.0044561 8.9311105 9.8584542
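To make the "replace for loops" claim concrete, here is a minimal sketch comparing a pre-allocated for loop with the equivalent map_dbl call. The seed value and the names loop_means and map_means are assumptions added for illustration; they are not part of the original challenge.

```r
library(purrr)

set.seed(42)                   # hypothetical seed, only so the sketch is reproducible
n <- 100                       # sample size
m <- seq(1, 10)                # means
samps <- map(m, rnorm, n = n)  # list of 10 numeric vectors, one per mean

# for-loop version: pre-allocate the result, then fill it element by element
loop_means <- numeric(length(samps))
for (i in seq_along(samps)) {
  loop_means[i] <- mean(samps[[i]])
}

# purrr version: a single call that always returns a double vector
map_means <- map_dbl(samps, mean)

# Both approaches compute the same ten sample means
identical(loop_means, map_means)
```

The map_dbl version states the intent (apply mean to every element, return doubles) in one line, and it fails loudly if any result is not a single number.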
purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.
The challenge
Use purrr with a function to perform some data science task. The choice of task is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
Read
This post uses the “wild_bird_data.xlsx” dataset and builds on challenge 9.
library(readxl)
wild_bird_data <- read_excel("_data/wild_bird_data.xlsx", skip = 1)
wild_bird_data
Function
The function below returns summary statistics for a single column: mean, median, minimum, maximum, IQR, standard deviation, and variance.
# Function to give statistics
statistics <- function(data, col_name){
  result <- data %>%
    select({{col_name}}) %>%
    summarise_all(list(mean = mean,
                       median = median,
                       min = min,
                       max = max,
                       IQR = IQR,
                       sd = sd,
                       var = var), na.rm = TRUE)
  list(result)
}
Statistics of wet body weight of wild birds.
statistics(wild_bird_data,`Wet body weight [g]`)
[[1]]
# A tibble: 1 × 7
mean median min max IQR sd var
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 364. 69.2 5.46 9640. 291. 984. 967361.
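summarise_all is superseded in current dplyr in favour of across. As a hedged sketch, the same idea could be written as follows. The function name statistics_across and the toy tibble are hypothetical, only three of the seven statistics are shown to keep it short, and this variant returns the tibble directly rather than wrapping it in a list.

```r
library(dplyr)

# Hypothetical variant of the post's function, using across() instead of summarise_all()
statistics_across <- function(data, col_name) {
  data %>%
    summarise(across(
      {{ col_name }},
      list(
        mean   = ~ mean(.x, na.rm = TRUE),
        median = ~ median(.x, na.rm = TRUE),
        sd     = ~ sd(.x, na.rm = TRUE)
      ),
      .names = "{.fn}"   # name output columns after the function only
    ))
}

# Toy data (not the wild bird dataset), including one missing value
toy <- tibble(weight = c(1, 2, 3, NA, 10))
res <- statistics_across(toy, weight)
res
```

Setting .names = "{.fn}" keeps the column names as mean, median, and sd, matching the naming in the output above.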
Next, map is used to compute the statistics on two data frames obtained by splitting wild_bird_data into halves of 73 rows each.
split_data <- list(wild_bird_data[1:73,], wild_bird_data[74:146,])
split_data
[[1]]
# A tibble: 73 × 2
`Wet body weight [g]` `Population size`
<dbl> <dbl>
1 5.46 532194.
2 7.76 3165107.
3 8.64 2592997.
4 10.7 3524193.
5 7.42 389806.
6 9.12 604766.
7 8.04 192361.
8 8.70 250452.
9 8.89 16997.
10 9.52 595.
# ℹ 63 more rows
[[2]]
# A tibble: 73 × 2
`Wet body weight [g]` `Population size`
<dbl> <dbl>
1 67.1 59.1
2 82.9 2008.
3 64.7 8622.
4 66.5 21762.
5 72.5 36109.
6 128. 279260.
7 135. 90664.
8 116. 32206.
9 111. 30830.
10 105. 20287.
# ℹ 63 more rows
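The split above hard-codes the row indices 1:73 and 74:146. One possible alternative, sketched here on a small toy tibble rather than the real data, derives the halves from the row count with base R's split. The names toy, half, and split_toy are assumptions for illustration.

```r
library(tibble)

# Toy stand-in for the wild bird data (hypothetical values)
toy <- tibble(
  weight     = c(5, 7, 9, 60, 80, 120),
  population = c(100, 200, 300, 10, 20, 30)
)

# Label each row with the half it falls in, based on the row count
half <- ifelse(seq_len(nrow(toy)) <= ceiling(nrow(toy) / 2), "first", "second")

# split() returns a named list of tibbles, one per label
split_toy <- split(toy, half)
split_toy
```

Because the list is named, any later map call carries the names "first" and "second" through to its output, which makes the results easier to read than [[1]] and [[2]].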
map(split_data, ~statistics(.x, `Wet body weight [g]`))
[[1]]
[[1]][[1]]
# A tibble: 1 × 7
mean median min max IQR sd var
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 23.4 18.6 5.46 95.7 15.6 16.6 277.
[[2]]
[[2]][[1]]
# A tibble: 1 × 7
mean median min max IQR sd var
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 704. 312. 64.7 9640. 581. 1309. 1713046.
map(split_data, ~statistics(.x, `Population size`))
[[1]]
[[1]][[1]]
# A tibble: 1 × 7
mean median min max IQR sd var
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 668003. 98289. 4.92 5093378. 457443. 1246174. 1.55e12
[[2]]
[[2]][[1]]
# A tibble: 1 × 7
mean median min max IQR sd var
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 97744. 7081. 8.74 2503916. 35564. 327343. 107153758211.
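The nested list output above can be hard to compare at a glance. As a sketch under stated assumptions, the per-group summaries could be stacked into a single tibble with bind_rows. The helper summarise_weight and the toy data are hypothetical; with the post's statistics function, which wraps its result in a list, the inner tibble would first need to be extracted with [[1]].

```r
library(dplyr)
library(purrr)

# Toy stand-in for split_data: a named list of two tibbles (hypothetical values)
toy_split <- list(
  first  = tibble(weight = c(5, 7, 9)),
  second = tibble(weight = c(60, 80, 130))
)

# Hypothetical simplified summary function returning a one-row tibble
summarise_weight <- function(data) {
  data %>% summarise(mean = mean(weight), max = max(weight))
}

# Summarise each piece, then stack the rows; .id stores the list names in a column
combined <- toy_split %>%
  map(summarise_weight) %>%
  bind_rows(.id = "half")
combined
```

This gives one row per half with an identifying column, so the two groups can be compared side by side or passed on to further dplyr steps.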