Challenge 10 - Wild Bird

challenge_10

wild_bird

srujan_kagitala

purrr

Author

Srujan Kagitala

Published

July 11, 2023

Challenge Overview

The purrr package is a powerful tool for functional programming. It allows the user to apply a single function across multiple objects. It can replace for loops with a simpler, more readable (and sometimes faster) function call.

For example, we can draw a random sample of size n from each of 10 different normal distributions using a vector of 10 means.

library(tidyverse) # loads purrr (map, map_dbl) and the %>% pipe

n <- 100 # sample size
m <- seq(1,10) # means 
samps <- map(m,rnorm,n=n) # one sample of size n per mean

We can then use map_dbl to verify that this worked correctly by computing the mean for each sample.

samps %>%
  map_dbl(mean)

 [1] 0.9622548 2.0678934 2.8672047 3.9022962 4.9635645 6.0042105 7.1404022
 [8] 8.0044561 8.9311105 9.8584542
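To make the comparison with a for loop concrete, the sketch below builds the same list of samples both ways. The seed value and the name samps_loop are assumptions added for illustration and are not part of the original challenge.

```r
library(purrr)

n <- 100         # sample size
m <- seq(1, 10)  # means

# for-loop version: pre-allocate a list, then fill it one element at a time
set.seed(1)      # arbitrary seed, only so both versions draw the same numbers
samps_loop <- vector("list", length(m))
for (i in seq_along(m)) {
  samps_loop[[i]] <- rnorm(n, mean = m[i])
}

# map version: each element of m is passed as the first unnamed argument of
# rnorm(); because n is supplied by name, that argument is matched to `mean`
set.seed(1)
samps_map <- map(m, rnorm, n = n)

# Compare the two results element by element
all.equal(samps_loop, samps_map)
```

The map call does in one line what the loop does in four, and it removes the need to pre-allocate and index the output list by hand.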

purrr is tricky to learn (but beyond useful once you get a handle on it). Therefore, it’s imperative that you complete the purrr and map readings before attempting this challenge.

The challenge

Use purrr with a function to perform some data science task. The task itself is up to you. It could involve computing summary statistics, reading in multiple datasets, running a random process multiple times, or anything else you might need to do in your work as a data analyst. You might consider using purrr with a function you wrote for challenge 9.
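As one illustration of the "reading in multiple datasets" idea, here is a minimal sketch. The file names, the temporary directory, and the column names are all hypothetical and are created by the code itself, so that it runs without any external data.

```r
library(purrr)

# Hypothetical setup: write three small CSV files to a temporary directory
tmp_dir <- file.path(tempdir(), "purrr_demo")
dir.create(tmp_dir, showWarnings = FALSE)
walk(1:3, function(i) {
  write.csv(data.frame(id = i, value = i * 10),
            file.path(tmp_dir, paste0("part_", i, ".csv")),
            row.names = FALSE)
})

# Collect the file paths, read each file with map, then stack the results
files <- list.files(tmp_dir, pattern = "^part_[0-9]+\\.csv$", full.names = TRUE)
datasets <- map(files, read.csv)       # a list with one data frame per file
combined <- do.call(rbind, datasets)   # a single data frame with all rows
combined
```

walk is used for the setup step because writing files is a side effect with no return value worth keeping, while map is used where the returned data frames are needed.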

Read

This challenge uses the “wild_bird_data.xlsx” dataset and builds on challenge 9.

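library(readxl) # provides read_excel
# skip = 1 skips the first row of the sheet before reading the column names
wild_bird_data <- read_excel("_data/wild_bird_data.xlsx", skip = 1)
wild_bird_data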

Function

The function below computes summary statistics for a single column: mean, median, min, max, IQR, standard deviation and variance.

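# Function to compute summary statistics for one column of a data frame
statistics <- function(data, col_name){
  result <- data %>%
    select({{col_name}}) %>%   # {{ }} lets the caller pass a bare column name
    summarise_all(list(mean = mean,
                       median = median,
                       min = min,
                       max = max,
                       IQR = IQR,
                       sd = sd,
                       var = var), na.rm = TRUE)  # na.rm is passed to every function
  # The tibble is wrapped in a list, so it is element [[1]] of the return value
  list(result)
}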

Statistics of wet body weight of wild birds.

statistics(wild_bird_data,`Wet body weight [g]`)

[[1]]
# A tibble: 1 × 7
   mean median   min   max   IQR    sd     var
  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1  364.   69.2  5.46 9640.  291.  984. 967361.

Next, map is used to compute the same statistics on two data frames obtained by splitting wild_bird_data into halves with an equal number of rows (73 each).

split_data <- list(wild_bird_data[1:73,], wild_bird_data[74:146,])
split_data

[[1]]
# A tibble: 73 × 2
   `Wet body weight [g]` `Population size`
                   <dbl>             <dbl>
 1                  5.46           532194.
 2                  7.76          3165107.
 3                  8.64          2592997.
 4                 10.7           3524193.
 5                  7.42           389806.
 6                  9.12           604766.
 7                  8.04           192361.
 8                  8.70           250452.
 9                  8.89            16997.
10                  9.52              595.
# ℹ 63 more rows

[[2]]
# A tibble: 73 × 2
   `Wet body weight [g]` `Population size`
                   <dbl>             <dbl>
 1                  67.1              59.1
 2                  82.9            2008. 
 3                  64.7            8622. 
 4                  66.5           21762. 
 5                  72.5           36109. 
 6                 128.           279260. 
 7                 135.            90664. 
 8                 116.            32206. 
 9                 111.            30830. 
10                 105.            20287. 
# ℹ 63 more rows
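The split above hard-codes the row ranges 1:73 and 74:146. A more general sketch, assuming a hypothetical stand-in data frame toy and a chosen number of chunks k, builds a chunk index and lets split do the work:

```r
# Hypothetical stand-in for wild_bird_data: 146 rows, two numeric columns
toy <- data.frame(weight = seq_len(146), pop = rev(seq_len(146)))

k <- 2  # number of chunks (an assumption; any positive integer works)

# Assign consecutive rows to chunks of (nearly) equal size
chunk_id <- rep(seq_len(k),
                each = ceiling(nrow(toy) / k),
                length.out = nrow(toy))

# split() returns a named list with one data frame per chunk
chunks <- split(toy, chunk_id)
sapply(chunks, nrow)
```

Because the chunk sizes are derived from nrow, the same code keeps working if rows are added to or removed from the data.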

map(split_data, ~statistics(.x, `Wet body weight [g]`))

[[1]]
[[1]][[1]]
# A tibble: 1 × 7
   mean median   min   max   IQR    sd   var
  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  23.4   18.6  5.46  95.7  15.6  16.6  277.


[[2]]
[[2]][[1]]
# A tibble: 1 × 7
   mean median   min   max   IQR    sd      var
  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1  704.   312.  64.7 9640.  581. 1309. 1713046.

map(split_data, ~statistics(.x, `Population size`))

[[1]]
[[1]][[1]]
# A tibble: 1 × 7
     mean median   min      max     IQR       sd     var
    <dbl>  <dbl> <dbl>    <dbl>   <dbl>    <dbl>   <dbl>
1 668003. 98289.  4.92 5093378. 457443. 1246174. 1.55e12


[[2]]
[[2]][[1]]
# A tibble: 1 × 7
    mean median   min      max    IQR      sd           var
   <dbl>  <dbl> <dbl>    <dbl>  <dbl>   <dbl>         <dbl>
1 97744.  7081.  8.74 2503916. 35564. 327343. 107153758211.
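Because statistics wraps its tibble in a list, each map result above is nested one level deeper than necessary ([[1]][[1]]). A possible way to get a single table instead is sketched below. The toy data, the list names first and second, and the column name half are assumptions made for illustration; the function body is repeated only so the sketch runs on its own.

```r
library(dplyr)
library(purrr)

# Same function as in the post
statistics <- function(data, col_name) {
  result <- data %>%
    select({{ col_name }}) %>%
    summarise_all(list(mean = mean, median = median, min = min, max = max,
                       IQR = IQR, sd = sd, var = var), na.rm = TRUE)
  list(result)
}

# Hypothetical toy data standing in for the two halves of wild_bird_data
toy_split <- list(
  first  = tibble(`Wet body weight [g]` = c(5, 7, 9)),
  second = tibble(`Wet body weight [g]` = c(60, 80, 100))
)

summary_tbl <- toy_split %>%
  map(~ statistics(.x, `Wet body weight [g]`)[[1]]) %>%  # unwrap the inner list
  bind_rows(.id = "half")  # stack the rows; list names go into column `half`

summary_tbl
```

Each row of summary_tbl then corresponds to one element of the input list, which makes the halves easier to compare side by side than the nested printout.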