challenge_1
wildbirds
Author

Tim Shores

Published

February 25, 2023

Code
my_packages <- c("dplyr", "magrittr", "readxl", "summarytools") # packages this post depends on
invisible(lapply(my_packages, library, character.only = TRUE)) # attach each one; library() stops with an error if a package is missing

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

My content

Challenge 1 pits us unfortunate students against two tasks:

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Code
wbirds <- read_xlsx("../posts/_data/wild_bird_data.xlsx", skip = 1) # read in data, skipping the first heading row
showing <- 42 # number of rows to print

I chose to read in wild_bird_data.xlsx because wild birds are exciting. The file has two rows of headings. I skipped the first row to tidy up my tibble, but that row tells us that the data source is “Figure 1 of Nee et al.” That led me to this May 1991 Nature article: The relationship between abundance and body size in British birds.
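To look at the heading row that `skip = 1` drops, one option is to read just that row on its own. This is a minimal sketch, not necessarily how the source note was found here: it assumes the same relative path as the chunk above, and `top_row` is a name made up for illustration. `n_max` and `col_names` are arguments of `readxl::read_xlsx()`.

```r
# Same relative path as the post's read_xlsx() call.
path <- "../posts/_data/wild_bird_data.xlsx"

if (file.exists(path)) {
  # col_names = FALSE treats the first row as data instead of as column names;
  # n_max = 1 stops after that single row.
  top_row <- readxl::read_xlsx(path, n_max = 1, col_names = FALSE)
  print(top_row)
} else {
  message("File not found at: ", path)
}
```

Reading the row as data rather than as column names keeps its text intact instead of letting it be repaired into syntactic column names.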

The wild bird data set includes 146 observations of 2 variables. Here are the first 42 observations of British birds – not just any wild birds:

Code
print(wbirds, n=showing)
# A tibble: 146 × 2
   `Wet body weight [g]` `Population size`
                   <dbl>             <dbl>
 1                  5.46           532194.
 2                  7.76          3165107.
 3                  8.64          2592997.
 4                 10.7           3524193.
 5                  7.42           389806.
 6                  9.12           604766.
 7                  8.04           192361.
 8                  8.70           250452.
 9                  8.89            16997.
10                  9.52              595.
11                 10.9               865.
12                 10.1             74386.
13                 10.4            131930.
14                 11.1            164390.
15                 11.8            143944.
16                 13.4            405284.
17                 14.5            472595.
18                 16.7            801279.
19                 18.6           1217094.
20                 18.7           2020905.
21                 18.5           3507479.
22                 22.7           5093378.
23                 27.9           4262042.
24                 33.6           2055446.
25                 27.2           1546053.
26                 28.7            815305.
27                 18.5            642165.
28                 18.7            471555.
29                 19.1            386610.
30                 20.4            369717.
31                 20.6            283748.
32                 22.3            259613.
33                 24.4            323396.
34                 16.2            272074.
35                 14.9            195592.
36                 15.5            143594.
37                 19.5            199489.
38                 17.0             77389.
39                 11.9             62267.
40                 11.8             49948.
41                 12.1             40058.
42                 12.9             15174.
# … with 104 more rows
Code
wb_col1_null <- sum(is.na(wbirds[[1]])) # count missing (NA) values in the first column
wb_col2_null <- sum(is.na(wbirds[[2]])) # count missing (NA) values in the second column

The Wet body weight [g] variable has 0 null values.

The Population size variable has 0 null values.
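For a tibble with more columns, the missing values can be counted for every column in one step. Here is a minimal sketch in base R, assuming a toy data frame: `toy` and its values are made up for illustration, and only the column names come from the wild bird file.

```r
# Toy data with the same column names as the wild bird file; the values are invented.
toy <- data.frame(
  `Wet body weight [g]` = c(5.5, NA, 8.6),
  `Population size`     = c(532194, 3165107, NA),
  check.names = FALSE  # keep the original names, spaces and brackets included
)

# is.na() flags each missing cell as TRUE; colSums() adds up the TRUEs per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running the same `colSums(is.na(...))` call on `wbirds` should give one count per variable.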

Here’s the summary:

Code
summary(wbirds)
 Wet body weight [g] Population size  
 Min.   :   5.459    Min.   :      5  
 1st Qu.:  18.620    1st Qu.:   1821  
 Median :  69.232    Median :  24353  
 Mean   : 363.694    Mean   : 382874  
 3rd Qu.: 309.826    3rd Qu.: 198515  
 Max.   :9639.845    Max.   :5093378