challenge_1
wild_bird_data
Author

Miranda Manka

Published

August 15, 2022

Code
library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Code
#Read in the Excel file
wild_bird_data = read_excel("_data/wild_bird_data.xlsx")

Describe the data

Using a combination of words and results of R commands, provide a high-level description of the data.

Code
#View the data
view(wild_bird_data)

#Find the dimensions of the data
dim(wild_bird_data)
[1] 147   2
Code
#Look at the first few observations
head(wild_bird_data)
# A tibble: 6 × 2
  Reference           `Taken from Figure 1 of Nee et al.`
  <chr>               <chr>                              
1 Wet body weight [g] Population size                    
2 5.45887180052624    532194.395145161                   
3 7.76456810683605    3165107.44544653                   
4 8.63858738018464    2592996.86778979                   
5 10.6897349302105    3524193.2266336                    
6 7.41722577905587    389806.168891807                   
Code
#Get column names
colnames(wild_bird_data)
[1] "Reference"                         "Taken from Figure 1 of Nee et al."

The data appear to describe 146 wild birds, with two pieces of information for each: wet body weight (in grams) and the size of the population it belongs to. It is hard to say more without other information, such as the types of birds or how, when, and where the data were collected. If I had to guess, I would say the data were probably collected in forests or other outdoor areas with many birds, because the title of the dataset indicates they are wild birds.
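The count of 146 is one less than the 147 rows reported by dim(). A minimal sketch of one way to see why, using a small stand-in data frame built from the head() output rather than the real file (birds_raw, not_a_number, and header_rows are hypothetical names, and only base R is used):

```r
# Stand-in for the first rows of wild_bird_data, copied from the head() output
birds_raw <- data.frame(
  Reference = c("Wet body weight [g]", "5.45887180052624", "7.76456810683605"),
  nee_fig1 = c("Population size", "532194.395145161", "3165107.44544653"),
  stringsAsFactors = FALSE
)
names(birds_raw)[2] <- "Taken from Figure 1 of Nee et al."

# TRUE where a cell cannot be parsed as a number
not_a_number <- sapply(
  birds_raw,
  function(column) is.na(suppressWarnings(as.numeric(column)))
)

# Rows in which no column parses as a number are label rows, not observations
header_rows <- which(rowSums(not_a_number) == ncol(birds_raw))
print(header_rows)

# Rows left once the label rows are set aside
print(nrow(birds_raw) - length(header_rows))
```

If the full data behave like this sample, the same check would be expected to flag only the first row, which would leave 146 observations.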

Interestingly, the actual variable names seem to be in the first row of data rather than in the column labels: the current labels, “Reference” and “Taken from Figure 1 of Nee et al.”, should be “Wet body weight [g]” and “Population size”, respectively.
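One alternative to repairing the names after import is to skip the citation row at read time. readxl's read_excel() has a skip argument for this, so something like read_excel("_data/wild_bird_data.xlsx", skip = 1) may be enough, although that has not been run here. The sketch below shows the same idea with base R's read.csv() on an inline CSV stand-in, which is an assumption made so the example runs without the spreadsheet (csv_text and birds_skipped are hypothetical names):

```r
# Inline CSV stand-in for the spreadsheet (assumption: same layout as the
# head() output above, with a citation row sitting above the real header row)
csv_text <- "Reference,Taken from Figure 1 of Nee et al.
Wet body weight [g],Population size
5.45887180052624,532194.395145161
7.76456810683605,3165107.44544653
8.63858738018464,2592996.86778979"

# skip = 1 drops the citation row, so the second line becomes the header;
# check.names = FALSE keeps the labels exactly as written
birds_skipped <- read.csv(text = csv_text, skip = 1, check.names = FALSE)

# Column names now come from the real header row
print(names(birds_skipped))

# Columns are parsed as numbers directly, so no later type conversion is needed
print(sapply(birds_skipped, class))
```

Skipping the row at read time also lets the reader function guess numeric column types, which the stray label row otherwise prevents.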

Code
#Work to correct variable/column names
#Remove the first row, which holds the real column labels
wild_bird_data <- wild_bird_data[-1, ]
#Rename both columns/variables in one call
wild_bird_data <- rename(
  wild_bird_data,
  wet_body_weight_g = Reference,
  population_size = `Taken from Figure 1 of Nee et al.`
)

#Pull each column out as a vector and change its type from character to numeric
wet_body_weight_g <- as.numeric(wild_bird_data$wet_body_weight_g)
population_size <- as.numeric(wild_bird_data$population_size)

#Summary of variables
summary(wet_body_weight_g)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   5.459   18.620   69.232  363.694  309.826 9639.845 
Code
summary(population_size)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      5    1821   24353  382874  198515 5093378 

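The data are very spread out. Wet body weight ranges from about 5.5 g to about 9,640 g, and population size ranges from 5 to about 5.1 million. In both variables the mean is far above the median (363.7 g vs. 69.2 g for weight; 382,874 vs. 24,353 for population size), so a small number of very large values pulls each distribution to the right. This suggests a large variety in the types of birds and/or the geographical locations covered by these measurements.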
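To put rough numbers on that spread, the sketch below reuses values printed by summary() above, hard-coded so it runs on its own, and computes two simple indicators: the mean-to-median ratio and the number of orders of magnitude between the minimum and maximum. The names weight_summary, population_summary, and spread_indicators are hypothetical and introduced only for illustration.

```r
# Values copied from the summary() output above
weight_summary <- c(min = 5.459, median = 69.232, mean = 363.694, max = 9639.845)
population_summary <- c(min = 5, median = 24353, mean = 382874, max = 5093378)

# Helper (hypothetical name): two simple spread indicators from a summary vector
spread_indicators <- function(s) {
  c(
    # A ratio well above 1 suggests a right-skewed distribution
    mean_to_median = unname(s["mean"] / s["median"]),
    # log10 of max/min: how many powers of ten the range covers
    orders_of_magnitude = unname(log10(s["max"] / s["min"]))
  )
}

weight_spread <- spread_indicators(weight_summary)
population_spread <- spread_indicators(population_summary)

print(round(weight_spread, 2))
print(round(population_spread, 2))
```

Body weight spans a little over three orders of magnitude and population size a little over six, so a log scale may be a reasonable choice if these variables are plotted.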