Challenge 1 Instructions

challenge_1
laurenzichittella
wild_bird_data
Author

Lauren Zichittella

Published

February 18, 2023

Read in the Data, wild_bird_data.xlsx

Summary/Results

This dataset was used by Nee et al. to produce a figure showing the relationship between wild bird body weight and population size.

It consists of 146 distinct observations of two numeric variables: wet body weight in grams and population size.

Technical notes

  • Use read_xlsx
  • Skip import of first two rows
    • row 1 includes header
    • row 2 provides variable names I do not want to use
  • Assign new variable names:
    • Wet body weight [g] = wet_weight_g
    • Population size = pop_size
  • Output variable class
  • Sort by ascending values of wet_weight_g
Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

library(readxl)

wild_bird_data2 <- read_xlsx( "_data/wild_bird_data.xlsx", skip = 2, col_names = c("wet_weight_g", "pop_size"))

sapply(wild_bird_data2, class)
wet_weight_g     pop_size 
   "numeric"    "numeric" 
Code
arrange(wild_bird_data2,wet_weight_g)
# A tibble: 146 × 2
   wet_weight_g pop_size
          <dbl>    <dbl>
 1         5.46  532194.
 2         7.42  389806.
 3         7.76 3165107.
 4         8.04  192361.
 5         8.64 2592997.
 6         8.70  250452.
 7         8.89   16997.
 8         9.12  604766.
 9         9.52     595.
10        10.1    74386.
# … with 136 more rows
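
One way to confirm that skip = 2 drops the right rows is to look at the top of the sheet before any names are assigned. This is a minimal sketch, assuming the same relative path as the import above; the choice of four rows is arbitrary, and the file.exists() guard is only there so the snippet does nothing when the file is not present.

```r
library(readxl)

path <- "_data/wild_bird_data.xlsx"  # same relative path as the import above

# Read only the top of the sheet with no column names, so that
# rows 1 and 2 (the rows skipped during import) show up as data
if (file.exists(path)) {
  print(read_xlsx(path, col_names = FALSE, n_max = 4))
}
```

The first two printed rows should correspond to the header and the original variable names described in the technical notes.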

Provide baseline characterization of data

Summary/Results

Both variables are populated for all observations and span a wide range of values. The distributions of weight and population size are positively skewed by outliers.

Technical notes

Provide the following information:
- Print of first 23 observations
- Print of last 23 observations
- Filter on presence of missing values for wet_weight_g or pop_size
- Summarize numeric variables

Code
slice(wild_bird_data2, 1:23)
# A tibble: 23 × 2
   wet_weight_g pop_size
          <dbl>    <dbl>
 1         5.46  532194.
 2         7.76 3165107.
 3         8.64 2592997.
 4        10.7  3524193.
 5         7.42  389806.
 6         9.12  604766.
 7         8.04  192361.
 8         8.70  250452.
 9         8.89   16997.
10         9.52     595.
# … with 13 more rows
Code
slice(wild_bird_data2, 124:146)
# A tibble: 23 × 2
   wet_weight_g pop_size
          <dbl>    <dbl>
 1         821.     253.
 2         645.     386.
 3         757.     612.
 4         604.    1162.
 5        1008.     908.
 6         528.   40510.
 7         766.   57460.
 8         923.   34547.
 9         887.   13391.
10         955.   12526.
# … with 13 more rows
Code
filter(wild_bird_data2, is.na(wet_weight_g) | is.na(pop_size))
# A tibble: 0 × 2
# … with 2 variables: wet_weight_g <dbl>, pop_size <dbl>
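
As a complement to the filter above, the number of missing values can also be counted per column. This is a small sketch in base R; count_missing is a hypothetical helper name introduced here for illustration, not something used in the original analysis.

```r
# Hypothetical helper: number of NA values in each column of a data frame
count_missing <- function(df) {
  colSums(is.na(df))
}

# Apply to the imported data when it is available in the session
if (exists("wild_bird_data2")) {
  print(count_missing(wild_bird_data2))
  # Rows with a missing value in at least one column
  print(sum(!complete.cases(wild_bird_data2)))
}
```

A count of zero in every column agrees with the empty tibble returned by the filter.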
Code
# results = 'asis' is a knitr chunk option; set it in the chunk header rather than in the code body
library(summarytools)

dfSummary(wild_bird_data2, varnumbers = FALSE, valid.col = FALSE)
Data Frame Summary  
wild_bird_data2  
Dimensions: 146 x 2  
Duplicates: 0  

--------------------------------------------------------------------------------------
Variable       Stats / Values                  Freqs (% of Valid)    Graph   Missing  
-------------- ------------------------------- --------------------- ------- ---------
wet_weight_g   Mean (sd) : 363.7 (983.5)       146 distinct values   :       0        
[numeric]      min < med < max:                                      :       (0.0%)   
               5.5 < 69.2 < 9639.8                                   :                
               IQR (CV) : 291.2 (2.7)                                :                
                                                                     : .              

pop_size       Mean (sd) : 382874 (951938.7)   146 distinct values   :       0        
[numeric]      min < med < max:                                      :       (0.0%)   
               4.9 < 24353.2 < 5093378                               :                
               IQR (CV) : 196693.8 (2.5)                             :                
                                                                     : .              
--------------------------------------------------------------------------------------
Code
hist(  wild_bird_data2$wet_weight_g
     , main="Distribution of Weight Captured in Wild Bird Dataset"
     , xlab="Wet Weight in Grams"
     , col = "red")

Code
hist(  wild_bird_data2$pop_size
     , main="Distribution of Population Size Captured in Wild Bird Dataset"
     , xlab = "Population Size"
     , col = "red")
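
The positive skew visible in the histograms can also be checked numerically. The sketch below compares the mean with the median and computes a moment-based skewness coefficient in base R; skew_summary is a hypothetical helper introduced for illustration, and other skewness definitions (for example, sample-adjusted ones) would give slightly different values.

```r
# Hypothetical helper (not part of the original analysis): mean, median,
# and moment-based skewness of a numeric vector
skew_summary <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  # Third central moment divided by the 1.5 power of the second central moment
  skew <- mean((x - m)^3) / (mean((x - m)^2)^1.5)
  c(mean = m, median = median(x), skewness = skew)
}

# Apply to both columns when the imported data is available in the session
if (exists("wild_bird_data2")) {
  print(sapply(wild_bird_data2, skew_summary))
}
```

A mean well above the median together with a positive skewness value is what a right-skewed distribution would be expected to show.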

Evaluate relationship between weight and population size graphically

Summary/Results

Plotting weight by population size does not clearly point to any relationship between these variables.

Technical notes

Plot results on log10 scale

Code
# Load packages
library(scales)  # break and label helpers: trans_breaks(), trans_format(), math_format()
library(ggplot2)
start_plot <- ggplot(wild_bird_data2, aes(x=pop_size, y=wet_weight_g)) + geom_point()

start_plot2 <- start_plot +
     scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
              labels = trans_format("log10", math_format(10^.x))) +
     theme_bw() 

start_plot2 +
  annotation_logticks() +
  ggtitle("Relationship between Weight and Population Size in Wild Birds") +
  xlab("Population Size") +
  ylab("Wet Weight (g)")
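
To put a number next to the visual impression, one option is the Pearson correlation of the two variables on the same log10 scale used in the plot. This is only a sketch: log10_cor is a hypothetical helper introduced here, and a single correlation coefficient is a rough summary rather than a substitute for the plot.

```r
# Hypothetical helper: Pearson correlation of two positive variables
# after a log10 transformation (non-positive and missing values are dropped)
log10_cor <- function(x, y) {
  keep <- !is.na(x) & !is.na(y) & x > 0 & y > 0
  cor(log10(x[keep]), log10(y[keep]))
}

# Apply to the imported data when it is available in the session
if (exists("wild_bird_data2")) {
  print(log10_cor(wild_bird_data2$pop_size, wild_bird_data2$wet_weight_g))
}
```

Values near zero would be in line with the absence of a clear relationship, while values closer to -1 or 1 would suggest a stronger linear trend on the log scale.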