challenge_1
Author

Mani Shanker Kamarapu

Published

August 15, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xlsx ⭐⭐⭐⭐

I will be working on the “wild_bird_data” dataset.

Code
# Loading `readxl` package
library(readxl)
wild_bird <- read_xlsx("_data/wild_bird_data.xlsx")

# View the dataset
wild_bird
# A tibble: 147 × 2
   Reference           `Taken from Figure 1 of Nee et al.`
   <chr>               <chr>                              
 1 Wet body weight [g] Population size                    
 2 5.45887180052624    532194.395145161                   
 3 7.76456810683605    3165107.44544653                   
 4 8.63858738018464    2592996.86778979                   
 5 10.6897349302105    3524193.2266336                    
 6 7.41722577905587    389806.168891807                   
 7 9.1169347252776     604765.97978904                    
 8 8.03684333000353    192360.511579436                   
 9 8.70473119796067    250452.449623033                   
10 8.89032317828959    16997.4156415239                   
# … with 137 more rows
# ℹ Use `print(n = ...)` to see more rows

Describe the data

Code
# Use dim() to get dimensions of dataset
dim(wild_bird)
[1] 147   2

There are 147 rows and 2 columns (“Reference” and “Taken from Figure 1 of Nee et al.”). However, the first row of the tibble (the second row of the spreadsheet) holds the real column names, so we will use that row as the column names and then remove it from the data.

Code
# Use the first row as the column names
colnames(wild_bird) <- as.character(wild_bird[1, ])
# Remove the first row, which is now redundant
wild_bird <- wild_bird[-1, ]
# New dimensions of the dataset
dim(wild_bird)
[1] 146   2
Code
# View the dataset
wild_bird
# A tibble: 146 × 2
   `Wet body weight [g]` `Population size`
   <chr>                 <chr>            
 1 5.45887180052624      532194.395145161 
 2 7.76456810683605      3165107.44544653 
 3 8.63858738018464      2592996.86778979 
 4 10.6897349302105      3524193.2266336  
 5 7.41722577905587      389806.168891807 
 6 9.1169347252776       604765.97978904  
 7 8.03684333000353      192360.511579436 
 8 8.70473119796067      250452.449623033 
 9 8.89032317828959      16997.4156415239 
10 9.51590845877281      595.09393677964  
# … with 136 more rows
# ℹ Use `print(n = ...)` to see more rows
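As an alternative to renaming after import, the extra title row can be skipped while reading, so that the real header is picked up directly and the columns are parsed as numbers. `read_xlsx()` has a `skip` argument for this. The sketch below shows the same idea with base R's `read.csv()` on a small inline sample (the first three data rows printed above). The names `sample_text` and `toy` are stand-ins for illustration and are not part of the original analysis.

```r
# Toy stand-in for the spreadsheet: a title row, the real header, then data.
sample_text <- paste(
  "Reference,Taken from Figure 1 of Nee et al.",
  "Wet body weight [g],Population size",
  "5.45887180052624,532194.395145161",
  "7.76456810683605,3165107.44544653",
  "8.63858738018464,2592996.86778979",
  sep = "\n"
)

# skip = 1 drops the title row; check.names = FALSE keeps the header text as is.
toy <- read.csv(text = sample_text, skip = 1, check.names = FALSE)

names(toy)
sapply(toy, class)

# The analogous call for the Excel file (assumes the same path as above):
# wild_bird <- readxl::read_xlsx("_data/wild_bird_data.xlsx", skip = 1)
```

With this approach the columns should arrive as numeric, so no later conversion from character would be needed.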
Code
# Summary of the dataset
summary(wild_bird)
 Wet body weight [g] Population size   
 Length:146          Length:146        
 Class :character    Class :character  
 Mode  :character    Mode  :character  

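Both columns were read as character, because the extra header row contained text. summary() therefore reports only length and class, so we first need to convert the columns to numeric and then run the summary again.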

Code
# Convert both columns to numeric
wild_bird$`Wet body weight [g]` <- as.numeric(wild_bird$`Wet body weight [g]`)
wild_bird$`Population size` <- as.numeric(wild_bird$`Population size`)
# Summary of the converted dataset
summary(wild_bird)
 Wet body weight [g] Population size  
 Min.   :   5.459    Min.   :      5  
 1st Qu.:  18.620    1st Qu.:   1821  
 Median :  69.232    Median :  24353  
 Mean   : 363.694    Mean   : 382874  
 3rd Qu.: 309.826    3rd Qu.: 198515  
 Max.   :9639.845    Max.   :5093378  
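The column-by-column conversion above can also be written once for every column, and it is worth confirming that `as.numeric()` did not turn any unparseable strings into `NA`. A minimal sketch on a toy data frame follows. The name `toy` and the shortened column names `weight` and `population` are hypothetical, and the three rows are copied from the printed output.

```r
# Toy data frame with character columns, mimicking the state before conversion.
toy <- data.frame(
  weight = c("5.45887180052624", "7.76456810683605", "8.63858738018464"),
  population = c("532194.395145161", "3165107.44544653", "2592996.86778979"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric while keeping the data frame structure.
toy[] <- lapply(toy, as.numeric)

# Count values that became NA during coercion; zero means every string parsed.
n_failed <- sum(is.na(toy))
n_failed

summary(toy)
```

Assigning into `toy[]` rather than `toy` keeps the object a data frame instead of replacing it with a plain list.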

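In summary, the cleaned wild_bird dataset has 146 observations of two numeric variables. Wet body weight ranges from about 5.5 g to 9,640 g (median 69.2 g, mean 363.7 g), and population size ranges from 5 to about 5.09 million (median 24,353, mean 382,874). In both columns the mean is far above the median, which indicates strongly right-skewed distributions.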
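The mean-versus-median comparison from the summary output can be reproduced directly. The sketch below uses only the ten population sizes printed earlier. The vector name `pop` is illustrative, and ten rows are not representative of all 146, so this only shows the mechanics of the check.

```r
# First ten population sizes as printed in the tibble above.
pop <- c(532194.395145161, 3165107.44544653, 2592996.86778979,
         3524193.2266336, 389806.168891807, 604765.97978904,
         192360.511579436, 250452.449623033, 16997.4156415239,
         595.09393677964)

# A mean well above the median suggests a long right tail.
c(mean = mean(pop), median = median(pop))
mean(pop) > median(pop)
```

On the full data, the same comparison would be run on the `Population size` column instead of `pop`.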