Challenge 1 Paritosh

challenge_1
railroads
faostat
wildbirds
Challenge_1_Final
Author

Paritosh Gandhi

Published

March 28, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then read in the data using either one of the standard readr read commands or a specialized package such as readxl.


library(readxl)
library(tidyverse)
library(summarytools)


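df <- read_excel("_data/wild_bird_data.xlsx", skip = 1)

## instead of using skip, we could read the sheet without it and drop the
## first row (the real header, read in as data) with the line below;
## the columns would then still need renaming and converting to numeric

#df <- df[2:147,]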
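The sketch below shows why both approaches end in the same place. It mimics the sheet layout with a tiny made-up CSV held in a string, and the reference text and all values are hypothetical. It uses base R's read.csv() instead of read_excel() so it runs without the data file. It assumes, as the description of the data suggests, that the first row of the sheet is a reference line rather than the column names.

```r
# Hypothetical miniature of the sheet: a reference line, then the real header,
# then data rows. All text and numbers here are made up for illustration.
raw_text <- "Reference,Hypothetical source note
Wet body weight [g],Population size
10.5,1520.4
120,310.7
2500,45.9"

# Approach 1: skip the reference line so the second line becomes the header.
with_skip <- read.csv(text = raw_text, skip = 1, check.names = FALSE)

# Approach 2: read as-is, promote the first data row (the real header) to
# column names, drop that row, and convert the character columns to numeric.
no_skip <- read.csv(text = raw_text, check.names = FALSE,
                    stringsAsFactors = FALSE)
names(no_skip) <- as.character(unlist(no_skip[1, ]))
no_skip <- no_skip[-1, ]
no_skip[] <- lapply(no_skip, as.numeric)
rownames(no_skip) <- NULL

str(with_skip)
str(no_skip)
```

Skipping is the shorter route. The manual route has to repeat the renaming and type conversion that the reader would otherwise do for you.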

Describe the data

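The wild bird data contains 2 columns and 146 rows. The first column holds wet body weights in grams and the second holds population sizes. When the sheet is read without skip, its first row (a reference line) is used as the header, so the wet body weight values are read as character strings and must be converted with the as.numeric() function. With skip = 1, read_excel() picks up the real header and reads both columns as numbers (note the <dbl> column types in the filter() output), so the as.numeric() calls below are harmless but not strictly needed. The minimum wet body weight is about 5.46 g, the mean about 363.69 g and the maximum about 9639.8 g. Population sizes range from about 4.9 to 5,093,378, with a median of about 24,353.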
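The conversion step can be sketched on a small made-up character vector. The values are hypothetical, and the header text is left in as the first element, which is what happens when a header row is read as data. Any entry that is not a number becomes NA, which is why the summaries that follow pass na.rm to min(), mean() and max().

```r
# Made-up character column: a stray header cell followed by numeric strings.
weights_chr <- c("Wet body weight [g]", "12.3", "45.6", "789")

# as.numeric() parses the numeric strings; the header text cannot be parsed,
# so it becomes NA (R normally warns about this; the warning is suppressed here).
weights_num <- suppressWarnings(as.numeric(weights_chr))

min(weights_num)                # NA: the missing value propagates
min(weights_num, na.rm = TRUE)  # 12.3 once the NA is dropped
```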

Using as.numeric() with min(), mean() and max()

min(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)
[1] 5.458872
mean(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)
[1] 363.6943
max(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)
[1] 9639.845
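The three separate calls above can also be collapsed into one. This minimal base-R sketch uses a made-up vector in place of the real column and applies each summary function to the same values with na.rm = TRUE.

```r
# Made-up weights standing in for the `Wet body weight [g]` column.
weights <- c(12.3, 45.6, NA, 789)

# Apply min, mean and max to the same vector, ignoring the missing value.
weight_stats <- sapply(
  list(min = min, mean = mean, max = max),
  function(f) f(weights, na.rm = TRUE)
)
weight_stats
```

In the tidyverse style used elsewhere in this post, dplyr's summarise() can produce the same three numbers as columns of a one-row tibble.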

Using select()

df %>% 
  select(`Wet body weight [g]`) %>% 
  n_distinct()
[1] 146
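The count of 146 matches the 146 rows reported by dfSummary(), so no wet body weight occurs twice. That comparison can be made explicit. Here is a base-R sketch on a made-up vector with one repeated value.

```r
# Made-up weights with one repeated value.
weights <- c(12.3, 45.6, 45.6, 789)

n_unique    <- length(unique(weights))  # number of distinct values
has_repeats <- n_unique < length(weights)

n_unique     # 3
has_repeats  # TRUE
```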

Using filter()

df %>% filter(`Wet body weight [g]` > 3000)
# A tibble: 3 × 2
  `Wet body weight [g]` `Population size`
                  <dbl>             <dbl>
1                 9640.             3417.
2                 4451.             4789.
3                 4224.              433.
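filter() keeps rows in their original order. To be sure the heaviest birds are listed first, sort the filtered rows by weight; in dplyr, pipe the result into arrange(desc(...)). The base-R sketch below uses a small data frame that reuses the three rounded rows printed above plus two lighter rows that are made up.

```r
# Three rows copied (rounded) from the output above plus two made-up light birds.
birds <- data.frame(
  weight     = c(4224, 12.3, 9640, 45.6, 4451),
  population = c(433, 1520.4, 3417, 310.7, 4789)
)

# Keep birds heavier than 3000 g, then order them from heaviest to lightest.
heavy <- birds[birds$weight > 3000, ]
heavy <- heavy[order(heavy$weight, decreasing = TRUE), ]
heavy
```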

Using dfSummary()

dfSummary(df)
Data Frame Summary  
df  
Dimensions: 146 x 2  
Duplicates: 0  

-------------------------------------------------------------------------------------------------------------
No   Variable              Stats / Values                  Freqs (% of Valid)    Graph   Valid      Missing  
---- --------------------- ------------------------------- --------------------- ------- ---------- ---------
1    Wet body weight [g]   Mean (sd) : 363.7 (983.5)       146 distinct values   :       146        0        
     [numeric]             min < med < max:                                      :       (100.0%)   (0.0%)   
                           5.5 < 69.2 < 9639.8                                   :                           
                           IQR (CV) : 291.2 (2.7)                                :                           
                                                                                 : .                         

2    Population size       Mean (sd) : 382874 (951938.7)   146 distinct values   :       146        0        
     [numeric]             min < med < max:                                      :       (100.0%)   (0.0%)   
                           4.9 < 24353.2 < 5093378                               :                           
                           IQR (CV) : 196693.8 (2.5)                             :                           
                                                                                 : .                         
-------------------------------------------------------------------------------------------------------------
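In the summary above, CV is the coefficient of variation (standard deviation divided by the mean) and IQR is the interquartile range. The CV values can be checked against the reported means and standard deviations:

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad
\mathrm{CV} = \frac{\mathrm{sd}}{\mathrm{mean}}, \qquad
\mathrm{CV}_{\text{weight}} = \frac{983.5}{363.7} \approx 2.70, \qquad
\mathrm{CV}_{\text{population}} = \frac{951938.7}{382874} \approx 2.49
```

Both values round to the 2.7 and 2.5 shown by dfSummary(). Both variables also have means far above their medians (363.7 vs 69.2 g for weight; 382,874 vs 24,353.2 for population size). Together with CVs above 2, this indicates that both variables are strongly right-skewed.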