Challenge 1 Paritosh

challenge_1

railroads

faostat

wildbirds

Challenge_1_Final

Author

Paritosh Gandhi

Published

March 28, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables, etc)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

railroad_2012_clean_county.csv ⭐
birds.csv ⭐⭐
FAOstat*.csv ⭐⭐
wild_bird_data.xlsx ⭐⭐⭐
StateCounty2012.xls ⭐⭐⭐⭐

Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.

library(readxl)
library(tidyverse)
library(summarytools)


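# skip = 1 skips the first row of the sheet so the next row is used as the header
df <- read_excel("_data/wild_bird_data.xlsx", skip = 1)

# Instead of using skip, we could also read the file as-is and drop the first row:
# df <- df[2:147, ]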

Describe the data

The wild bird data contains 2 columns and 146 entries once the first row of the sheet is skipped. The first column holds the wet body weight of the birds in grams, and the second column holds the population size. The dataset is unusual in that, when the file is read without `skip`, the column holding the wet body weight sits under a reference header and is stored as character, so it has to be converted to numeric with `as.numeric()`. The minimum wet body weight is about 5.46 g, the mean is about 363.69 g, and the maximum is roughly 9639.8 g.
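As a minimal sketch of why a stray header row turns a numeric column into character, and how `as.numeric()` undoes that, the example below uses a small made-up tibble rather than the real file. The column names `Reference` and `Source` and all values are hypothetical stand-ins.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: the first data row holds the real column names,
# so every column ends up stored as character.
raw <- tibble(
  Reference = c("Wet body weight [g]", "10.5", "250", "4000"),
  Source    = c("Population size", "1200", "530", "75")
)

# Promote the first row to column names and drop it from the data.
clean <- raw[-1, ]
names(clean) <- as.character(unlist(raw[1, ]))

# Convert every column from character to numeric.
clean <- clean %>%
  mutate(across(everything(), as.numeric))

str(clean)
```

Passing `skip = 1` to `read_excel()` avoids this cleanup, because the columns are then parsed as numeric directly.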

Using as.numeric(), min(), max(), mean()

min(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)

[1] 5.458872

mean(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)

[1] 363.6943

max(as.numeric(df$`Wet body weight [g]`), na.rm = TRUE)

[1] 9639.845
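The three separate calls above could also be collapsed into a single `summarise()` step. The sketch below assumes a toy tibble with made-up values (not the real bird data) and hypothetical output names such as `min_weight`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy values standing in for the real weights, stored as text.
toy <- tibble(`Wet body weight [g]` = c("10.5", "250", "4000", NA))

weight_stats <- toy %>%
  # Convert once, then reuse the numeric column for every statistic.
  mutate(weight = as.numeric(`Wet body weight [g]`)) %>%
  summarise(
    min_weight  = min(weight, na.rm = TRUE),
    mean_weight = mean(weight, na.rm = TRUE),
    max_weight  = max(weight, na.rm = TRUE)
  )

weight_stats
```

Converting the column once inside `mutate()` avoids repeating `as.numeric()` in each call.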

Using select()

df %>% 
  select(`Wet body weight [g]`) %>% 
  n_distinct()

[1] 146
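When `n_distinct()` receives a data frame, it counts distinct rows. The sketch below, on a made-up tibble with hypothetical column names `weight` and `population`, contrasts that with counting distinct values per column.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with some repeated values.
toy <- tibble(
  weight     = c(10.5, 10.5, 250, 4000),
  population = c(1200, 1200, 1200, 75)
)

# Distinct rows across both columns.
distinct_rows <- toy %>% n_distinct()

# Distinct values in each column separately.
per_column <- toy %>%
  summarise(across(everything(), n_distinct))

distinct_rows
per_column
```

With a single selected column, as in the code above, the two approaches give the same count.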

Using filter()

df %>% filter(`Wet body weight [g]` > 3000)

# A tibble: 3 × 2
  `Wet body weight [g]` `Population size`
                  <dbl>             <dbl>
1                 9640.             3417.
2                 4451.             4789.
3                 4224.              433.
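A filter like the one above is often followed by a sort so the heaviest birds appear first. This is a small sketch on made-up values, not the real dataset. The name `heavy_birds` is hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data using the same column names as the bird dataset.
toy <- tibble(
  `Wet body weight [g]` = c(10.5, 250, 4000, 3500),
  `Population size`     = c(1200, 530, 75, 9000)
)

heavy_birds <- toy %>%
  # Keep only rows above the 3000 g threshold.
  filter(`Wet body weight [g]` > 3000) %>%
  # Sort from heaviest to lightest.
  arrange(desc(`Wet body weight [g]`))

heavy_birds
```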

Using dfSummary()

dfSummary(df)

Data Frame Summary  
df  
Dimensions: 146 x 2  
Duplicates: 0  

-------------------------------------------------------------------------------------------------------------
No   Variable              Stats / Values                  Freqs (% of Valid)    Graph   Valid      Missing  
---- --------------------- ------------------------------- --------------------- ------- ---------- ---------
1    Wet body weight [g]   Mean (sd) : 363.7 (983.5)       146 distinct values   :       146        0        
     [numeric]             min < med < max:                                      :       (100.0%)   (0.0%)   
                           5.5 < 69.2 < 9639.8                                   :                           
                           IQR (CV) : 291.2 (2.7)                                :                           
                                                                                 : .                         

2    Population size       Mean (sd) : 382874 (951938.7)   146 distinct values   :       146        0        
     [numeric]             min < med < max:                                      :       (100.0%)   (0.0%)   
                           4.9 < 24353.2 < 5093378                               :                           
                           IQR (CV) : 196693.8 (2.5)                             :                           
                                                                                 : .                         
-------------------------------------------------------------------------------------------------------------
