Challenge 1

challenge_1

railroads

faostat

wildbirds

Getting acquainted with the properties of the dataset

Author

Priyanka Perumalla

Published

May 15, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Challenge Overview

Today’s challenge is to

read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

railroad_2012_clean_county.csv ⭐
birds.csv ⭐⭐
FAOstat*.csv ⭐⭐
wild_bird_data.xlsx ⭐⭐⭐
StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

The FAOSTAT_egg_chicken.csv file is read in using read_csv().

df <- read_csv('/Users/priyankaperumalla/Desktop/daccs/601_Spring_2023/posts/_data/FAOSTAT_egg_chicken.csv', show_col_types = FALSE)
head(df)

# A tibble: 6 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QL            Livestock …           2 Afgh…           5313 Laying         1062
2 QL            Livestock …           2 Afgh…           5410 Yield          1062
3 QL            Livestock …           2 Afgh…           5510 Produc…        1062
4 QL            Livestock …           2 Afgh…           5313 Laying         1062
5 QL            Livestock …           2 Afgh…           5410 Yield          1062
6 QL            Livestock …           2 Afgh…           5510 Produc…        1062
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>
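
The read step above relies on readr guessing each column type. As a minimal sketch of the same step with the types declared explicitly, the example below writes a tiny made-up CSV to a temporary directory and reads it back. The file name `toy_faostat.csv`, the object `toy`, and all values are hypothetical stand-ins, not part of the FAOSTAT file.

```r
library(readr)

# Hypothetical stand-in for the real file: a tiny CSV in a temp directory.
toy_path <- file.path(tempdir(), "toy_faostat.csv")
write_csv(
  data.frame(
    Area  = c("A", "A", "B"),
    Year  = c(1961, 1962, 1961),
    Value = c(10, NA, 30)
  ),
  toy_path
)

# Declare the column types instead of letting readr guess them.
toy <- read_csv(
  toy_path,
  col_types = cols(
    Area  = col_character(),
    Year  = col_double(),
    Value = col_double()
  )
)

# Zero rows here means every cell parsed as the declared type.
problems(toy)
```

Declaring the types makes a silent change in the source file (for example, a numeric column that starts containing text) show up as a parsing problem rather than as a different column type.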

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

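Description: The dataset ‘FAOSTAT_egg_chicken.csv’ contains FAOSTAT livestock data reported by area (country or region) for each year from 1961 to 2018. Each row records one value for a given area, element (for example, Laying or Yield), and year, together with its unit and a flag with its description.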

Displaying the summary of the dataset.

summary(df)

 Domain Code           Domain            Area Code          Area          
 Length:38170       Length:38170       Min.   :   1.0   Length:38170      
 Class :character   Class :character   1st Qu.:  70.0   Class :character  
 Mode  :character   Mode  :character   Median : 143.0   Mode  :character  
                                       Mean   : 771.1                     
                                       3rd Qu.: 215.0                     
                                       Max.   :5504.0                     
                                                                          
  Element Code    Element            Item Code        Item          
 Min.   :5313   Length:38170       Min.   :1062   Length:38170      
 1st Qu.:5313   Class :character   1st Qu.:1062   Class :character  
 Median :5410   Mode  :character   Median :1062   Mode  :character  
 Mean   :5411                      Mean   :1062                     
 3rd Qu.:5510                      3rd Qu.:1062                     
 Max.   :5510                      Max.   :1062                     
                                                                    
   Year Code         Year          Unit               Value         
 Min.   :1961   Min.   :1961   Length:38170       Min.   :       1  
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:    2600  
 Median :1991   Median :1991   Mode  :character   Median :   31996  
 Mean   :1990   Mean   :1990                      Mean   :  291341  
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   93836  
 Max.   :2018   Max.   :2018                      Max.   :76769955  
                                                  NA's   :40        
     Flag           Flag Description  
 Length:38170       Length:38170      
 Class :character   Class :character  
 Mode  :character   Mode  :character
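
The summary reports 40 missing entries in Value, but summary() does not report missing entries for character columns, so gaps in columns such as Flag would not appear in it. A minimal sketch of counting missing values in every column, and of checking which flags accompany a missing Value, is shown on a made-up tibble; `toy` and its contents are hypothetical and only mimic the shape of `df`.

```r
library(dplyr)

# Toy stand-in for df (made-up values, not taken from the FAOSTAT file).
toy <- tibble(
  Area  = c("A", "A", "B", "B"),
  Value = c(10, NA, 30, NA),
  Flag  = c("F", "M", NA, "M")
)

# Number of missing entries in each column, character columns included.
na_counts <- colSums(is.na(toy))
na_counts

# Which flags accompany the rows where Value is missing.
missing_by_flag <- toy %>%
  filter(is.na(Value)) %>%
  count(Flag)
missing_by_flag
```

Running the same two steps on `df` would show whether the 40 missing values share a flag.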

Displaying the dimensions of the dataset (rows, then columns).

dim(df)

[1] 38170    14

Printing all column names

colnames(df)

 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"

Counting the unique years

unique_years <- df %>% select(Year) %>% n_distinct()
unique_years

[1] 58

The dataset covers 58 unique years, i.e., 1961 to 2018.
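
A distinct count alone does not give the first and last year. The summary output reports a minimum of 1961 and a maximum of 2018, and 2018 - 1961 + 1 = 58, which matches the distinct count, so no year in that span is missing. A minimal sketch of that check follows, run on a made-up vector of years; `toy` is a hypothetical stand-in for `df`.

```r
library(dplyr)

# Toy stand-in for df$Year (made-up values).
toy <- tibble(Year = c(1961, 1961, 1962, 1963, 1963))

# First and last year present in the data.
year_range <- range(toy$Year)

# Number of distinct years.
n_years <- n_distinct(toy$Year)

# TRUE when the distinct count equals the length of the span,
# i.e. no year between the first and last is skipped.
no_gaps <- n_years == diff(year_range) + 1
no_gaps
```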

Counting the unique areas

unique_areas <- df %>% select(Area) %>% n_distinct()
unique_areas

[1] 245

The dataset contains information about 245 unique areas.
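
The count does not say what the areas are. In the summary output, Area Code has a third quartile of 215 but a maximum of 5504, so a few codes sit far above the rest. Whether those rows are regional aggregates rather than single countries is an assumption that would need checking against the data. A minimal sketch of building a code-to-name lookup to inspect this is shown on a made-up tibble; `toy`, the area names, and the codes are all hypothetical.

```r
library(dplyr)

# Toy stand-in for df (made-up codes and names).
toy <- tibble(
  `Area Code` = c(2, 2, 3, 5000, 5000),
  Area        = c("Country A", "Country A", "Country B", "Region X", "Region X")
)

# One row per area, largest codes first, so unusual codes are easy to spot.
area_lookup <- toy %>%
  distinct(`Area Code`, Area) %>%
  arrange(desc(`Area Code`))
area_lookup
```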

Counting the unique domains (e.g., primary livestock). All of the data come from a single domain.

unique_domains <- df %>% select(Domain) %>% n_distinct()
unique_domains

[1] 1
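
The per-column counts above can also be produced in one step. The following is a minimal sketch using `across()` on a made-up tibble, followed by a tally of each element with its unit. `toy` and its values, including the unit strings, are hypothetical and only mimic a few columns of `df`.

```r
library(dplyr)

# Toy stand-in for df (made-up values).
toy <- tibble(
  Domain  = c("D", "D", "D"),
  Element = c("Laying", "Yield", "Laying"),
  Unit    = c("u1", "u2", "u1"),
  Year    = c(1961, 1961, 1962)
)

# Number of distinct values in every column at once.
distinct_counts <- toy %>%
  summarise(across(everything(), n_distinct))
distinct_counts

# How many rows each element contributes, and the unit it is measured in.
element_units <- toy %>% count(Element, Unit)
element_units
```

Columns with a count of 1, such as Domain here, are constant and carry no information for distinguishing rows.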