challenge_1
FNU Avinesh Krishnan
wild_bird_data
Reading in data and creating a post
Author

FNU Avinesh Krishnan

Published

May 13, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then read in the data using either one of the standard readr read commands or a specialized package such as readxl.

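 # read_csv() comes from readr, which library(tidyverse) already loads;
 # readxl (read_excel()) is only needed for .xls/.xlsx files, so it is not loaded here.
 wild_birds <- read_csv("~/Desktop/601_Spring_2023/posts/_data/FAOstat_egg_chicken.csv")
 view(wild_birds)  # opens the data viewer in interactive sessions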

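This reads FAOstat_egg_chicken.csv (one of the FAOstat*.csv files) with readr's read_csv(). Note that the object is named wild_birds, but it holds the FAOSTAT egg data, not wild_bird_data.xlsx.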

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
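read_csv() guesses each column's type from the data. A more explicit option is to declare the types with col_types, so any mismatch surfaces as a parsing problem instead of a silent guess. The sketch below writes a tiny hypothetical CSV to a temporary file so it runs without the course's _data folder. The column subset, the "toy unit" labels, and the values are made up for illustration and are not taken from FAOstat_egg_chicken.csv; the name toy is likewise illustrative.

```r
library(readr)

# Hypothetical toy CSV with a few FAOSTAT-style columns; values are made up.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Domain Code,Area,Element,Year,Unit,Value,Flag",
  "QL,Afghanistan,Laying,1961,toy unit,100,F",
  "QL,Afghanistan,Yield,1961,toy unit,,F"
), toy_path)

# Declare every column type up front; names with spaces need backticks.
# readr's default na = c("", "NA") turns the empty Value field into NA.
toy <- read_csv(
  toy_path,
  col_types = cols(
    `Domain Code` = col_character(),
    Area = col_character(),
    Element = col_character(),
    Year = col_integer(),
    Unit = col_character(),
    Value = col_double(),
    Flag = col_character()
  )
)
toy
```

The same col_types argument could be passed to the read_csv() call above. For the Excel file in the list, readxl's read_excel() would be the analogous reader.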

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

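The data comes from FAOSTAT, the statistical database of the Food and Agriculture Organization (FAO) of the United Nations, and belongs to its "Livestock Primary" domain (Domain Code QL). It covers a single egg item (Item Code 1062) across 245 areas from 1961 to 2018. Each row (case) is one area, year, and element, where the element is Laying, Yield, or Production. Each element is measured in its own unit, given in Unit, and the measurement itself is in Value. Flag and Flag Description record the source of each value, for example whether it is official data. The 14 columns are Domain Code, Domain, Area Code, Area, Element Code, Element, Item Code, Item, Year Code, Year, Unit, Value, Flag, and Flag Description. The figures were likely compiled by FAO from national reports, with the flags distinguishing official figures from other sources.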
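Because the description says each element comes with its own unit, one quick check is to count the Element/Unit pairs. If every element has exactly one unit, there are as many pairs as distinct elements. This is a sketch on a hypothetical toy tibble: the unit labels unit_a, unit_b, and unit_c are placeholders, not the real (truncated) units in the output, and the names toy and element_units are illustrative.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows shaped like the FAOSTAT table; units are placeholders.
toy <- tribble(
  ~Area,         ~Element,     ~Unit,    ~Year, ~Value,
  "Afghanistan", "Laying",     "unit_a", 1961,  10,
  "Afghanistan", "Yield",      "unit_b", 1961,  20,
  "Afghanistan", "Production", "unit_c", 1961,  30,
  "Albania",     "Laying",     "unit_a", 1961,  40
)

# One row per Element/Unit combination, with how many rows use it.
element_units <- toy %>% count(Element, Unit)
element_units

# TRUE when each element is paired with exactly one unit.
n_distinct(toy$Element) == nrow(element_units)
```

On the real data, the same call would be wild_birds %>% count(Element, Unit).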

 head(wild_birds)
# A tibble: 6 × 14
  Domai…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year Unit 
  <chr>   <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl> <chr>
1 QL      Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1961  1961 1000…
2 QL      Lives…       2 Afgh…    5410 Yield      1062 Eggs…    1961  1961 100m…
3 QL      Lives…       2 Afgh…    5510 Produc…    1062 Eggs…    1961  1961 tonn…
4 QL      Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1962  1962 1000…
5 QL      Lives…       2 Afgh…    5410 Yield      1062 Eggs…    1962  1962 100m…
6 QL      Lives…       2 Afgh…    5510 Produc…    1062 Eggs…    1962  1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
#   and abbreviated variable names ¹​`Domain Code`, ²​`Area Code`,
#   ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`

The first six rows show that each row records one element (Laying, Yield, or Production) for one area and year, and that Year Code repeats the value in Year.
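In the first six rows, Year Code and Year are identical. A one-line check confirms whether that holds for every row, in which case Year Code is redundant. The sketch uses a hypothetical toy tibble; the names toy and year_code_redundant are illustrative.

```r
library(tibble)

# Hypothetical toy columns mirroring `Year Code` and `Year`.
toy <- tibble(`Year Code` = c(1961, 1961, 1962), Year = c(1961, 1961, 1962))

# TRUE if every Year Code equals its Year, i.e. the code column adds nothing.
year_code_redundant <- all(toy$`Year Code` == toy$Year)
year_code_redundant
```

On the real data, this would be all(wild_birds$`Year Code` == wild_birds$Year).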

 dim(wild_birds)
[1] 38170    14

The dataset has 38,170 rows (observations) and 14 columns (variables).
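The row count can be compared against a fully balanced table. This uses the counts reported elsewhere in this post: 245 areas, the years 1961 to 2018 from the summary, and the three elements seen in the output, with the assumption that there are exactly three elements. A balanced table would have 245 × 58 × 3 = 42,630 rows, so 4,460 area/year/element combinations are absent. The second part sketches, on a hypothetical toy tibble, how to find areas with fewer rows than the most complete one. The names toy, rows_per_area, and incomplete are illustrative.

```r
library(dplyr)
library(tibble)

# Counts taken from this post; n_elements = 3 assumes Laying, Yield, Production only.
n_areas <- 245
n_years <- 2018 - 1961 + 1  # 58 years
n_elements <- 3
expected_rows <- n_areas * n_years * n_elements
missing_rows <- expected_rows - 38170
c(expected = expected_rows, missing = missing_rows)

# Hypothetical toy data: area "B" lacks two of its three element rows.
toy <- tibble(
  Area = c("A", "A", "A", "B"),
  Year = c(1961, 1961, 1961, 1961),
  Element = c("Laying", "Yield", "Production", "Laying")
)

# Areas with fewer rows than the most complete area.
rows_per_area <- toy %>% count(Area, name = "n_rows")
incomplete <- rows_per_area %>% filter(n_rows < max(n_rows))
incomplete
```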

 colnames(wild_birds)
 [1] "Domain Code"      "Domain"           "Area Code"        "Area"            
 [5] "Element Code"     "Element"          "Item Code"        "Item"            
 [9] "Year Code"        "Year"             "Unit"             "Value"           
[13] "Flag"             "Flag Description"

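These are the 14 column names. The descriptive columns Domain, Area, Element, Item, and Year are each paired with a matching code column.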
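Several column names contain spaces, which is why the code in this post needs backticks such as `Flag Description`. One optional clean-up is to convert the names to lower-case snake_case with dplyr's rename_with(). This is a sketch on a hypothetical one-row toy tibble; the names toy and toy_clean are illustrative.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tibble with FAOSTAT-style column names.
toy <- tibble(`Area Code` = 2, Area = "Afghanistan", `Flag Description` = "Official data")

# Lower-case every name and replace spaces with underscores.
toy_clean <- toy %>% rename_with(~ gsub(" ", "_", tolower(.x)))
names(toy_clean)
```

After renaming, a column can be referenced as flag_description, with no backticks. The trade-off is that the names no longer match FAOSTAT's own labels.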

 # Keep only Afghanistan's "Laying" rows (comma-separated conditions are combined with AND)
 filter(wild_birds, Area == "Afghanistan", Element == "Laying")
# A tibble: 58 × 14
   Domain Cod…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year
   <chr>        <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl>
 1 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1961  1961
 2 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1962  1962
 3 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1963  1963
 4 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1964  1964
 5 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1965  1965
 6 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1966  1966
 7 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1967  1967
 8 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1968  1968
 9 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1969  1969
10 QL           Lives…       2 Afgh…    5313 Laying     1062 Eggs…    1970  1970
# … with 48 more rows, 4 more variables: Unit <chr>, Value <dbl>, Flag <chr>,
#   `Flag Description` <chr>, and abbreviated variable names ¹​`Domain Code`,
#   ²​`Area Code`, ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`

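This keeps Afghanistan's rows for the Laying element (Laying is one of the three elements, alongside Yield and Production). The 58 rows match one row per year from 1961 to 2018.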
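To see all three elements for one area together, the long format (three rows per year) can be reshaped to one row per year with tidyr's pivot_wider(). The sketch drops Unit before pivoting because Unit differs by element; keeping it would prevent the three rows for a year from collapsing into one. The toy values are hypothetical, and the names toy and afghanistan_wide are illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy values in the same long layout as the FAOSTAT data.
toy <- tribble(
  ~Area,         ~Year, ~Element,     ~Value,
  "Afghanistan", 1961,  "Laying",     1,
  "Afghanistan", 1961,  "Yield",      2,
  "Afghanistan", 1961,  "Production", 3,
  "Afghanistan", 1962,  "Laying",     4,
  "Afghanistan", 1962,  "Yield",      5,
  "Afghanistan", 1962,  "Production", 6
)

# One row per year, one column per element.
afghanistan_wide <- toy %>%
  filter(Area == "Afghanistan") %>%
  select(Year, Element, Value) %>%
  pivot_wider(names_from = Element, values_from = Value)
afghanistan_wide
```

Each new column still carries the unit of its element, so the columns should not be added or compared directly.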

 # Count the distinct values in the Area column
 n_distinct(wild_birds$Area)
[1] 245

There are 245 distinct areas.
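Because each area has both a name and a code, it is worth checking that they map one-to-one. If they do, the number of distinct names, the number of distinct codes, and the number of distinct (code, name) pairs are all equal. The sketch uses a hypothetical toy tibble: apart from Afghanistan's code 2, which appears in the output above, the codes and names are made up. The names n_names, n_codes, n_pairs, and one_to_one are illustrative.

```r
library(dplyr)
library(tibble)

# Hypothetical toy area names and codes.
toy <- tibble(
  `Area Code` = c(2, 2, 10, 20),
  Area = c("Afghanistan", "Afghanistan", "Country B", "Country C")
)

n_names <- n_distinct(toy$Area)
n_codes <- n_distinct(toy$`Area Code`)
n_pairs <- nrow(distinct(toy, `Area Code`, Area))

# TRUE when every name has exactly one code and vice versa.
one_to_one <- n_names == n_codes && n_codes == n_pairs
one_to_one
```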

 # List the distinct area names
 distinct(wild_birds, Area)
# A tibble: 245 × 1
   Area               
   <chr>              
 1 Afghanistan        
 2 Albania            
 3 Algeria            
 4 American Samoa     
 5 Angola             
 6 Antigua and Barbuda
 7 Argentina          
 8 Armenia            
 9 Australia          
10 Austria            
# … with 235 more rows

These are the distinct area names; only the first 10 of the 245 are printed.
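The summary at the end of this post shows that Area Code has a median of 143 but a maximum of 5504. This hints that some of the 245 "areas" may be regional aggregates rather than countries, which would double-count production if summed with the countries. The sketch below flags such rows under a loudly stated assumption: that codes of 5000 and above denote aggregates. This threshold should be checked against FAOSTAT's area definitions before it is relied on. The toy codes and names are hypothetical, and toy, toy_flagged, countries_only, and aggregate_code_threshold are illustrative names.

```r
library(dplyr)
library(tibble)

# ASSUMPTION: Area Code >= 5000 marks a regional aggregate; verify with FAOSTAT.
aggregate_code_threshold <- 5000

# Hypothetical toy areas; only Afghanistan's code (2) comes from the output above.
toy <- tibble(
  `Area Code` = c(2, 10, 5000, 5504),
  Area = c("Afghanistan", "Country B", "Aggregate X", "Aggregate Y")
)

# Flag likely aggregates, then keep the remaining (presumably country) rows.
toy_flagged <- toy %>%
  mutate(is_aggregate = `Area Code` >= aggregate_code_threshold)
countries_only <- toy_flagged %>% filter(!is_aggregate)
countries_only
```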

 # Count the rows whose Flag Description is "Official data"
 count_flag <- wild_birds %>%
   filter(`Flag Description` == "Official data") %>%
   count()
 count_flag
# A tibble: 1 × 1
      n
  <int>
1  7548

7,548 of the 38,170 rows (about 19.8%) are flagged as "Official data"; the rest are not flagged as official data.
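The count above covers only one flag. A fuller picture comes from tabulating every Flag Description with its share of rows. For the official flag, that share is 7,548 / 38,170 ≈ 0.198. The sketch runs on a hypothetical toy tibble: "Toy flag A" and "Toy flag B" are placeholders, not real FAOSTAT flag descriptions, and toy and flag_shares are illustrative names.

```r
library(dplyr)
library(tibble)

# Hypothetical toy flag descriptions.
toy <- tibble(`Flag Description` = c("Official data", "Official data",
                                     "Toy flag A", "Toy flag B"))

# Rows per flag description, most common first, with each flag's share of all rows.
flag_shares <- toy %>%
  count(`Flag Description`, sort = TRUE) %>%
  mutate(share = n / sum(n))
flag_shares
```

On the real data, the same pipeline starting from wild_birds would list every flag in the file. If any Flag Description values are missing, they would appear as an NA row.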

 summary(wild_birds)
 Domain Code           Domain            Area Code          Area          
 Length:38170       Length:38170       Min.   :   1.0   Length:38170      
 Class :character   Class :character   1st Qu.:  70.0   Class :character  
 Mode  :character   Mode  :character   Median : 143.0   Mode  :character  
                                       Mean   : 771.1                     
                                       3rd Qu.: 215.0                     
                                       Max.   :5504.0                     
                                                                          
  Element Code    Element            Item Code        Item          
 Min.   :5313   Length:38170       Min.   :1062   Length:38170      
 1st Qu.:5313   Class :character   1st Qu.:1062   Class :character  
 Median :5410   Mode  :character   Median :1062   Mode  :character  
 Mean   :5411                      Mean   :1062                     
 3rd Qu.:5510                      3rd Qu.:1062                     
 Max.   :5510                      Max.   :1062                     
                                                                    
   Year Code         Year          Unit               Value         
 Min.   :1961   Min.   :1961   Length:38170       Min.   :       1  
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:    2600  
 Median :1991   Median :1991   Mode  :character   Median :   31996  
 Mean   :1990   Mean   :1990                      Mean   :  291341  
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   93836  
 Max.   :2018   Max.   :2018                      Max.   :76769955  
                                                  NA's   :40        
     Flag           Flag Description  
 Length:38170       Length:38170      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
                                      

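This summary confirms several points:

- Year runs from 1961 to 2018.
- Item Code is always 1062, so every row describes the same egg item.
- Element Code ranges from 5313 to 5510.
- Value has 40 missing entries.

Because Value mixes elements measured in different units, its overall statistics (such as the mean of 291,341) combine incomparable quantities. They are more meaningful when computed within each element.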
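Since Value only makes sense within one element and unit, a per-element summary is more informative than summary() on the whole column. The sketch groups by Element and Unit and reports the row count, the number of missing values, and the median. The toy values and unit labels are hypothetical, and toy and value_by_element are illustrative names.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows; units are placeholders and one Value is missing.
toy <- tribble(
  ~Element,     ~Unit,    ~Value,
  "Laying",     "unit_a", 10,
  "Laying",     "unit_a", NA,
  "Yield",      "unit_b", 300,
  "Production", "unit_c", 5000,
  "Production", "unit_c", 7000
)

# Summarize Value separately for each element/unit pair.
value_by_element <- toy %>%
  group_by(Element, Unit) %>%
  summarise(
    n = n(),
    n_missing = sum(is.na(Value)),
    median_value = median(Value, na.rm = TRUE),
    .groups = "drop"
  )
value_by_element
```

Grouping by Unit as well as Element costs nothing when each element has a single unit. If an element did turn out to have more than one unit, this grouping would keep them apart instead of averaging across units.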