Challenge 1 Reading Birds

challenge_1
birds
Author

Kekai Liu

Published

February 21, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

The birds.csv file is read in using read_csv().

Code
birds <- read_csv("_data/birds.csv") #read in the data and assign it to birds
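By default read_csv() guesses each column's type. A minimal sketch of declaring the types explicitly is shown here, assuming a toy file with only six of the columns; `toy_path` and `toy_birds` are hypothetical names, and the three rows reuse values from the first rows of the real file.

```r
library(readr)

# Toy stand-in for _data/birds.csv (assumption: a subset of columns only),
# written to a temporary file so the example runs anywhere.
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Area Code,Area,Item,Year,Value,Flag",
    "2,Afghanistan,Chickens,1961,4700,F",
    "2,Afghanistan,Chickens,1962,4900,F",
    "2,Afghanistan,Chickens,1963,5000,F"
  ),
  toy_path
)

# Declare the column types up front instead of relying on type guessing
toy_birds <- read_csv(
  toy_path,
  col_types = cols(
    `Area Code` = col_double(),
    Area        = col_character(),
    Item        = col_character(),
    Year        = col_integer(),
    Value       = col_double(),
    Flag        = col_character()
  )
)

dim(toy_birds) # number of rows, then number of columns
```

The same col_types argument could be passed when reading the full birds.csv, listing all of its columns.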

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

This dataset has 30,977 rows and 14 columns. The 30,977 rows represent 30,977 observations or cases, and the 14 columns represent 14 variables: Domain Code, Domain, Area Code, Area, Element Code, Element, Item Code, Item, Year Code, Year, Unit, Value, Flag, and Flag Description.

Code
str(birds) #produce a summary of the contents (dimensions, variables, variable types) of the data
spc_tbl_ [30,977 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Domain Code     : chr [1:30977] "QA" "QA" "QA" "QA" ...
 $ Domain          : chr [1:30977] "Live Animals" "Live Animals" "Live Animals" "Live Animals" ...
 $ Area Code       : num [1:30977] 2 2 2 2 2 2 2 2 2 2 ...
 $ Area            : chr [1:30977] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ Element Code    : num [1:30977] 5112 5112 5112 5112 5112 ...
 $ Element         : chr [1:30977] "Stocks" "Stocks" "Stocks" "Stocks" ...
 $ Item Code       : num [1:30977] 1057 1057 1057 1057 1057 ...
 $ Item            : chr [1:30977] "Chickens" "Chickens" "Chickens" "Chickens" ...
 $ Year Code       : num [1:30977] 1961 1962 1963 1964 1965 ...
 $ Year            : num [1:30977] 1961 1962 1963 1964 1965 ...
 $ Unit            : chr [1:30977] "1000 Head" "1000 Head" "1000 Head" "1000 Head" ...
 $ Value           : num [1:30977] 4700 4900 5000 5300 5500 5800 6600 6290 6300 6000 ...
 $ Flag            : chr [1:30977] "F" "F" "F" "F" ...
 $ Flag Description: chr [1:30977] "FAO estimate" "FAO estimate" "FAO estimate" "FAO estimate" ...
 - attr(*, "spec")=
  .. cols(
  ..   `Domain Code` = col_character(),
  ..   Domain = col_character(),
  ..   `Area Code` = col_double(),
  ..   Area = col_character(),
  ..   `Element Code` = col_double(),
  ..   Element = col_character(),
  ..   `Item Code` = col_double(),
  ..   Item = col_character(),
  ..   `Year Code` = col_double(),
  ..   Year = col_double(),
  ..   Unit = col_character(),
  ..   Value = col_double(),
  ..   Flag = col_character(),
  ..   `Flag Description` = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

The dataset covers 1961-2018; earlier years have fewer cases than recent years.

Code
table((select(birds, Year))) #retrieve Year column from birds, calculate frequencies
Year
1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 
 493  493  493  493  494  495  495  495  498  498  498  498  498  499  499  499 
1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 
 498  498  497  496  498  498  495  498  499  499  500  502  503  512  514  569 
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 
 574  574  574  574  574  574  574  575  575  575  575  575  575  576  576  576 
2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 
 576  576  576  577  577  577  577  577  577  577 
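The yearly frequencies can also be produced as a tibble with dplyr::count(), which is easier to filter or plot than a table object. This is a sketch on toy rows, not the real file: the three chicken rows reuse values from the str() output, and the duck row is invented so that one year has two cases.

```r
library(dplyr)

# Toy data (assumption): the Ducks row and its value are made up
toy_birds <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan"),
  Item  = c("Chickens", "Chickens", "Chickens", "Ducks"),
  Year  = c(1961, 1962, 1963, 1963),
  Value = c(4700, 4900, 5000, 10)
)

# One row per year; the column n holds the number of cases in that year
year_counts <- count(toy_birds, Year)
year_counts

# First and last year covered
range(toy_birds$Year)
```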

The Flag Description provides information on the sources of data. 6,488 cases are aggregate data, 1,002 cases do not have data available, 1,213 cases are FAO imputed data, 10,007 cases are FAO estimates, 10,773 cases are from official data, and 1,494 cases are unofficial figures.

Code
table((select(birds, "Flag Description"))) #retrieve "Flag Description" column from birds, calculate frequencies
Flag Description
Aggregate, may include official, semi-official, estimated or calculated data 
                                                                        6488 
                                                          Data not available 
                                                                        1002 
                                    FAO data based on imputation methodology 
                                                                        1213 
                                                                FAO estimate 
                                                                       10007 
                                                               Official data 
                                                                       10773 
                                                           Unofficial figure 
                                                                        1494 

Of the 30,977 total cases, 13,074 are for chickens, 6,909 for ducks, 4,136 for geese and guinea fowls, 1,165 for pigeons and other birds, and 5,693 for turkeys.

Code
table((select(birds, Item))) #retrieve Item column from birds, calculate frequencies
Item
              Chickens                  Ducks Geese and guinea fowls 
                 13074                   6909                   4136 
  Pigeons, other birds                Turkeys 
                  1165                   5693 
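To put these counts in proportion, the short sketch below converts them to percentages of all cases. The counts are copied by hand from the output above; `item_counts` and `item_share` are names introduced only for this illustration.

```r
# Counts copied from the table() output above
item_counts <- c(
  "Chickens"               = 13074,
  "Ducks"                  = 6909,
  "Geese and guinea fowls" = 4136,
  "Pigeons, other birds"   = 1165,
  "Turkeys"                = 5693
)

# Sanity check: the counts should add up to the 30,977 rows of the dataset
sum(item_counts)

# Percentage of all cases per bird type, rounded to one decimal place
item_share <- round(100 * item_counts / sum(item_counts), 1)
item_share
```

With the data frame loaded, the same shares could be computed directly, for example with count(birds, Item) followed by a mutate() that divides n by sum(n).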

The data only covers live animals.

Code
table((select(birds, Domain)))
Domain
Live Animals 
       30977 

These are ten of the areas with the most cases. The output shows that the data includes supranational areas (Africa, Asia, Eastern Asia, Europe, Northern Africa, South-eastern Asia) alongside individual countries. All ten areas shown are tied at the maximum of 290 cases.

Code
head(sort(table((select(birds, Area))),decreasing=TRUE), n=10) #retrieve Area column from birds, calculate frequencies, sort in descending order, display first ten
Area
            Africa               Asia       Eastern Asia              Egypt 
               290                290                290                290 
            Europe             France             Greece            Myanmar 
               290                290                290                290 
   Northern Africa South-eastern Asia 
               290                290 
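Because head() stops at exactly ten rows, it can hide areas that are tied with the last one displayed. A possible alternative is dplyr::slice_max(), which keeps ties by default. The sketch below uses made-up area names and counts purely to show the difference.

```r
library(dplyr)

# Toy case counts per area (assumption: hypothetical names and numbers)
toy_area_counts <- tibble(
  Area = c("A", "B", "C", "D", "E"),
  n    = c(290, 290, 290, 120, 7)
)

# Sorting and taking head() cuts the list at exactly two rows, hiding the tie
toy_area_counts %>%
  arrange(desc(n)) %>%
  head(2)

# slice_max() keeps every row tied with the cut-off (with_ties = TRUE by default)
top_areas <- toy_area_counts %>%
  slice_max(n, n = 2)
top_areas
```

On the real data, the input to slice_max() would be count(birds, Area).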

These are the ten areas with the fewest cases. South Sudan and Sudan are tied for the fewest, with only seven cases each.

Code
head(sort(table((select(birds, Area))),decreasing=FALSE), n=10) #retrieve Area column from birds, calculate frequencies, sort in ascending order, display first ten
Area
    South Sudan           Sudan      Montenegro      Luxembourg         Eritrea 
              7               7              13              19              26 
       Ethiopia North Macedonia      Tajikistan           Aruba    Ethiopia PDR 
             26              27              27              29              32 

The data contains cases from 248 unique areas.

Code
nrow(unique(select(birds, Area))) #retrieve Area column from birds, identify unique Area values, calculate total number of unique Area values
[1] 248
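The same count can be written more compactly with dplyr::n_distinct(). This is a sketch on a made-up vector of area names, not the real column.

```r
library(dplyr)

# Toy vector (assumption): a few area names with repeats
toy_areas <- c("Africa", "Egypt", "Egypt", "France", "Africa")

# Number of unique values in the vector
n_distinct(toy_areas)
```

For the real data the call would be n_distinct(birds$Area).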

Here is the five-number summary of Value, together with the mean. The smallest stock value is 0 and the largest is 23,707,134, both in units of 1000 Head. The mean across the non-missing cases is 99,411 units of 1000 Head, far above the median of 1,800, so the distribution is strongly right-skewed.

Code
summary(select(birds, Value)) #retrieve Value column from birds, produce five-number summary, mean, and NA count of Value
     Value         
 Min.   :       0  
 1st Qu.:     171  
 Median :    1800  
 Mean   :   99411  
 3rd Qu.:   15404  
 Max.   :23707134  
 NA's   :1036      

There are 1,036 cases with missing stock values.

Code
sum(is.na(select(birds, Value))) #retrieve Value column from birds, identify cases with missing values, total the number of cases with missing values
[1] 1036
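The 1,036 missing values slightly exceed the 1,002 cases flagged "Data not available", so it may be worth checking which flag descriptions the missing values fall under. The sketch below shows one way to do that on toy rows; the values and areas are hypothetical, and only the flag description labels come from the output earlier in this post.

```r
library(dplyr)

# Toy rows (assumption: made-up areas and values)
toy_birds <- tibble(
  Area               = c("A", "A", "B", "B"),
  Value              = c(4700, NA, NA, 120),
  `Flag Description` = c("FAO estimate", "Data not available",
                         "Data not available", "Official data")
)

# Count all cases and missing stock values within each flag description
missing_by_flag <- toy_birds %>%
  group_by(`Flag Description`) %>%
  summarise(n_cases = n(), n_missing = sum(is.na(Value)))
missing_by_flag

# Mean over the non-missing cases only
mean(toy_birds$Value, na.rm = TRUE)
```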

From this quick analysis, we can summarize this as a dataset of live bird stocks for five categories of birds, measured in units of 1000 Head, across 248 areas of the world (countries as well as supranational regions) for the years 1961-2018. A case corresponds to the stock of one type of bird in one area in one calendar year.
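One way to check this definition of a case is to confirm that no combination of Area, Item, and Year occurs more than once. The sketch below runs that check on three toy rows taken from the str() output; it is an illustration of the idea, not a result for the full file.

```r
library(dplyr)

# Toy rows: Afghanistan chicken stocks for 1961-1963, as printed by str()
toy_birds <- tibble(
  Area  = c("Afghanistan", "Afghanistan", "Afghanistan"),
  Item  = c("Chickens", "Chickens", "Chickens"),
  Year  = c(1961, 1962, 1963),
  Value = c(4700, 4900, 5000)
)

# Keep only Area/Item/Year combinations that appear more than once
duplicated_keys <- toy_birds %>%
  count(Area, Item, Year) %>%
  filter(n > 1)

# Zero rows means each combination identifies exactly one case
nrow(duplicated_keys)
```

Running the same pipeline on birds would show whether the three columns really identify a case in the full dataset.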