challenge_2
Ishan Bhardwaj
birds
Summary statistics for birds dataset
Author

Ishan Bhardwaj

Published

May 20, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a data set, and describe the data using both words and any supporting information (e.g., tables)
  2. provide summary statistics for different interesting groups within the data, and interpret those statistics

Read in the Data

Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.

  • railroad*.csv or StateCounty2012.xls ⭐
  • FAOstat*.csv or birds.csv ⭐⭐⭐
  • hotel_bookings.csv ⭐⭐⭐⭐
Code
birds <- read_csv("_data/birds.csv")

Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
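After reading the file, it can help to confirm the dimensions and the number of missing values before describing the data. The sketch below is a minimal, self-contained illustration rather than part of the original analysis: since `_data/birds.csv` may not be present, it writes a two-row stand-in file (made-up rows that reuse a subset of the real column names) to a temporary path and reads it back with `read_csv()`. The names `toy_path` and `birds_toy` are hypothetical.

```r
library(readr)

# Hypothetical stand-in for _data/birds.csv: two made-up rows that reuse
# a subset of the real column names.
toy_path <- tempfile(fileext = ".csv")
write_lines(
  c("Area,Item,Year,Unit,Value",
    "Afghanistan,Chickens,1961,1000 Head,100",
    "Afghanistan,Chickens,1962,1000 Head,"),
  toy_path
)

# Declaring column types up front avoids relying on type guessing
birds_toy <- read_csv(
  toy_path,
  col_types = cols(
    Area  = col_character(),
    Item  = col_character(),
    Year  = col_double(),
    Unit  = col_character(),
    Value = col_double()
  )
)

dim(birds_toy)               # number of rows and columns
sum(is.na(birds_toy$Value))  # number of missing stock values
```

The same two checks can be run on the real `birds` tibble once it has been read in.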

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

Code
head(birds, 100)
# A tibble: 100 × 14
   `Domain Code` Domain     `Area Code` Area  `Element Code` Element `Item Code`
   <chr>         <chr>            <dbl> <chr>          <dbl> <chr>         <dbl>
 1 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 2 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 3 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 4 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 5 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 6 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 7 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 8 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 9 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
10 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
# ℹ 90 more rows
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>
Code
bird_type_table <- table(select(birds, "Item"))
bird_type_table
Item
              Chickens                  Ducks Geese and guinea fowls 
                 13074                   6909                   4136 
  Pigeons, other birds                Turkeys 
                  1165                   5693 
Code
stock_source_table <- table(select(birds, "Flag Description"))
stock_source_table
Flag Description
Aggregate, may include official, semi-official, estimated or calculated data 
                                                                        6488 
                                                          Data not available 
                                                                        1002 
                                    FAO data based on imputation methodology 
                                                                        1213 
                                                                FAO estimate 
                                                                       10007 
                                                               Official data 
                                                                       10773 
                                                           Unofficial figure 
                                                                        1494 

This dataset records the “stocks”, or counts of live animals, held in different areas in each year from 1961 to 2018. The birds covered fall into five categories: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Stocks are counted in units of 1000 head. Each row appears to be one area-item-year observation, and the item counts above show that chickens account for the most rows (13074) and pigeons and other birds for the fewest (1165). The data was likely gathered by the FAO, the Food and Agriculture Organization of the United Nations: the “Flag Description” variable describes the source of each observation, and the FAO is the only organization it names.
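The claims in this description (year range, unit, number of areas, rows per item) can be checked directly with dplyr. The following is a minimal sketch on a made-up tibble, `birds_toy`, whose column names follow the birds data; the rows are invented for illustration and are not taken from the real file.

```r
library(dplyr)

# Made-up rows for illustration only; column names follow the birds data.
birds_toy <- tibble(
  Area  = c("A", "A", "B", "B"),
  Item  = c("Chickens", "Ducks", "Chickens", "Chickens"),
  Year  = c(1961, 1961, 1961, 2018),
  Unit  = rep("1000 Head", 4),
  Value = c(10, 2, NA, 30)
)

# One-row overview: year range, number of areas, units, missing values
overview <- birds_toy %>%
  summarise(
    first_year = min(Year),
    last_year  = max(Year),
    n_areas    = n_distinct(Area),
    units      = paste(unique(Unit), collapse = ", "),
    n_missing  = sum(is.na(Value))
  )
overview

# count() gives the same tallies as table(select(...)) but returns a tibble
item_counts <- birds_toy %>% count(Item, sort = TRUE)
item_counts
```

Replacing `birds_toy` with `birds` would produce the same overview for the full dataset.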

Provide Grouped Summary Statistics

Conduct some exploratory data analysis, using dplyr commands such as group_by(), select(), filter(), and summarise(). Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.

Code
summary <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  group_by(Item) %>%
  # Mean, median, and standard deviation of stocks of different animals
  summarise(mean_stocks = mean(Value, na.rm=TRUE),
            median_stocks = median(Value, na.rm=TRUE),
            std_stocks = sd(Value, na.rm=TRUE)) %>%
  arrange(desc(mean_stocks))
summary
# A tibble: 5 × 4
  Item                   mean_stocks median_stocks std_stocks
  <chr>                        <dbl>         <dbl>      <dbl>
1 Chickens                   207931.        10784.   1081629.
2 Ducks                       23072.          510     110621.
3 Turkeys                     15228.          528      56416.
4 Geese and guinea fowls      10292.          258      44489.
5 Pigeons, other birds         6163.         2800       8481.

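This table reports the mean, median, and standard deviation of stocks (in units of 1000 head) for each of the five bird categories, sorted by mean in descending order. These statistics are computed over every row, including aggregate areas such as “World”, so the means are pulled upward by those large totals.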
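The challenge prompt also asks for dispersion measures beyond the standard deviation. A minimal sketch of how minimum, maximum, quartiles, and interquartile range could be added per group is shown here on a made-up tibble (`birds_toy`, invented values); the column names `min_stocks`, `q25_stocks`, and so on are hypothetical and simply mirror the naming used above.

```r
library(dplyr)

# Made-up values for illustration only
birds_toy <- tibble(
  Item  = c(rep("Chickens", 5), rep("Ducks", 4)),
  Value = c(1, 2, 3, 4, 100, 5, 5, 7, NA)
)

dispersion <- birds_toy %>%
  group_by(Item) %>%
  summarise(
    min_stocks = min(Value, na.rm = TRUE),
    q25_stocks = unname(quantile(Value, 0.25, na.rm = TRUE)),  # first quartile
    q75_stocks = unname(quantile(Value, 0.75, na.rm = TRUE)),  # third quartile
    max_stocks = max(Value, na.rm = TRUE),
    iqr_stocks = IQR(Value, na.rm = TRUE)                      # q75 - q25
  )
dispersion
```

Quartiles and the interquartile range are less sensitive to a few very large values than the standard deviation, which makes them a useful complement when the data is skewed.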

Code
highest <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  filter(`Area` == "World", `Item` == "Chickens") %>%
  arrange(desc(Value))
highest
# A tibble: 58 × 5
   Area   Year Item        Value Unit     
   <chr> <dbl> <chr>       <dbl> <chr>    
 1 World  2018 Chickens 23707134 1000 Head
 2 World  2017 Chickens 23212565 1000 Head
 3 World  2016 Chickens 22826754 1000 Head
 4 World  2015 Chickens 21678753 1000 Head
 5 World  2014 Chickens 21118803 1000 Head
 6 World  2013 Chickens 20953583 1000 Head
 7 World  2012 Chickens 20489756 1000 Head
 8 World  2010 Chickens 20244638 1000 Head
 9 World  2011 Chickens 19950281 1000 Head
10 World  2009 Chickens 19720796 1000 Head
# ℹ 48 more rows

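From here, we see that the highest world count of chickens was in 2018, at about 23.7 billion birds (23707134 × 1000 head). The ten largest values all fall in 2009 through 2018, and the stock shrinks almost every time we step back a year; the one exception in these rows is 2011, which was lower than 2010. In other words, world chicken stocks grew fairly steadily over the final decade of the data. This could reflect a growing human population, which increases the demand for chickens.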
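The year-over-year trend can be made explicit with `lag()`. The sketch below uses only the ten World chicken values printed in the output above (2009 to 2018); the names `world_chickens`, `yearly_change`, and `declines` are introduced here for illustration.

```r
library(dplyr)

# World chicken stocks (1000 head), copied from the ten rows printed above
world_chickens <- tibble(
  Year  = 2009:2018,
  Value = c(19720796, 20244638, 19950281, 20489756, 20953583,
            21118803, 21678753, 22826754, 23212565, 23707134)
)

yearly_change <- world_chickens %>%
  arrange(Year) %>%
  mutate(
    change     = Value - lag(Value),        # absolute change vs previous year
    pct_change = 100 * change / lag(Value)  # relative change in percent
  )
yearly_change

# Years in which the stock fell compared with the previous year
declines <- yearly_change %>% filter(change < 0)
declines
```

On these ten rows the only decline is 2011; applying the same steps to all 58 years would show whether the pattern holds further back.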

Explain and Interpret

Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.

The summary code chunk above groups the birds dataset by the five bird categories and calculates the mean, median, and standard deviation of the stocks in each group. After arranging them by mean, we see that chickens far outnumber the other birds. However, this result comes with a very large standard deviation, and the mean (about 207931) is far above the median (about 10784). This means the distribution is strongly right-skewed: most observations have comparatively small chicken stocks, while a few have very large ones. Ducks, turkeys, and geese and guinea fowls have mean stocks within roughly the same range, and pigeons and other birds have the lowest mean stock, although their median (2800) is the highest among the non-chicken categories. This grouping is useful because it shows which types of birds are kept in the largest numbers. That may hint at which birds are cheaper to rear, although the dataset contains no cost information to confirm it.
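Because the groups differ so much in scale, comparing raw standard deviations can be misleading. One way to compare relative spread is the coefficient of variation (standard deviation divided by mean), and the ratio of mean to median gives a rough indication of skew. The sketch below applies both to the rounded values from the summary table printed earlier; the names `item_summary`, `relative_spread`, `cv`, and `mean_to_median` are introduced here for illustration.

```r
library(dplyr)

# Values copied (as rounded) from the summary table printed earlier
item_summary <- tibble(
  Item          = c("Chickens", "Ducks", "Turkeys",
                    "Geese and guinea fowls", "Pigeons, other birds"),
  mean_stocks   = c(207931, 23072, 15228, 10292, 6163),
  median_stocks = c(10784, 510, 528, 258, 2800),
  std_stocks    = c(1081629, 110621, 56416, 44489, 8481)
)

relative_spread <- item_summary %>%
  mutate(
    cv             = std_stocks / mean_stocks,    # coefficient of variation
    mean_to_median = mean_stocks / median_stocks  # > 1 suggests right skew
  ) %>%
  arrange(desc(cv))
relative_spread
```

With these figures, every category has a standard deviation larger than its mean and a mean larger than its median, and pigeons and other birds show the smallest relative spread.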