library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Ishan Bhardwaj
May 20, 2023
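Today’s challenge is to read in a data set and describe it, and then to provide and interpret summary statistics for interesting groups within the data.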
Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.
Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
# A tibble: 100 × 14
   `Domain Code` Domain     `Area Code` Area  `Element Code` Element `Item Code`
   <chr>         <chr>            <dbl> <chr>          <dbl> <chr>         <dbl>
 1 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 2 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 3 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 4 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 5 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 6 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 7 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 8 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
 9 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
10 QA            Live Anim…           2 Afgh…           5112 Stocks         1057
# ℹ 90 more rows
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>
Item
              Chickens                  Ducks Geese and guinea fowls 
                 13074                   6909                   4136 
  Pigeons, other birds                Turkeys 
                  1165                   5693 
Flag Description
Aggregate, may include official, semi-official, estimated or calculated data 
                                                                        6488 
                                                          Data not available 
                                                                        1002 
                                    FAO data based on imputation methodology 
                                                                        1213 
                                                                FAO estimate 
                                                                       10007 
                                                               Official data 
                                                                       10773 
                                                           Unofficial figure 
                                                                        1494 
This dataset presents the “stocks”, or counts, of live animals (here, birds) in different areas for the years 1961 to 2018. Each of the 30,977 rows is one observation of one bird category in one area in one year, described by 14 variables. The bird categories are chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Stocks are counted in units of 1000 head. The data was likely gathered by the FAO, the Food and Agriculture Organization of the United Nations: in the “Flag Description” variable, which describes the source of each observation, the only organization named is the FAO.
Conduct some exploratory data analysis, using dplyr commands such as group_by(), select(), filter(), and summarise(). Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.
summary <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  group_by(Item) %>%
  # Mean, median, and standard deviation of stocks of different animals
  summarise(mean_stocks = mean(Value, na.rm=TRUE),
            median_stocks = median(Value, na.rm=TRUE),
            std_stocks = sd(Value, na.rm=TRUE)) %>%
  arrange(desc(mean_stocks))
summary
# A tibble: 5 × 4
  Item                   mean_stocks median_stocks std_stocks
  <chr>                        <dbl>         <dbl>      <dbl>
1 Chickens                   207931.        10784.   1081629.
2 Ducks                       23072.          510     110621.
3 Turkeys                     15228.          528      56416.
4 Geese and guinea fowls      10292.          258      44489.
5 Pigeons, other birds         6163.         2800       8481.
This table reports the mean, median, and standard deviation of stocks (in 1000 head) for each bird category, sorted by descending mean.
# A tibble: 58 × 5
   Area   Year Item        Value Unit     
   <chr> <dbl> <chr>       <dbl> <chr>    
 1 World  2018 Chickens 23707134 1000 Head
 2 World  2017 Chickens 23212565 1000 Head
 3 World  2016 Chickens 22826754 1000 Head
 4 World  2015 Chickens 21678753 1000 Head
 5 World  2014 Chickens 21118803 1000 Head
 6 World  2013 Chickens 20953583 1000 Head
 7 World  2012 Chickens 20489756 1000 Head
 8 World  2010 Chickens 20244638 1000 Head
 9 World  2011 Chickens 19950281 1000 Head
10 World  2009 Chickens 19720796 1000 Head
# ℹ 48 more rows
From here, we see that the highest world count of chickens was in 2018. Sorted by value, the years run almost exactly backwards in time (in the first ten rows, only 2010 and 2011 are out of order), so the world chicken stock has risen fairly consistently over time, particularly in the 21st century. One possible explanation is a growing human population, which would increase the demand for chickens, although this dataset alone cannot confirm that.
Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.
The summary code chunk above groups the birds dataset by the five bird categories and calculates the mean, median, and standard deviation of the stocks in each group. After sorting by mean, chickens far outnumber every other category. The chicken stocks also vary enormously: the standard deviation (about 1,081,629) is roughly five times the mean (about 207,931), and the mean is nearly twenty times the median (about 10,784), so the distribution is strongly right-skewed, with a few very large stocks and many small ones. Part of this skew likely comes from aggregate rows such as World, which sit in the Area column alongside individual areas. Ducks, turkeys, and geese and guinea fowls have means within the same order of magnitude, while pigeons and other birds have the lowest mean stock but the highest median among the non-chicken categories. Grouping by bird category is useful because it shows which birds are kept in the largest numbers. This may reflect how cheap each is to rear, but the data contains no cost information, so that remains a guess.
---
title: "Challenge 2"
author: "Ishan Bhardwaj"
description: "Summary statistics for birds dataset"
date: "05/20/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_2
  - Ishan Bhardwaj
  - birds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1)  read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2)  provide summary statistics for different interesting groups within the data, and interpret those statistics
## Read in the Data
Read in one (or more) of the following data sets, available in the `posts/_data` folder, using the correct R package and command.
-   railroad\*.csv or StateCounty2012.xls ⭐
-   FAOstat\*.csv or birds.csv ⭐⭐⭐
-   hotel_bookings.csv ⭐⭐⭐⭐
```{r}
birds <- read_csv("_data/birds.csv")
```
Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
```{r}
#| label: summary
head(birds, 100)
bird_type_table <- table(select(birds, "Item"))
bird_type_table
stock_source_table <- table(select(birds, "Flag Description"))
stock_source_table
```
This dataset presents the "stocks", or counts, of live animals (here, birds) in different areas for the years 1961 to 2018. Each of the 30,977 rows is one observation of one bird category in one area in one year, described by 14 variables. The bird categories are chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Stocks are counted in units of 1000 head. The data was likely gathered by the FAO, the Food and Agriculture Organization of the United Nations: in the "Flag Description" variable, which describes the source of each observation, the only *organization* named is the FAO.
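The year range and the number of rows, areas, and bird categories can be checked with code rather than read off the printed output. The following is a minimal sketch: `describe_coverage()` is a hypothetical helper name, not part of the original analysis, and it assumes only the `Area`, `Item`, and `Year` columns shown in the tibble above.

```{r}
library(dplyr)

# Hypothetical helper (not from the original analysis): one-row overview of an
# FAO-style table that has Area, Item, and Year columns.
describe_coverage <- function(df) {
  df %>%
    summarise(
      n_rows     = n(),
      n_areas    = n_distinct(Area),
      n_items    = n_distinct(Item),
      first_year = min(Year, na.rm = TRUE),
      last_year  = max(Year, na.rm = TRUE)
    )
}

# Intended use with the data read in above:
# describe_coverage(birds)
```

Returning a one-row tibble keeps the check in the same tidy form as the other summaries in this post.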
## Provide Grouped Summary Statistics
Conduct some exploratory data analysis, using dplyr commands such as `group_by()`, `select()`, `filter()`, and `summarise()`. Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.
```{r}
summary <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  group_by(Item) %>%
  # Mean, median, and standard deviation of stocks of different animals
  summarise(mean_stocks = mean(Value, na.rm=TRUE),
            median_stocks = median(Value, na.rm=TRUE),
            std_stocks = sd(Value, na.rm=TRUE)) %>%
  arrange(desc(mean_stocks))
summary
```
This table reports the mean, median, and standard deviation of stocks (in 1000 head) for each bird category, sorted by descending mean.
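The prompt also asks for the mode and for minimum, maximum, and quantile measures, which the table above does not cover. One way to add them is sketched below. Both `stat_mode()` and `dispersion_by_item()` are hypothetical helper names introduced for illustration; base R has no built-in statistical mode, so the sketch takes the most frequent non-missing value and breaks ties by first appearance, which is an assumption rather than the only reasonable definition.

```{r}
library(dplyr)

# Hypothetical helper: most frequent non-missing numeric value.
# Ties are broken by whichever value appears first.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: mode plus dispersion measures per bird category.
dispersion_by_item <- function(df) {
  df %>%
    group_by(Item) %>%
    summarise(
      mode_stocks = stat_mode(Value),
      min_stocks  = min(Value, na.rm = TRUE),
      q25_stocks  = quantile(Value, 0.25, na.rm = TRUE, names = FALSE),
      q75_stocks  = quantile(Value, 0.75, na.rm = TRUE, names = FALSE),
      max_stocks  = max(Value, na.rm = TRUE),
      iqr_stocks  = IQR(Value, na.rm = TRUE)
    )
}

# Intended use with the data read in above:
# dispersion_by_item(birds)
```

For a continuous count such as `Value`, the mode is often less informative than the quartiles, so the interquartile range is included alongside it.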
```{r}
highest <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  filter(`Area` == "World", `Item` == "Chickens") %>%
  arrange(desc(Value))
highest
```
From here, we see that the highest world count of chickens was in 2018. Sorted by value, the years run almost exactly backwards in time (in the first ten rows of the output, only 2010 and 2011 are out of order), so the world chicken stock has risen fairly consistently over time, particularly in the 21st century. One possible explanation is a growing human population, which would increase the demand for chickens, although this dataset alone cannot confirm that.
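Sorting by `Value` only suggests a trend. Computing the change from one year to the next makes it explicit. The sketch below is one way to do that with `dplyr::lag()`; `yearly_change()` and its argument names are hypothetical, and it assumes one row per year for the chosen area and item.

```{r}
library(dplyr)

# Hypothetical helper: year-over-year change in stocks for one area and item.
# Assumes at most one row per Year after filtering.
yearly_change <- function(df, area_name = "World", item_name = "Chickens") {
  df %>%
    filter(Area == area_name, Item == item_name) %>%
    arrange(Year) %>%
    mutate(
      change     = Value - dplyr::lag(Value),          # absolute change, 1000 head
      pct_change = 100 * change / dplyr::lag(Value)    # relative change, percent
    )
}

# Intended use with the data read in above:
# yearly_change(birds) %>% filter(change < 0)   # years in which the stock fell
```

The first year has no predecessor, so its `change` and `pct_change` are `NA`.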
### Explain and Interpret
Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.
The summary code chunk above groups the birds dataset by the five bird categories and calculates the mean, median, and standard deviation of the stocks in each group. After sorting by mean, chickens far outnumber every other category. The chicken stocks also vary enormously: the standard deviation (about 1,081,629) is roughly five times the mean (about 207,931), and the mean is nearly twenty times the median (about 10,784), so the distribution is strongly right-skewed, with a few very large stocks and many small ones. Part of this skew likely comes from aggregate rows such as World, which sit in the Area column alongside individual areas. Ducks, turkeys, and geese and guinea fowls have means within the same order of magnitude, while pigeons and other birds have the lowest mean stock but the highest median among the non-chicken categories. Grouping by bird category is useful because it shows which birds are kept in the largest numbers. This may reflect how cheap each is to rear, but the data contains no cost information, so that remains a guess.
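The per-category statistics pool every row in `Area`, including totals such as World, which the filter on `Area == "World"` shows are stored next to individual areas. A possible refinement is to drop aggregate rows before summarising. The sketch below assumes that rows whose "Flag Description" starts with "Aggregate" are exactly those totals. That assumption is not verified here and should be checked against the data first. `summarise_without_aggregates()` is a hypothetical helper name.

```{r}
library(dplyr)

# Hypothetical helper: recompute the grouped summary without aggregate rows.
# ASSUMPTION: a "Flag Description" beginning with "Aggregate" marks regional
# or world totals. Rows with a missing flag description are dropped as well,
# because filter() discards rows where the condition is NA.
summarise_without_aggregates <- function(df) {
  df %>%
    filter(!startsWith(`Flag Description`, "Aggregate")) %>%
    group_by(Item) %>%
    summarise(
      mean_stocks   = mean(Value, na.rm = TRUE),
      median_stocks = median(Value, na.rm = TRUE),
      std_stocks    = sd(Value, na.rm = TRUE)
    ) %>%
    arrange(desc(mean_stocks))
}

# Intended use with the data read in above:
# summarise_without_aggregates(birds)
```

Comparing this result with the original summary would show how much of the gap between mean and median is due to the totals rather than to genuine differences between areas.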