library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Ishan Bhardwaj
May 20, 2023
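Today’s challenge is to
1) read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2) provide summary statistics for different interesting groups within the data, and interpret those statistics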
Read in one (or more) of the following data sets, available in the posts/_data folder, using the correct R package and command.
Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
# A tibble: 100 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QA Live Anim… 2 Afgh… 5112 Stocks 1057
2 QA Live Anim… 2 Afgh… 5112 Stocks 1057
3 QA Live Anim… 2 Afgh… 5112 Stocks 1057
4 QA Live Anim… 2 Afgh… 5112 Stocks 1057
5 QA Live Anim… 2 Afgh… 5112 Stocks 1057
6 QA Live Anim… 2 Afgh… 5112 Stocks 1057
7 QA Live Anim… 2 Afgh… 5112 Stocks 1057
8 QA Live Anim… 2 Afgh… 5112 Stocks 1057
9 QA Live Anim… 2 Afgh… 5112 Stocks 1057
10 QA Live Anim… 2 Afgh… 5112 Stocks 1057
# ℹ 90 more rows
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
Item
Chickens Ducks Geese and guinea fowls
13074 6909 4136
Pigeons, other birds Turkeys
1165 5693
Flag Description
Aggregate, may include official, semi-official, estimated or calculated data
6488
Data not available
1002
FAO data based on imputation methodology
1213
FAO estimate
10007
Official data
10773
Unofficial figure
1494
This dataset presents the “stocks”, or counts, of live birds in different areas for the years 1961 to 2018. The Item variable has five categories: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Stocks are counted in units of 1000 head, and each row appears to record the stock of one bird category in one area in one year. The data was likely gathered by the FAO (the Food and Agriculture Organization of the United Nations): in the “Flag Description” variable, which describes the source of each observation, the FAO is the only organization named.
Conduct some exploratory data analysis, using dplyr commands such as group_by(), select(), filter(), and summarise(). Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.
summary <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  group_by(Item) %>%
  # Mean, median, and standard deviation of stocks of different animals
  summarise(mean_stocks = mean(Value, na.rm=TRUE),
            median_stocks = median(Value, na.rm=TRUE),
            std_stocks = sd(Value, na.rm=TRUE)) %>%
  arrange(desc(mean_stocks))
summary
# A tibble: 5 × 4
Item mean_stocks median_stocks std_stocks
<chr> <dbl> <dbl> <dbl>
1 Chickens 207931. 10784. 1081629.
2 Ducks 23072. 510 110621.
3 Turkeys 15228. 528 56416.
4 Geese and guinea fowls 10292. 258 44489.
5 Pigeons, other birds 6163. 2800 8481.
This table reports the mean, median, and standard deviation of stocks (in 1000 head) for each of the five bird categories.
# A tibble: 58 × 5
Area Year Item Value Unit
<chr> <dbl> <chr> <dbl> <chr>
1 World 2018 Chickens 23707134 1000 Head
2 World 2017 Chickens 23212565 1000 Head
3 World 2016 Chickens 22826754 1000 Head
4 World 2015 Chickens 21678753 1000 Head
5 World 2014 Chickens 21118803 1000 Head
6 World 2013 Chickens 20953583 1000 Head
7 World 2012 Chickens 20489756 1000 Head
8 World 2010 Chickens 20244638 1000 Head
9 World 2011 Chickens 19950281 1000 Head
10 World 2009 Chickens 19720796 1000 Head
# ℹ 48 more rows
From here, we see that the highest world count of chickens was in 2018. Going back in time, the stock decreases almost every year, most consistently in the 21st century; in the rows shown, the only break in the ordering is 2010 ranking above 2011. In other words, the world chicken stock has grown nearly every year. This could reflect a growing human population, which increases the demand for chickens.
Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.
The summary code chunk above groups the birds dataset by the five bird categories and calculates the mean, median, and standard deviation of the stocks in each group. After arranging the groups by mean, we see that chickens far outnumber the other birds. However, the chicken stocks also vary widely: the standard deviation is about five times the mean, and the median (10,784 thousand head) is far below the mean (207,931 thousand head), so a few very large values sit alongside many small ones. Ducks, turkeys, and geese and guinea fowls have mean stocks in a broadly similar range (roughly 10,000 to 23,000 thousand head). Pigeons and other birds have the lowest mean stock, although their median (2,800 thousand head) is higher than that of every category except chickens. This grouping is useful because it shows which types of birds are kept in the largest numbers, which may in turn hint at which are cheaper to rear.
---
title: "Challenge 2"
author: "Ishan Bhardwaj"
description: "Summary statistics for birds dataset"
date: "05/20/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_2
  - Ishan Bhardwaj
  - birds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2) provide summary statistics for different interesting groups within the data, and interpret those statistics
## Read in the Data
Read in one (or more) of the following data sets, available in the `posts/_data` folder, using the correct R package and command.
- railroad\*.csv or StateCounty2012.xls ⭐
- FAOstat\*.csv or birds.csv ⭐⭐⭐
- hotel_bookings.csv ⭐⭐⭐⭐
```{r}
birds <- read_csv("_data/birds.csv")
```
Add any comments or documentation as needed. More challenging data may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
```{r}
#| label: summary
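# Preview the first 100 rows
head(birds, 100)
# Number of rows per bird category
bird_type_table <- table(select(birds, "Item"))
bird_type_table
# Number of rows per data-source flag
stock_source_table <- table(select(birds, "Flag Description"))
stock_source_table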
```
This dataset presents the "stocks", or counts, of live birds in different areas for the years 1961 to 2018. The `Item` variable has five categories: chickens, ducks, geese and guinea fowls, turkeys, and pigeons and other birds. Stocks are counted in units of 1000 head, and each row appears to record the stock of one bird category in one area in one year. The data was likely gathered by the FAO (the Food and Agriculture Organization of the United Nations): in the "Flag Description" variable, which describes the source of each observation, the FAO is the only *organization* named.
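The year range, unit, and category count stated above can be checked directly. A minimal sketch, assuming the column names shown in the preview (`Year`, `Item`, `Unit`); `coverage()` is a hypothetical helper, and the demo rows are three World/Chickens values reported in this post rather than the full file:

```{r}
library(dplyr)

# Hypothetical helper (not in the original analysis): summarises the span of
# years and the number of distinct bird categories and units in a data frame.
coverage <- function(data) {
  data %>%
    summarise(
      first_year = min(Year, na.rm = TRUE),
      last_year  = max(Year, na.rm = TRUE),
      n_items    = n_distinct(Item),
      n_units    = n_distinct(Unit)
    )
}

# Demo rows: World/Chickens values reported in this post (1000 head).
coverage_demo <- coverage(tibble(
  Year  = c(2016, 2017, 2018),
  Item  = "Chickens",
  Unit  = "1000 Head",
  Value = c(22826754, 23212565, 23707134)
))
coverage_demo
```

Calling `coverage(birds)` on the full data would be expected to return 1961 and 2018 as the first and last years if the description above holds.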
## Provide Grouped Summary Statistics
Conduct some exploratory data analysis, using dplyr commands such as `group_by()`, `select()`, `filter()`, and `summarise()`. Find the central tendency (mean, median, mode) and dispersion (standard deviation, min/max/quantile) for different subgroups within the data set.
```{r}
summary <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  group_by(Item) %>%
  # Mean, median, and standard deviation of stocks of different animals
  summarise(mean_stocks = mean(Value, na.rm=TRUE),
            median_stocks = median(Value, na.rm=TRUE),
            std_stocks = sd(Value, na.rm=TRUE)) %>%
  arrange(desc(mean_stocks))
summary
```
This table reports the mean, median, and standard deviation of stocks (in 1000 head) for each of the five bird categories.
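The challenge prompt also asks for min/max and quantiles, which the summary above does not include. A minimal sketch of how they might be added, assuming the same `Item` and `Value` columns; `spread_by_item()` is a hypothetical helper, and the demo rows are three World/Chickens values reported in this post:

```{r}
library(dplyr)

# Hypothetical helper (not part of the original analysis): range and
# quartile columns for each bird category.
spread_by_item <- function(data) {
  data %>%
    group_by(Item) %>%
    summarise(
      min_stocks = min(Value, na.rm = TRUE),
      q25_stocks = quantile(Value, 0.25, na.rm = TRUE, names = FALSE),
      q75_stocks = quantile(Value, 0.75, na.rm = TRUE, names = FALSE),
      max_stocks = max(Value, na.rm = TRUE),
      iqr_stocks = IQR(Value, na.rm = TRUE),
      .groups = "drop"
    )
}

# Demo rows: World/Chickens values reported in this post (1000 head).
demo <- tibble(
  Item  = "Chickens",
  Year  = c(2016, 2017, 2018),
  Value = c(22826754, 23212565, 23707134)
)
spread_demo <- spread_by_item(demo)
spread_demo
```

Passing `birds` instead of `demo` would give one row per bird category. The quartiles and interquartile range are less sensitive to a few very large values than the standard deviation is.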
```{r}
highest <- birds %>%
  select("Area", "Year", "Item", "Value", "Unit") %>%
  filter(`Area` == "World", `Item` == "Chickens") %>%
  arrange(desc(Value))
highest
```
From here, we see that the highest world count of chickens was in 2018. Going back in time, the stock decreases almost every year, most consistently in the 21st century; among the ten largest values, the only break in the ordering is 2010 ranking above 2011. In other words, the world chicken stock has grown nearly every year. This could reflect a growing human population, which increases the demand for chickens.
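Sorting by value shows the ordering, but the year-to-year change makes the trend easier to check. A minimal sketch, assuming `Year` and `Value` columns as above; `yearly_change()` is a hypothetical helper, and the demo rows are the 2009 to 2012 World/Chickens values reported in this post:

```{r}
library(dplyr)

# Hypothetical helper: year-over-year change in stocks, oldest year first.
yearly_change <- function(data) {
  data %>%
    arrange(Year) %>%
    mutate(
      change     = Value - lag(Value),       # absolute change vs. previous year
      pct_change = 100 * change / lag(Value) # percentage change vs. previous year
    )
}

# Demo rows: World/Chickens values reported in this post (1000 head).
world_chickens_demo <- tibble(
  Year  = c(2012, 2010, 2011, 2009),
  Value = c(20489756, 20244638, 19950281, 19720796)
)
changes_demo <- yearly_change(world_chickens_demo)
changes_demo
```

In these four rows the change is negative only for 2011. Running `yearly_change(highest)` would apply the same check to every year in the World/Chickens series.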
### Explain and Interpret
Be sure to explain why you choose a specific group. Comment on the interpretation of any interesting differences between groups that you uncover. This section can be integrated with the exploratory data analysis, just be sure it is included.
The summary code chunk above groups the birds dataset by the five bird categories and calculates the mean, median, and standard deviation of the stocks in each group. After arranging the groups by mean, we see that chickens far outnumber the other birds. However, the chicken stocks also vary widely: the standard deviation is about five times the mean, and the median (10,784 thousand head) is far below the mean (207,931 thousand head), so a few very large values sit alongside many small ones. Ducks, turkeys, and geese and guinea fowls have mean stocks in a broadly similar range (roughly 10,000 to 23,000 thousand head). Pigeons and other birds have the lowest mean stock, although their median (2,800 thousand head) is higher than that of every category except chickens. This grouping is useful because it shows which types of birds are kept in the largest numbers, which may in turn hint at which are cheaper to rear.
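The wide spread in the chicken figures may partly reflect that totals such as "World" are summarised alongside individual areas: the Flag Description table lists 6488 rows as aggregates. A minimal sketch of re-running the grouped summary without those rows; `summarise_non_aggregates()` is a hypothetical helper, the flag label is copied from that table, and the demo rows use invented values purely to show the call shape:

```{r}
library(dplyr)

# Label copied from the Flag Description table in this post.
aggregate_flag <- "Aggregate, may include official, semi-official, estimated or calculated data"

# Hypothetical helper: per-item mean and median after dropping rows
# flagged as aggregates.
summarise_non_aggregates <- function(data) {
  data %>%
    filter(`Flag Description` != aggregate_flag) %>%
    group_by(Item) %>%
    summarise(
      mean_stocks   = mean(Value, na.rm = TRUE),
      median_stocks = median(Value, na.rm = TRUE),
      .groups = "drop"
    )
}

# Placeholder rows with INVENTED values, only to illustrate the call.
toy <- tibble(
  Item  = c("Ducks", "Ducks", "Ducks"),
  Value = c(10, 30, 1000),
  `Flag Description` = c("Official data", "FAO estimate", aggregate_flag)
)
toy_summary <- summarise_non_aggregates(toy)
toy_summary
```

Whether dropping aggregate rows is appropriate depends on the question being asked, so `summarise_non_aggregates(birds)` is best read as a sensitivity check alongside the original summary, not a replacement for it.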