Code
library(tidyverse)
library(dplyr)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Matthew O’Neill
October 5, 2022
Today’s challenge is to
Read in one (or more) of the following data sets, available in the posts/_data
folder, using the correct R package and command.
Below is the first few rows of the dataset. It is very similar to the FAO dairy dataset, which I looked through in Challenge 1. At first glance, thess data appear to be a lot simpler than the dairy data, as we may just be dealing with the count of chickens accross fars in different Countries over time. But there may be more bird types that we don’t see right away. The data was once again likely gather by the Food and Agriculture Organization in the US.
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1961 1961 1000…
2 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1962 1962 1000…
3 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1963 1963 1000…
4 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1964 1964 1000…
5 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1965 1965 1000…
6 QA Live … 2 Afgh… 5112 Stocks 1057 Chic… 1966 1966 1000…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
Below is a breakdown of item types in our dataset, and it is clear we are workign with more than just Chickens. The data also includes Turkeys, Ducks, Geese, and Pigeons. It’s likely that different countries will produce differnt proportions of each.
Below we can see two tables, one showing the mean number of each kind of bird in each area. The second sorts these numbers in order to show that the largest count of birds in any one country is Chickens with 1.5 Million in total.
# A tibble: 601 × 3
# Groups: Area [248]
Area Item `mean(Value)`
<chr> <chr> <dbl>
1 Afghanistan Chickens 8099.
2 Africa Chickens 936779.
3 Africa Ducks 13639.
4 Africa Geese and guinea fowls 12164.
5 Africa Pigeons, other birds 11222.
6 Africa Turkeys 9004.
7 Albania Chickens 4055.
8 Albania Ducks 173.
9 Albania Geese and guinea fowls 123
10 Albania Turkeys 323.
# … with 591 more rows
# A tibble: 601 × 3
# Groups: Area [248]
Area Item `mean(Value)`
<chr> <chr> <dbl>
1 World Chickens 11624407.
2 Asia Chickens 5498104.
3 Americas Chickens 3163543.
4 Eastern Asia Chickens 2843125.
5 China, mainland Chickens 2417047.
6 Europe Chickens 1945862.
7 Northern America Chickens 1518820.
8 United States of America Chickens 1398810.
9 South-eastern Asia Chickens 1261959.
10 South America Chickens 1150118.
# … with 591 more rows
Seeing as the United States seems to be one of the largest producers of birds for meat, let’s take a closer look at their production.
# A tibble: 3 × 6
Item mean median std max min
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Chickens 1398810. 1305000 463816. 2015000 751000
2 Ducks 5683. 6225 1485. 7900 3400
3 Turkeys 206418. 240219 69420. 302713 91837
It appears that the US produces mostly Chicken, with a mean and median production over the years of around 1.3 million chickens. They produced far fewer Turkeys over the same time period, with an average of 206K per year, and barely any Ducks in comparison to the previous two subgroups. We also see that the US did not produce any Geese or Pigeons as food.
Comparing maximum and minimum values over time, it’s interesting to point out that the best year for Turkey production was still less than half that of the worst year for Chicken production. This makes some sense knowing US culture, as Turkey is a much more seasonal meat than Chicken.
Finally it’s important to point out that there was quite a bit of variance in Turkey production, with a standard deviation of nearly 70K per year despite it’s mean.
---
title: "Challenge 2"
author: "Matthew O'Neill"
desription: "Data wrangling: using group() and summarise()"
date: "10/05/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_2
- railroads
- faostat
- hotel_bookings
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2) provide summary statistics for different interesting groups within the data, and interpret those statistics
## Read in the Data
Read in one (or more) of the following data sets, available in the `posts/_data` folder, using the correct R package and command.
- railroad\*.csv or StateCounty2012.xls ⭐
- FAOstat\*.csv or birds.csv ⭐⭐⭐
- hotel_bookings.csv ⭐⭐⭐⭐
```{r}
data <- read_csv("../posts/_data/birds.csv")
```
## Describe the data
Below is the first few rows of the dataset. It is very similar to the FAO dairy dataset, which I looked through in Challenge 1. At first glance, thess data appear to be a lot simpler than the dairy data, as we may just be dealing with the count of chickens accross fars in different Countries over time. But there may be more bird types that we don't see right away. The data was once again likely gather by the Food and Agriculture Organization in the US.
```{r}
#| label: summary
head(data)
```
Below is a breakdown of item types in our dataset, and it is clear we are workign with more than just Chickens. The data also includes Turkeys, Ducks, Geese, and Pigeons. It's likely that different countries will produce differnt proportions of each.
```{r}
#| label: Item
item <- select(data, "Item")
table(item)
```
## Grouped Summary Statistics
Below we can see two tables, one showing the mean number of each kind of bird in each area. The second sorts these numbers in order to show that the largest count of birds in any one country is Chickens with 1.5 Million in total.
```{r}
bird_type <- data %>%
group_by(`Area`,`Item`) %>%
replace_na(list(`Value` = 0))%>%
summarise(mean(`Value`))
bird_type
bird_type %>%
arrange(desc(`mean(Value)`))
```
Seeing as the United States seems to be one of the largest producers of birds for meat, let's take a closer look at their production.
```{r}
data %>%
filter(`Area`=="United States of America")%>%
select(`Area`,`Item`,`Year Code`,`Value`)%>%
group_by(`Item`)%>%
summarize(mean = mean(`Value`),median = median(`Value`), std = sd(`Value`), max = max(`Value`), min = min(`Value`))
```
### Explain and Interpret
It appears that the US produces mostly Chicken, with a mean and median production over the years of around 1.3 million chickens. They produced far fewer Turkeys over the same time period, with an average of 206K per year, and barely any Ducks in comparison to the previous two subgroups. We also see that the US did not produce any Geese or Pigeons as food.
Comparing maximum and minimum values over time, it's interesting to point out that the best year for Turkey production was still less than half that of the worst year for Chicken production. This makes some sense knowing US culture, as Turkey is a much more seasonal meat than Chicken.
Finally it's important to point out that there was quite a bit of variance in Turkey production, with a standard deviation of nearly 70K per year despite it's mean.