Jack Sniezek
November 29, 2022
# A tibble: 30,977 × 6
Area Item Year Value Flag `Flag Description`
<chr> <chr> <dbl> <dbl> <chr> <chr>
1 Afghanistan Chickens 1961 4700 F FAO estimate
2 Afghanistan Chickens 1962 4900 F FAO estimate
3 Afghanistan Chickens 1963 5000 F FAO estimate
4 Afghanistan Chickens 1964 5300 F FAO estimate
5 Afghanistan Chickens 1965 5500 F FAO estimate
6 Afghanistan Chickens 1966 5800 F FAO estimate
7 Afghanistan Chickens 1967 6600 F FAO estimate
8 Afghanistan Chickens 1968 6290 <NA> Official data
9 Afghanistan Chickens 1969 6300 F FAO estimate
10 Afghanistan Chickens 1970 6000 F FAO estimate
# … with 30,967 more rows
Area Item Year Value
Length:30977 Length:30977 Min. :1961 Min. : 0
Class :character Class :character 1st Qu.:1976 1st Qu.: 171
Mode :character Mode :character Median :1992 Median : 1800
Mean :1991 Mean : 99411
3rd Qu.:2005 3rd Qu.: 15404
Max. :2018 Max. :23707134
NA's :1036
Flag Flag Description
Length:30977 Length:30977
Class :character Class :character
Mode :character Mode :character
The raw birds dataset contains 14 variables: 8 character and 6 numeric. It was compiled by the Food and Agriculture Organization of the United Nations (FAO). It records stock values for five types of bird (Chickens, Ducks, Geese and guinea fowls, Turkeys, and Pigeons/other birds) across 248 areas, where the Area column mixes countries with aggregate regions such as ‘Americas’. The data covers 1961 to 2018, and each value carries a flag describing its source, such as “FAO estimate” or “Official data”.
When reading in the data, I dropped Element, Domain, and Unit because each holds the same value in every row. I also dropped all of the “Code” variables, as they are either redundant or not useful to work with. That leaves the 6 variables shown above.
# A tibble: 248 × 1
Area
<chr>
1 Afghanistan
2 Albania
3 Algeria
4 American Samoa
5 Angola
6 Antigua and Barbuda
7 Argentina
8 Armenia
9 Aruba
10 Australia
# … with 238 more rows
# A tibble: 5 × 1
Item
<chr>
1 Chickens
2 Ducks
3 Geese and guinea fowls
4 Turkeys
5 Pigeons, other birds
I started my analysis of the birds dataset by taking a look at the average and median stock values by year.
# A tibble: 58 × 3
Year avg_stocks med_stocks
<dbl> <dbl> <dbl>
1 1961 36752. 1033
2 1962 37787. 1014
3 1963 38736. 1106
4 1964 39325. 1103
5 1965 40334. 1104
6 1966 41229. 1088.
7 1967 43240. 1193
8 1968 44420. 1252.
9 1969 45607. 1267
10 1970 47706. 1259
# … with 48 more rows
While this shows the general trend over the 58 years, it is very basic. Next, I computed the average for each Item (type of bird) in each year. I dropped the median because I felt the average would be more informative.
# A tibble: 5 × 59
# Groups: Item [5]
Item `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969` `1970`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Chickens 74060. 76753. 78922. 80213. 82458. 83880. 88047. 91003. 94121. 98297.
2 Ducks 7232. 7520. 7861. 8082. 8329. 8592. 8861. 9166. 9257. 9493.
3 Geese a… 2364. 2435. 2483. 2641. 2808. 2870. 3099. 3245. 3331. 3465.
4 Pigeons… 3307. 3771. 4004. 4227. 4440. 4630. 4673. 2840. 2978. 3110.
5 Turkeys 10610. 9043 8377. 7987. 7938. 8546. 8931. 7959. 7998. 9062.
# … with 48 more variables: `1971` <dbl>, `1972` <dbl>, `1973` <dbl>,
# `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>,
# `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
# `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
# `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
# `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>,
# `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, …
Finally, I wanted to focus on a single Area, so I filtered Area to ‘Americas’. It has some of the largest values, so the tibble prints them in scientific notation, which is ugly in the rendering. However, its series is very complete, so it works well as a focus.
# A tibble: 4 × 59
# Groups: Item [4]
Item `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969` `1970`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Chickens 1.19e6 1.22e6 1.24e6 1.29e6 1.33e6 1.37e6 1.43e6 1.47e6 1.53e6 1.56e6
2 Ducks 9.64e3 9.99e3 1.07e4 1.10e4 1.13e4 1.19e4 1.18e4 1.20e4 1.20e4 1.21e4
3 Geese a… 5.53e2 5.61e2 5.95e2 6.07e2 6.18e2 6.43e2 5.95e2 6.23e2 6.59e2 6.65e2
4 Turkeys 1.19e5 1.03e5 1.05e5 1.13e5 1.18e5 1.30e5 1.39e5 1.20e5 1.20e5 1.31e5
# … with 48 more variables: `1971` <dbl>, `1972` <dbl>, `1973` <dbl>,
# `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>,
# `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
# `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
# `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
# `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>,
# `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>, `2003` <dbl>, …
Looking at the average stock values by year, I can see that stocks increase over time. Broken down by bird type, Chickens, Ducks, and Geese increase steadily almost every year before plateauing in the 2010s. Pigeons peaked in the 1990s and have leveled out since. Turkeys have hovered around the same level since 1980. Narrowing down to just the Americas, I noticed that there is no Pigeons row (the table has only four items). Chickens grew steadily each year. Ducks and Turkeys plateaued around 1990. Geese peaked in 1988-1989, then dropped significantly and leveled off.
---
title: "Challenge 2"
author: "Jack Sniezek"
description: "Data wrangling: using group_by() and summarise()"
date: "11/29/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_2
  - railroads
  - faostat
  - hotel_bookings
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a data set, and describe the data using both words and any supporting information (e.g., tables, etc)
2) provide summary statistics for different interesting groups within the data, and interpret those statistics
## Read in the Data
Read in one (or more) of the following data sets, available in the `posts/_data` folder, using the correct R package and command.
- railroad\*.csv or StateCounty2012.xls ⭐
- FAOstat\*.csv or birds.csv ⭐⭐⭐
- hotel_bookings.csv ⭐⭐⭐⭐
```{r}
# Read the birds data and drop columns that are constant or redundant
birds <- read_csv("_data/birds.csv") %>%
  select(-c(contains("Code"), Element, Domain, Unit))
birds
summary(birds)
```
## Describe the data
The raw birds dataset contains 14 variables: 8 character and 6 numeric. It was compiled by the Food and Agriculture Organization of the United Nations (FAO). It records stock values for five types of bird (Chickens, Ducks, Geese and guinea fowls, Turkeys, and Pigeons/other birds) across 248 areas, where the Area column mixes countries with aggregate regions such as 'Americas'. The data covers 1961 to 2018, and each value carries a flag describing its source, such as "FAO estimate" or "Official data".
When reading in the data, I dropped Element, Domain, and Unit because each holds the same value in every row. I also dropped all of the "Code" variables, as they are either redundant or not useful to work with. That leaves the 6 variables shown above.
```{r}
#| label: unique-areas-and-items
# List the distinct areas and bird types in the data
num_areas <- distinct(birds, Area)
num_areas
num_items <- distinct(birds, Item)
num_items
```
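The decision to drop Element, Domain, and Unit rests on each of them holding a single value. A minimal sketch of how that could be checked is below. It runs on a small made-up tibble named `raw` (a hypothetical stand-in, not the real file), so the values are placeholders only.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the raw file: every value here is a made-up placeholder
raw <- tibble(
  Domain  = rep("domain-x", 3),
  Element = rep("element-x", 3),
  Unit    = rep("unit-x", 3),
  Area    = c("Afghanistan", "Albania", "Algeria"),
  Value   = c(4700, 1200, 12000)
)

# Count distinct values per column; a count of 1 means the column is constant
distinct_counts <- raw %>%
  summarise(across(everything(), n_distinct))

# Names of the columns that never vary and are therefore candidates to drop
constant_cols <- names(distinct_counts)[unlist(distinct_counts) == 1]
constant_cols
```

Running the same `summarise(across(...))` call on the unfiltered CSV would show whether those three columns really are constant there.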
## Provide Grouped Summary Statistics
I started my analysis of the birds dataset by taking a look at the average and median stock values by year.
```{r}
birds %>%
  group_by(Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE),
            med_stocks = median(Value, na.rm = TRUE))
```
While this shows the general trend over the 58 years, it is very basic. Next, I computed the average for each Item (type of bird) in each year. I dropped the median because I felt the average would be more informative.
```{r}
# Average stocks per bird type and year, with one column per year
t1 <- birds %>%
  group_by(Item, Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE)) %>%
  pivot_wider(names_from = Year, values_from = avg_stocks)
t1
```
Finally, I wanted to focus on a single Area, so I filtered Area to 'Americas'. It has some of the largest values, so the tibble prints them in scientific notation, which is ugly in the rendering. However, its series is very complete, so it works well as a focus.
```{r}
# Same summary as t1, restricted to the Americas
t2 <- birds %>%
  filter(Area == "Americas") %>%
  group_by(Item, Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE)) %>%
  pivot_wider(names_from = Year, values_from = avg_stocks)
t2
```
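The large Americas values print in scientific notation because of the tibble's default display. One possible way to make such a table easier to read is to pass it through `knitr::kable()` with thousands separators. The sketch below is only an illustration: `toy` holds made-up numbers, not the real figures, and `.groups = "drop"` is added here so the result is an ungrouped table.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(knitr)

# Made-up values standing in for the Americas data; not the real figures
toy <- tibble(
  Item  = c("Chickens", "Chickens", "Turkeys", "Turkeys"),
  Year  = c(1961, 1962, 1961, 1962),
  Value = c(1190000, 1220000, 119000, 103000)
)

# Same group-summarise-pivot pattern as above, ungrouped at the end
wide <- toy %>%
  group_by(Item, Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = Year, values_from = avg_stocks)

# kable() renders a plain table; big.mark adds thousands separators
formatted <- kable(wide, format = "pipe", digits = 0,
                   format.args = list(big.mark = ","))
formatted
```

With 58 year columns the real table would still be very wide, so selecting a handful of years before calling `kable()` may be worth considering.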
### Explain and Interpret
Looking at the average stock values by year, I can see that stocks increase over time. Broken down by bird type, Chickens, Ducks, and Geese increase steadily almost every year before plateauing in the 2010s. Pigeons peaked in the 1990s and have leveled out since. Turkeys have hovered around the same level since 1980. Narrowing down to just the Americas, I noticed that there is no Pigeons row (the table has only four items). Chickens grew steadily each year. Ducks and Turkeys plateaued around 1990. Geese peaked in 1988-1989, then dropped significantly and leveled off.
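
Trends like these are hard to read off a 59-column table, so a line chart of the long-format summary may be easier to check them against. The sketch below is a rough illustration on a made-up tibble (`toy` is hypothetical, not the real data). The log scale is my own assumption, chosen to keep small and large bird types on one chart, and it would not display averages of zero.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Made-up values for two bird types over three years; for illustration only
toy <- tibble(
  Item  = rep(c("Chickens", "Ducks"), each = 3),
  Year  = rep(c(1961, 1962, 1963), times = 2),
  Value = c(100, 110, 125, 10, 11, 12)
)

# Keep the summary in long format: one row per Item and Year
yearly <- toy %>%
  group_by(Item, Year) %>%
  summarise(avg_stocks = mean(Value, na.rm = TRUE), .groups = "drop")

# One line per bird type; the log scale keeps small series visible
trend_plot <- ggplot(yearly, aes(x = Year, y = avg_stocks, colour = Item)) +
  geom_line() +
  scale_y_log10() +
  labs(x = "Year", y = "Average stocks (log scale)", colour = "Bird type")
trend_plot
```

Replacing `toy` with `birds`, optionally filtered to a single Area, would apply the same chart to the real data.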