library(tidyverse)
library(ggplot2)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Challenge 8
Challenge Overview
Today’s challenge is to:
- read in multiple data sets, and describe the data set using both words and any supporting information (e.g., tables, etc)
- tidy data (as needed, including sanity checks)
- mutate variables as needed (including sanity checks)
- join two or more data sets and analyze some aspect of the joined data
(be sure to only include the category tags for the data you use!)
Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- faostat ⭐⭐
In the read data, I read all th fao data here. There would be four of data which call egg, livestock, cattle, country.
<-read_csv("_data/FAOSTAT_egg_chicken.csv",show_col_types = FALSE)
egghead(egg,10)
# A tibble: 10 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1961 1961
2 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1961 1961
3 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1961 1961
4 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1962 1962
5 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1962 1962
6 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1962 1962
7 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1963 1963
8 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1963 1963
9 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1963 1963
10 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1964 1964
# … with 4 more variables: Unit <chr>, Value <dbl>, Flag <chr>,
# `Flag Description` <chr>, and abbreviated variable names ¹`Domain Code`,
# ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
dim(egg)
[1] 38170 14
<-read_csv("_data/FAOSTAT_livestock.csv",show_col_types = FALSE)
livestockhead(livestock,10)
# A tibble: 10 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1961 1961
2 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1962 1962
3 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1963 1963
4 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1964 1964
5 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1965 1965
6 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1966 1966
7 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1967 1967
8 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1968 1968
9 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1969 1969
10 QA Live … 2 Afgh… 5111 Stocks 1107 Asses 1970 1970
# … with 4 more variables: Unit <chr>, Value <dbl>, Flag <chr>,
# `Flag Description` <chr>, and abbreviated variable names ¹`Domain Code`,
# ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
dim(livestock)
[1] 82116 14
<-read_csv("_data/FAOSTAT_cattle_dairy.csv",show_col_types = FALSE)
cattlehead(cattle,10)
# A tibble: 10 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
2 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1961 1961
3 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1961 1961
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1962 1962
5 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1962 1962
6 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1962 1962
7 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1963 1963
8 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1963 1963
9 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1963 1963
10 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1964 1964
# … with 4 more variables: Unit <chr>, Value <dbl>, Flag <chr>,
# `Flag Description` <chr>, and abbreviated variable names ¹`Domain Code`,
# ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
dim(cattle)
[1] 36449 14
Briefly describe the data
Tidy Data (as needed)
The data is extremely tidy data. Therefor, I would not use any command to make a clean data. As the above table, I could know 14 of the columns with same column name. Therefore, I could join data easily.
Join Data
Be sure to include a sanity check, and double-check that case count is correct!
The first joins cattle data and livestock. As I used the full_join command here, I got the 118565 rows. I want to prove my answer is correct so I have to sum the rows between those data. In the cattle data, there are 36449 rows. In the livestock data, there are 82116 rows. Therefore, I sum both of the data and got the 118565. This is how I check I am correct.
<-full_join(cattle,livestock)
cattle_livedim(cattle_live)
[1] 118565 14
head(cattle_live)
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961 Head
2 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1961 1961 hg/An
3 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1961 1961 tonn…
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1962 1962 Head
5 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1962 1962 hg/An
6 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1962 1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
As the command, I want to double check whether my idea is correct or not. Therefore, I sum the rows of egg and cattle data. 36449 + 38170 = 74619.
<-full_join(egg,cattle)
egg_cattledim(egg_cattle)
[1] 74619 14
head(egg_cattle)
# A tibble: 6 × 14
Domai…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year Unit
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr>
1 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1961 1961 1000…
2 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1961 1961 100m…
3 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1961 1961 tonn…
4 QL Lives… 2 Afgh… 5313 Laying 1062 Eggs… 1962 1962 1000…
5 QL Lives… 2 Afgh… 5410 Yield 1062 Eggs… 1962 1962 100m…
6 QL Lives… 2 Afgh… 5510 Produc… 1062 Eggs… 1962 1962 tonn…
# … with 3 more variables: Value <dbl>, Flag <chr>, `Flag Description` <chr>,
# and abbreviated variable names ¹`Domain Code`, ²`Area Code`,
# ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`