library(tidyverse)
library(ggplot2)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Challenge 8
Challenge Overview
Today’s challenge is to:
- read in multiple data sets, and describe the data set using both words and any supporting information (e.g., tables, etc)
- tidy data (as needed, including sanity checks)
- mutate variables as needed (including sanity checks)
- join two or more data sets and analyze some aspect of the joined data
(be sure to only include the category tags for the data you use!)
Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- military marriages ⭐⭐
- faostat ⭐⭐
- railroads ⭐⭐⭐
- fed_rate ⭐⭐⭐
- debt ⭐⭐⭐
- us_hh ⭐⭐⭐⭐
- snl ⭐⭐⭐⭐⭐
<- read_csv("_data/FAOSTAT_country_groups.csv")
country_faostatcsv <- read_csv("_data/FAOSTAT_cattle_dairy.csv") cattle_faostatcsv
Briefly describe the data
country_faostatcsv
# A tibble: 1,943 × 7
`Country Group Code` `Country Group` Countr…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
<dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
1 5100 Africa 4 Algeria 012 DZ DZA
2 5100 Africa 7 Angola 024 AO AGO
3 5100 Africa 53 Benin 204 BJ BEN
4 5100 Africa 20 Botswa… 072 BW BWA
5 5100 Africa 233 Burkin… 854 BF BFA
6 5100 Africa 29 Burundi 108 BI BDI
7 5100 Africa 35 Cabo V… 132 CV CPV
8 5100 Africa 32 Camero… 120 CM CMR
9 5100 Africa 37 Centra… 140 CF CAF
10 5100 Africa 39 Chad 148 TD TCD
# … with 1,933 more rows, and abbreviated variable names ¹`Country Code`,
# ²`M49 Code`, ³`ISO2 Code`, ⁴`ISO3 Code`
summary(country_faostatcsv)
Country Group Code Country Group Country Code Country
Min. : 336 Length:1943 Min. : 1.0 Length:1943
1st Qu.:5200 Class :character 1st Qu.: 69.0 Class :character
Median :5403 Mode :character Median :136.0 Mode :character
Mean :5502 Mean :142.2
3rd Qu.:5848 3rd Qu.:202.0
Max. :9011 Max. :622.0
M49 Code ISO2 Code ISO3 Code
Length:1943 Length:1943 Length:1943
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
cattle_faostatcsv
# A tibble: 36,449 × 14
Domain Cod…¹ Domain Area …² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
2 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1961 1961
3 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1961 1961
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1962 1962
5 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1962 1962
6 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1962 1962
7 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1963 1963
8 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1963 1963
9 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1963 1963
10 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1964 1964
# … with 36,439 more rows, 4 more variables: Unit <chr>, Value <dbl>,
# Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
# ¹`Domain Code`, ²`Area Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
summary(cattle_faostatcsv)
Domain Code Domain Area Code Area
Length:36449 Length:36449 Min. : 1.0 Length:36449
Class :character Class :character 1st Qu.: 69.0 Class :character
Mode :character Mode :character Median : 141.0 Mode :character
Mean : 775.2
3rd Qu.: 215.0
Max. :5504.0
Element Code Element Item Code Item
Min. :5318 Length:36449 Min. :882 Length:36449
1st Qu.:5318 Class :character 1st Qu.:882 Class :character
Median :5420 Mode :character Median :882 Mode :character
Mean :5416 Mean :882
3rd Qu.:5510 3rd Qu.:882
Max. :5510 Max. :882
Year Code Year Unit Value
Min. :1961 Min. :1961 Length:36449 Min. : 7
1st Qu.:1976 1st Qu.:1976 Class :character 1st Qu.: 7849
Median :1991 Median :1991 Mode :character Median : 43266
Mean :1990 Mean :1990 Mean : 4410235
3rd Qu.:2005 3rd Qu.:2005 3rd Qu.: 700000
Max. :2018 Max. :2018 Max. :683217055
NA's :74
Flag Flag Description
Length:36449 Length:36449
Class :character Class :character
Mode :character Mode :character
I have chosen country faostat and cattle faostat data set. After reviewing the data set it explains that country faostat dataset is codebook for country. This dataset has all the information country level. After joining this dataset with cattle faostat dataset we can perform data analysis based on different groups.
In cattle dataset there is a lot of information about agriculture across the world. As you can see from the summary in element code there is information about milk, yield and production. Also, there is a common column in both the datasets country code and area code. We will be using this to join the data.
Tidy Data (as needed)
<- rename (cattle_faostatcsv, "Country Code"= "Area Code" )
cattle_new
cattle_new
# A tibble: 36,449 × 14
Domain Cod…¹ Domain Count…² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
2 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1961 1961
3 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1961 1961
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1962 1962
5 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1962 1962
6 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1962 1962
7 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1963 1963
8 QL Lives… 2 Afgh… 5420 Yield 882 Milk… 1963 1963
9 QL Lives… 2 Afgh… 5510 Produc… 882 Milk… 1963 1963
10 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1964 1964
# … with 36,439 more rows, 4 more variables: Unit <chr>, Value <dbl>,
# Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
# ¹`Domain Code`, ²`Country Code`, ³`Element Code`, ⁴`Item Code`,
# ⁵`Year Code`
Join Data
As the data is ready to join, we will use left_join to join the dataset using country code.
<- left_join(cattle_new, country_faostatcsv, by = "Country Code" )
final_data final_data
# A tibble: 257,061 × 20
Domain Cod…¹ Domain Count…² Area Eleme…³ Element Item …⁴ Item Year …⁵ Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
2 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
3 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
4 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
5 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
6 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
7 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
8 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
9 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
10 QL Lives… 2 Afgh… 5318 Milk A… 882 Milk… 1961 1961
# … with 257,051 more rows, 10 more variables: Unit <chr>, Value <dbl>,
# Flag <chr>, `Flag Description` <chr>, `Country Group Code` <dbl>,
# `Country Group` <chr>, Country <chr>, `M49 Code` <chr>, `ISO2 Code` <chr>,
# `ISO3 Code` <chr>, and abbreviated variable names ¹`Domain Code`,
# ²`Country Code`, ³`Element Code`, ⁴`Item Code`, ⁵`Year Code`
<- rename (final_data, "Country_Group"= "Country Group" ) final_data_new
<- final_data_new%>%
final_data_new3 filter(Year >= 1998) %>%
filter(`Country_Group`=="Central Asia") %>%
filter(`Unit` == "tonnes")
%>%
final_data_new3 ggplot(aes(x = Area, y = Value/100000, fill = Area)) +
geom_bar(stat = "identity") +
labs(x = "Countries", y = "Different Agricultural Aspects", title = "Agricultural information across the world",
fill = "Element")
ggplot(final_data_new3) +
geom_bar(aes(x = Area, fill= Element )) +
labs(x = "Countries", y = "Different Agricultural Aspects", title = "Agricultural information across the world",
fill = "Element")