Challenge 8

challenge_8
railroads
snl
faostat
debt
Joining Data
Author

Yoshita Varma Annam

Published

January 16, 2022

library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in multiple data sets, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. mutate variables as needed (including sanity checks)
  4. join two or more data sets and analyze some aspect of the joined data

(be sure to only include the category tags for the data you use!)

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • military marriages ⭐⭐
  • faostat ⭐⭐
  • railroads ⭐⭐⭐
  • fed_rate ⭐⭐⭐
  • debt ⭐⭐⭐
  • us_hh ⭐⭐⭐⭐
  • snl ⭐⭐⭐⭐⭐
country_faostatcsv <- read_csv("_data/FAOSTAT_country_groups.csv")
cattle_faostatcsv <- read_csv("_data/FAOSTAT_cattle_dairy.csv")

Briefly describe the data

country_faostatcsv
# A tibble: 1,943 × 7
   `Country Group Code` `Country Group` Countr…¹ Country M49 C…² ISO2 …³ ISO3 …⁴
                  <dbl> <chr>              <dbl> <chr>   <chr>   <chr>   <chr>  
 1                 5100 Africa                 4 Algeria 012     DZ      DZA    
 2                 5100 Africa                 7 Angola  024     AO      AGO    
 3                 5100 Africa                53 Benin   204     BJ      BEN    
 4                 5100 Africa                20 Botswa… 072     BW      BWA    
 5                 5100 Africa               233 Burkin… 854     BF      BFA    
 6                 5100 Africa                29 Burundi 108     BI      BDI    
 7                 5100 Africa                35 Cabo V… 132     CV      CPV    
 8                 5100 Africa                32 Camero… 120     CM      CMR    
 9                 5100 Africa                37 Centra… 140     CF      CAF    
10                 5100 Africa                39 Chad    148     TD      TCD    
# … with 1,933 more rows, and abbreviated variable names ¹​`Country Code`,
#   ²​`M49 Code`, ³​`ISO2 Code`, ⁴​`ISO3 Code`
summary(country_faostatcsv)
 Country Group Code Country Group       Country Code     Country         
 Min.   : 336       Length:1943        Min.   :  1.0   Length:1943       
 1st Qu.:5200       Class :character   1st Qu.: 69.0   Class :character  
 Median :5403       Mode  :character   Median :136.0   Mode  :character  
 Mean   :5502                          Mean   :142.2                     
 3rd Qu.:5848                          3rd Qu.:202.0                     
 Max.   :9011                          Max.   :622.0                     
   M49 Code          ISO2 Code          ISO3 Code        
 Length:1943        Length:1943        Length:1943       
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
cattle_faostatcsv
# A tibble: 36,449 × 14
   Domain Cod…¹ Domain Area …² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year
   <chr>        <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl>
 1 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 2 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1961  1961
 3 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1961  1961
 4 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1962  1962
 5 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1962  1962
 6 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1962  1962
 7 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1963  1963
 8 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1963  1963
 9 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1963  1963
10 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1964  1964
# … with 36,439 more rows, 4 more variables: Unit <chr>, Value <dbl>,
#   Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
#   ¹​`Domain Code`, ²​`Area Code`, ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`
summary(cattle_faostatcsv)
 Domain Code           Domain            Area Code          Area          
 Length:36449       Length:36449       Min.   :   1.0   Length:36449      
 Class :character   Class :character   1st Qu.:  69.0   Class :character  
 Mode  :character   Mode  :character   Median : 141.0   Mode  :character  
                                       Mean   : 775.2                     
                                       3rd Qu.: 215.0                     
                                       Max.   :5504.0                     
                                                                          
  Element Code    Element            Item Code       Item          
 Min.   :5318   Length:36449       Min.   :882   Length:36449      
 1st Qu.:5318   Class :character   1st Qu.:882   Class :character  
 Median :5420   Mode  :character   Median :882   Mode  :character  
 Mean   :5416                      Mean   :882                     
 3rd Qu.:5510                      3rd Qu.:882                     
 Max.   :5510                      Max.   :882                     
                                                                   
   Year Code         Year          Unit               Value          
 Min.   :1961   Min.   :1961   Length:36449       Min.   :        7  
 1st Qu.:1976   1st Qu.:1976   Class :character   1st Qu.:     7849  
 Median :1991   Median :1991   Mode  :character   Median :    43266  
 Mean   :1990   Mean   :1990                      Mean   :  4410235  
 3rd Qu.:2005   3rd Qu.:2005                      3rd Qu.:   700000  
 Max.   :2018   Max.   :2018                      Max.   :683217055  
                                                  NA's   :74         
     Flag           Flag Description  
 Length:36449       Length:36449      
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
                                      

I have chosen country faostat and cattle faostat data set. After reviewing the data set it explains that country faostat dataset is codebook for country. This dataset has all the information country level. After joining this dataset with cattle faostat dataset we can perform data analysis based on different groups.

In cattle dataset there is a lot of information about agriculture across the world. As you can see from the summary in element code there is information about milk, yield and production. Also, there is a common column in both the datasets country code and area code. We will be using this to join the data.

Tidy Data (as needed)

cattle_new <- rename (cattle_faostatcsv, "Country Code"= "Area Code" )

cattle_new
# A tibble: 36,449 × 14
   Domain Cod…¹ Domain Count…² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year
   <chr>        <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl>
 1 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 2 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1961  1961
 3 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1961  1961
 4 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1962  1962
 5 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1962  1962
 6 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1962  1962
 7 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1963  1963
 8 QL           Lives…       2 Afgh…    5420 Yield       882 Milk…    1963  1963
 9 QL           Lives…       2 Afgh…    5510 Produc…     882 Milk…    1963  1963
10 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1964  1964
# … with 36,439 more rows, 4 more variables: Unit <chr>, Value <dbl>,
#   Flag <chr>, `Flag Description` <chr>, and abbreviated variable names
#   ¹​`Domain Code`, ²​`Country Code`, ³​`Element Code`, ⁴​`Item Code`,
#   ⁵​`Year Code`

Join Data

As the data is ready to join, we will use left_join to join the dataset using country code.

final_data <- left_join(cattle_new, country_faostatcsv, by = "Country Code" )
final_data
# A tibble: 257,061 × 20
   Domain Cod…¹ Domain Count…² Area  Eleme…³ Element Item …⁴ Item  Year …⁵  Year
   <chr>        <chr>    <dbl> <chr>   <dbl> <chr>     <dbl> <chr>   <dbl> <dbl>
 1 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 2 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 3 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 4 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 5 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 6 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 7 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 8 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
 9 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
10 QL           Lives…       2 Afgh…    5318 Milk A…     882 Milk…    1961  1961
# … with 257,051 more rows, 10 more variables: Unit <chr>, Value <dbl>,
#   Flag <chr>, `Flag Description` <chr>, `Country Group Code` <dbl>,
#   `Country Group` <chr>, Country <chr>, `M49 Code` <chr>, `ISO2 Code` <chr>,
#   `ISO3 Code` <chr>, and abbreviated variable names ¹​`Domain Code`,
#   ²​`Country Code`, ³​`Element Code`, ⁴​`Item Code`, ⁵​`Year Code`
final_data_new <- rename (final_data, "Country_Group"= "Country Group" )
final_data_new3 <- final_data_new%>%
  filter(Year >= 1998) %>%
  filter(`Country_Group`=="Central Asia") %>%
  filter(`Unit` == "tonnes") 
final_data_new3 %>%
  ggplot(aes(x = Area, y = Value/100000, fill = Area)) +
  geom_bar(stat = "identity") + 
  labs(x = "Countries", y = "Different Agricultural Aspects", title = "Agricultural information across the world",
  fill = "Element")

ggplot(final_data_new3) +
  geom_bar(aes(x = Area, fill= Element )) + 
  labs(x = "Countries", y = "Different Agricultural Aspects", title = "Agricultural information across the world",
  fill = "Element")