Challenge 2 - FAOStat Cattle & Dairy (updated)

challenge_2
faostat
Justine Shakespeare
Author

Justine Shakespeare

Published

March 7, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Loading and cleaning the data

First we’ll load the tidyverse package and read in the data. We’ll focus on the FAOStat data on cattle and dairy.

Code
FAOCD <- read_csv("_data/FAOSTAT_cattle_dairy.csv")

This data includes information about quantities related to cattle and dairy products organized by geographical region (including countries, regions, and continents) and year. A more detailed description of the variables in this dataset is provided below.

Let’s take a look at the dataset with the head() and dim() functions.

Code
dim(FAOCD)
[1] 36449    14
Code
head(FAOCD)
# A tibble: 6 × 14
  `Domain Code` Domain      `Area Code` Area  `Element Code` Element `Item Code`
  <chr>         <chr>             <dbl> <chr>          <dbl> <chr>         <dbl>
1 QL            Livestock …           2 Afgh…           5318 Milk A…         882
2 QL            Livestock …           2 Afgh…           5420 Yield           882
3 QL            Livestock …           2 Afgh…           5510 Produc…         882
4 QL            Livestock …           2 Afgh…           5318 Milk A…         882
5 QL            Livestock …           2 Afgh…           5420 Yield           882
6 QL            Livestock …           2 Afgh…           5510 Produc…         882
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
#   Value <dbl>, Flag <chr>, `Flag Description` <chr>

We have a dataset with over 36,000 observations (rows) and 14 variables (columns). It looks like some of these columns contain redundant information (such as many of the columns with “Code” in the name) and/or they have the same value for all observations (indicating that they were likely taken from a larger dataset). Let’s use the unique() function to confirm that some of these variables hold just one value, in which case we can remove them for our analysis and keep this information as metadata for this dataset.

Code
unique(FAOCD$Domain)
[1] "Livestock Primary"
Code
unique(FAOCD$Item)
[1] "Milk, whole fresh cow"

Both of these columns contain just one value for all observations, so we can remove them.

Let’s use the select() and contain() functions to create a new dataframe that removes all of the columns we have decided we don’t need: the columns with “Code” in the name and the columns with only one value throughout the dataset.

Code
FAOCD_subset <- FAOCD %>% 
  select(-contains("Code")) %>% 
  select(-contains("Domain")) %>% 
  select(-contains("Item"))
  
FAOCD_subset
# A tibble: 36,449 × 7
   Area        Element       Year Unit    Value Flag  `Flag Description`
   <chr>       <chr>        <dbl> <chr>   <dbl> <chr> <chr>             
 1 Afghanistan Milk Animals  1961 Head   700000 F     FAO estimate      
 2 Afghanistan Yield         1961 hg/An    5000 Fc    Calculated data   
 3 Afghanistan Production    1961 tonnes 350000 F     FAO estimate      
 4 Afghanistan Milk Animals  1962 Head   700000 F     FAO estimate      
 5 Afghanistan Yield         1962 hg/An    5000 Fc    Calculated data   
 6 Afghanistan Production    1962 tonnes 350000 F     FAO estimate      
 7 Afghanistan Milk Animals  1963 Head   780000 F     FAO estimate      
 8 Afghanistan Yield         1963 hg/An    5128 Fc    Calculated data   
 9 Afghanistan Production    1963 tonnes 400000 F     FAO estimate      
10 Afghanistan Milk Animals  1964 Head   780000 F     FAO estimate      
# ℹ 36,439 more rows

Data exploration and analysis

Let’s take a closer look at this data now that we have updated dataframe with the glimpse() function.

Code
glimpse(FAOCD_subset)
Rows: 36,449
Columns: 7
$ Area               <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afgha…
$ Element            <chr> "Milk Animals", "Yield", "Production", "Milk Animal…
$ Year               <dbl> 1961, 1961, 1961, 1962, 1962, 1962, 1963, 1963, 196…
$ Unit               <chr> "Head", "hg/An", "tonnes", "Head", "hg/An", "tonnes…
$ Value              <dbl> 700000, 5000, 350000, 700000, 5000, 350000, 780000,…
$ Flag               <chr> "F", "Fc", "F", "F", "Fc", "F", "F", "Fc", "F", "F"…
$ `Flag Description` <chr> "FAO estimate", "Calculated data", "FAO estimate", …

It looks as though this data includes information about the Value of a variety of cattle and dairy related products (in the column Element) organized by geographical region (in the column Area) and Year.

We have two double (numeric) variables, including Value and Year. The former variable looks to be related to the variable Unit, which is a character variable. Let’s use the range() function to take a closer look at the Year variable and see what time period this data covers.

Code
range(FAOCD_subset$Year)
[1] 1961 2018

This data dates back to 1961 and goes up to 2018.

This dataframe also includes five character variables: The Area column includes primarily countries and some larger regions and groupings of countries. The Element variable includes three quantities relates to cattle and dairy products: Milk Animals, Yield, and Production. The Unit column, as mentioned above, looks to describe the unit of the values in the Value column. Finally, Flag and Flag Description seem to include information about the data source for each observation in this dataframe. Since we’re interested in comparing across countries and regions, let’s use the unique() function to see what is in the Area column.

Code
unique(FAOCD_subset$Area)
  [1] "Afghanistan"                                         
  [2] "Albania"                                             
  [3] "Algeria"                                             
  [4] "American Samoa"                                      
  [5] "Angola"                                              
  [6] "Antigua and Barbuda"                                 
  [7] "Argentina"                                           
  [8] "Armenia"                                             
  [9] "Australia"                                           
 [10] "Austria"                                             
 [11] "Azerbaijan"                                          
 [12] "Bahamas"                                             
 [13] "Bahrain"                                             
 [14] "Bangladesh"                                          
 [15] "Barbados"                                            
 [16] "Belarus"                                             
 [17] "Belgium"                                             
 [18] "Belgium-Luxembourg"                                  
 [19] "Belize"                                              
 [20] "Benin"                                               
 [21] "Bermuda"                                             
 [22] "Bhutan"                                              
 [23] "Bolivia (Plurinational State of)"                    
 [24] "Bosnia and Herzegovina"                              
 [25] "Botswana"                                            
 [26] "Brazil"                                              
 [27] "British Virgin Islands"                              
 [28] "Brunei Darussalam"                                   
 [29] "Bulgaria"                                            
 [30] "Burkina Faso"                                        
 [31] "Burundi"                                             
 [32] "Cabo Verde"                                          
 [33] "Cambodia"                                            
 [34] "Cameroon"                                            
 [35] "Canada"                                              
 [36] "Central African Republic"                            
 [37] "Chad"                                                
 [38] "Chile"                                               
 [39] "China, Hong Kong SAR"                                
 [40] "China, mainland"                                     
 [41] "China, Taiwan Province of"                           
 [42] "Colombia"                                            
 [43] "Comoros"                                             
 [44] "Congo"                                               
 [45] "Costa Rica"                                          
 [46] "Côte d'Ivoire"                                       
 [47] "Croatia"                                             
 [48] "Cuba"                                                
 [49] "Cyprus"                                              
 [50] "Czechia"                                             
 [51] "Czechoslovakia"                                      
 [52] "Democratic People's Republic of Korea"               
 [53] "Democratic Republic of the Congo"                    
 [54] "Denmark"                                             
 [55] "Djibouti"                                            
 [56] "Dominica"                                            
 [57] "Dominican Republic"                                  
 [58] "Ecuador"                                             
 [59] "Egypt"                                               
 [60] "El Salvador"                                         
 [61] "Eritrea"                                             
 [62] "Estonia"                                             
 [63] "Eswatini"                                            
 [64] "Ethiopia"                                            
 [65] "Ethiopia PDR"                                        
 [66] "Falkland Islands (Malvinas)"                         
 [67] "Fiji"                                                
 [68] "Finland"                                             
 [69] "France"                                              
 [70] "French Guyana"                                       
 [71] "French Polynesia"                                    
 [72] "Gabon"                                               
 [73] "Gambia"                                              
 [74] "Georgia"                                             
 [75] "Germany"                                             
 [76] "Ghana"                                               
 [77] "Greece"                                              
 [78] "Grenada"                                             
 [79] "Guadeloupe"                                          
 [80] "Guatemala"                                           
 [81] "Guinea"                                              
 [82] "Guinea-Bissau"                                       
 [83] "Guyana"                                              
 [84] "Haiti"                                               
 [85] "Honduras"                                            
 [86] "Hungary"                                             
 [87] "Iceland"                                             
 [88] "India"                                               
 [89] "Indonesia"                                           
 [90] "Iran (Islamic Republic of)"                          
 [91] "Iraq"                                                
 [92] "Ireland"                                             
 [93] "Israel"                                              
 [94] "Italy"                                               
 [95] "Jamaica"                                             
 [96] "Japan"                                               
 [97] "Jordan"                                              
 [98] "Kazakhstan"                                          
 [99] "Kenya"                                               
[100] "Kuwait"                                              
[101] "Kyrgyzstan"                                          
[102] "Lao People's Democratic Republic"                    
[103] "Latvia"                                              
[104] "Lebanon"                                             
[105] "Lesotho"                                             
[106] "Liberia"                                             
[107] "Libya"                                               
[108] "Liechtenstein"                                       
[109] "Lithuania"                                           
[110] "Luxembourg"                                          
[111] "Madagascar"                                          
[112] "Malawi"                                              
[113] "Malaysia"                                            
[114] "Mali"                                                
[115] "Malta"                                               
[116] "Martinique"                                          
[117] "Mauritania"                                          
[118] "Mauritius"                                           
[119] "Mexico"                                              
[120] "Mongolia"                                            
[121] "Montenegro"                                          
[122] "Montserrat"                                          
[123] "Morocco"                                             
[124] "Mozambique"                                          
[125] "Myanmar"                                             
[126] "Namibia"                                             
[127] "Nepal"                                               
[128] "Netherlands"                                         
[129] "Netherlands Antilles (former)"                       
[130] "New Caledonia"                                       
[131] "New Zealand"                                         
[132] "Nicaragua"                                           
[133] "Niger"                                               
[134] "Nigeria"                                             
[135] "Niue"                                                
[136] "North Macedonia"                                     
[137] "Norway"                                              
[138] "Oman"                                                
[139] "Pakistan"                                            
[140] "Palestine"                                           
[141] "Panama"                                              
[142] "Papua New Guinea"                                    
[143] "Paraguay"                                            
[144] "Peru"                                                
[145] "Philippines"                                         
[146] "Poland"                                              
[147] "Portugal"                                            
[148] "Puerto Rico"                                         
[149] "Qatar"                                               
[150] "Republic of Korea"                                   
[151] "Republic of Moldova"                                 
[152] "Réunion"                                             
[153] "Romania"                                             
[154] "Russian Federation"                                  
[155] "Rwanda"                                              
[156] "Saint Lucia"                                         
[157] "Saint Vincent and the Grenadines"                    
[158] "Samoa"                                               
[159] "Sao Tome and Principe"                               
[160] "Saudi Arabia"                                        
[161] "Senegal"                                             
[162] "Serbia"                                              
[163] "Serbia and Montenegro"                               
[164] "Seychelles"                                          
[165] "Sierra Leone"                                        
[166] "Slovakia"                                            
[167] "Slovenia"                                            
[168] "Solomon Islands"                                     
[169] "Somalia"                                             
[170] "South Africa"                                        
[171] "South Sudan"                                         
[172] "Spain"                                               
[173] "Sri Lanka"                                           
[174] "Sudan"                                               
[175] "Sudan (former)"                                      
[176] "Suriname"                                            
[177] "Sweden"                                              
[178] "Switzerland"                                         
[179] "Syrian Arab Republic"                                
[180] "Tajikistan"                                          
[181] "Thailand"                                            
[182] "Togo"                                                
[183] "Tonga"                                               
[184] "Trinidad and Tobago"                                 
[185] "Tunisia"                                             
[186] "Turkey"                                              
[187] "Turkmenistan"                                        
[188] "Uganda"                                              
[189] "Ukraine"                                             
[190] "United Arab Emirates"                                
[191] "United Kingdom of Great Britain and Northern Ireland"
[192] "United Republic of Tanzania"                         
[193] "United States of America"                            
[194] "United States Virgin Islands"                        
[195] "Uruguay"                                             
[196] "USSR"                                                
[197] "Uzbekistan"                                          
[198] "Vanuatu"                                             
[199] "Venezuela (Bolivarian Republic of)"                  
[200] "Viet Nam"                                            
[201] "Wallis and Futuna Islands"                           
[202] "Yemen"                                               
[203] "Yugoslav SFR"                                        
[204] "Zambia"                                              
[205] "Zimbabwe"                                            
[206] "World"                                               
[207] "Africa"                                              
[208] "Eastern Africa"                                      
[209] "Middle Africa"                                       
[210] "Northern Africa"                                     
[211] "Southern Africa"                                     
[212] "Western Africa"                                      
[213] "Americas"                                            
[214] "Northern America"                                    
[215] "Central America"                                     
[216] "Caribbean"                                           
[217] "South America"                                       
[218] "Asia"                                                
[219] "Central Asia"                                        
[220] "Eastern Asia"                                        
[221] "Southern Asia"                                       
[222] "South-eastern Asia"                                  
[223] "Western Asia"                                        
[224] "Europe"                                              
[225] "Eastern Europe"                                      
[226] "Northern Europe"                                     
[227] "Southern Europe"                                     
[228] "Western Europe"                                      
[229] "Oceania"                                             
[230] "Australia and New Zealand"                           
[231] "Melanesia"                                           
[232] "Polynesia"                                           

Without going through this list exhaustively, it looks as if this variable contains all countries in the world, some historic geopolitical areas (such as “Sudan (former)” or “USSR”) and larger regions of the world (such as “World”, “Africa”, and subsets of continents, such as “Caribbean”, “Central Asia”, etc.). If this were a real analysis I would spend the time removing values that were not countries and ensuring that this list was a complete and current list of countries.

Let’s use the group_by() and summarize() functions to find the mean, median, and standard deviation of each Element in this dataset.

Code
FAOCD_grouped <- FAOCD_subset %>% 
  group_by(Element) %>% 
  summarize("mean" = mean(Value, na.rm = TRUE), 
            "median" = median(Value, na.rm = TRUE),
            "standard_deviation" = sd(Value, na.rm = TRUE))

FAOCD_grouped
# A tibble: 3 × 4
  Element          mean median standard_deviation
  <chr>           <dbl>  <dbl>              <dbl>
1 Milk Animals 4205410. 295000          18041595.
2 Production   9001419. 295500          40268994.
3 Yield          19329.  13218             19361.

This gives us a sense of the average quantity of these three elements, but it covers all years in the dataset. To get a sense of how these quantities have changed over time, let’s get averages for a few specific years:

Code
FAOCD_Comparison <- FAOCD_subset %>% 
  filter(Year %in% c(1980, 1990, 2000, 2010)) %>% 
  group_by(Year, Element) %>% 
  summarize("mean" = mean(Value, na.rm = TRUE), 
            "median" = median(Value, na.rm = TRUE),
            "standard_deviation" = sd(Value, na.rm = TRUE)) %>% 
  arrange(Element, desc(Year))


FAOCD_Comparison
# A tibble: 12 × 5
# Groups:   Year [4]
    Year Element           mean  median standard_deviation
   <dbl> <chr>            <dbl>   <dbl>              <dbl>
 1  2010 Milk Animals  4709434. 344464.          20534043.
 2  2000 Milk Animals  3954344. 330000           16993966.
 3  1990 Milk Animals  4506236. 292740.          18693594.
 4  1980 Milk Animals  4247715. 257500           17844403.
 5  2010 Production   10795844. 529600           46966923.
 6  2000 Production    8826568. 457892.          38833266.
 7  1990 Production    9667473. 238522.          42735363.
 8  1980 Production    8532347. 163986.          38216228.
 9  2010 Yield           25900.  14582.             24618.
10  2000 Yield           22370.  13875              20921.
11  1990 Yield           17708.  10166.             17587.
12  1980 Yield           15337.  10000              14176.

This data looks at the average and median quantities (along with the standard deviation) for each element in 4 different years, each a decade apart, from the dataset. Since we have it arranged by Element and Year (in descending order) we can see for each Element how the average quantity has changed in the past few decades.

Let’s take a closer look at a particular element here. I’d like to see where the highest average Production is happening. We’ll use the filter(), group_by(), summarize() and arrange() functions.

Code
FAOCD_Production <- FAOCD_subset %>% 
  filter(Element == "Production") %>% 
  group_by(Area) %>% 
  summarize("mean" = mean(Value, na.rm = TRUE), 
            "median" = median(Value, na.rm = TRUE),
            "standard_deviation" = sd(Value, na.rm = TRUE)) %>% 
  arrange(desc(`mean`))

FAOCD_Production
# A tibble: 232 × 4
   Area                           mean     median standard_deviation
   <chr>                         <dbl>      <dbl>              <dbl>
 1 World                    470750957. 464550446.         102711714.
 2 Europe                   228904979. 220194403           26887876.
 3 Americas                 123817831. 115606130.          33846762.
 4 Eastern Europe            97495137.  92226154           23467137.
 5 USSR                      87525893.  90300000           13611099.
 6 Asia                      80628670.  55479513           59049221.
 7 Northern America          77052008.  74200213           14132238.
 8 Western Europe            75105368.  73757169            6644426.
 9 United States of America  69087431.  66390126.          14054757.
10 Southern Asia             39995745.  29330082.          29257149.
# ℹ 222 more rows

Unsurprisingly, the value that tops this list is “World” and larger geographical regions. Since this is an aggregate of all of the countries specified here, it makes sense that this would have the highest average production. This data shows that Europe has the highest average production, followed by the Americas and Eastern Europe. The first country that shows up on this list is the USSR, which is no longer a country. But given this, it makes sense that the next country that shows up on the list is Russia.

Let’s take a look at the same information but filtered to look at just data from recent years (since 2010).

Code
FAOCD_Since2010 <- FAOCD_subset %>% 
  filter(Year > 2010) %>% 
  filter(Element == "Production") %>% 
  group_by(Area) %>% 
  summarize("mean" = mean(Value, na.rm = TRUE), 
            "median" = median(Value, na.rm = TRUE),
            "standard_deviation" = sd(Value, na.rm = TRUE)) %>% 
  arrange(desc(`mean`))

FAOCD_Since2010
# A tibble: 226 × 4
   Area                           mean     median standard_deviation
   <chr>                         <dbl>      <dbl>              <dbl>
 1 World                    653120252. 658338067           23726344.
 2 Europe                   215588432. 217574232.           5307525.
 3 Asia                     189924225. 189471380           14421624.
 4 Americas                 181531377  182933188.           3334320.
 5 Northern America         102217738. 102313027            3168627.
 6 Southern Asia             96623566.  95129610           13094625.
 7 United States of America  94026020.  94041764.           3455285.
 8 Western Europe            80955426.  81439856            3030278.
 9 Eastern Europe            72396074   72426114.            545233.
10 India                     71425625   70034420           11766081.
# ℹ 216 more rows

We can see here that since 2010 Europe is still the continent with the highest average production value in recent years, but Asia has moved up the list to second. India is the first country to show up on the list and in fact has a higher production average than some larger regions, including South America and Eastern Asia.

Now let’s look at this data from the first 20 years or so of the dataset (before 1980) to get a sense of how things have changed.

Code
FAOCD_before1980 <- FAOCD_subset %>% 
  filter(Year < 1980) %>% 
  filter(Element == "Production") %>% 
  group_by(Area) %>% 
  summarize("mean" = mean(Value, na.rm = TRUE), 
            "median" = median(Value, na.rm = TRUE),
            "standard_deviation" = sd(Value, na.rm = TRUE)) %>% 
  arrange(desc(`mean`))

FAOCD_before1980
# A tibble: 198 × 4
   Area                           mean    median standard_deviation
   <chr>                         <dbl>     <dbl>              <dbl>
 1 World                    361627060. 359280844          33639057.
 2 Europe                   224168181. 223718733          24023230.
 3 Eastern Europe           106068121. 107828820          14815809.
 4 Americas                  88434904.  86307319           4274964.
 5 USSR                      80172213.  82400000          11510128.
 6 Western Europe            69489343.  69892996           5758308.
 7 Northern America          62709675.  62469883           1896295.
 8 United States of America  54669493.  54442144           1777245.
 9 Northern Europe           30329690.  29435040           1860393.
10 Germany                   27842541.  28060608           1718558.
# ℹ 188 more rows

This table shows data from 1961 to 1979, more than 30 years earlier than the data from the previous table we looked at (data since 2010). It is interesting to note that Europe still tops the list, but the first countries that show up on this list are the USSR (which no longer exists) and Germany.

Conclusion and further research

As noted earlier in this post, with more time I would more thoroughly clean the Area variable, so that we could more easily create tables focused on just countries, geographical regions, or continents.

There are so many different ways you could explore this data! Another area that would be worth exploring is the change over time of production and how this varies across countries and regions. Visualization could be used to better illustrate any patterns and trends discovered in the data.