Understanding the FAOSTAT Country Codes

challenge_2
Sue-Ellen Duffy
FAOSTAT
Author

Sue-Ellen Duffy

Published

February 24, 2023

Code
library(tidyverse)
library(summarytools)
Error in library(summarytools): there is no package called 'summarytools'
Code
library(readxl)
knitr::opts_chunk$set(echo = TRUE)

FAOSTAT data

This data set is simply the Country Profiles for the Food and Agriculture Organization Corporate Statistical Database (FAOSTAT).

In this data set, the code columns all identify the same countries; they differ in which organization defines the code. The United Nations Terminology Database, the Statistics Division of the United Nations Secretariat, and the International Organization for Standardization each code the same countries differently, so this table helps us work out which country, region, or group of countries is being referred to in other FAO data sets.

Code
# Read FAOSTAT_country_groups.csv into `data`
data <- read_csv("_data/FAOSTAT_country_groups.csv")
Rows: 1943 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Country Group, Country, M49 Code, ISO2 Code, ISO3 Code
dbl (2): Country Group Code, Country Code

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
data
# A tibble: 1,943 × 7
   `Country Group Code` `Country Group` `Country Code` Country        `M49 Code`
                  <dbl> <chr>                    <dbl> <chr>          <chr>     
 1                 5100 Africa                       4 Algeria        012       
 2                 5100 Africa                       7 Angola         024       
 3                 5100 Africa                      53 Benin          204       
 4                 5100 Africa                      20 Botswana       072       
 5                 5100 Africa                     233 Burkina Faso   854       
 6                 5100 Africa                      29 Burundi        108       
 7                 5100 Africa                      35 Cabo Verde     132       
 8                 5100 Africa                      32 Cameroon       120       
 9                 5100 Africa                      37 Central Afric… 140       
10                 5100 Africa                      39 Chad           148       
# ℹ 1,933 more rows
# ℹ 2 more variables: `ISO2 Code` <chr>, `ISO3 Code` <chr>

Summarize the data

Looking through this quick summary, we see that there are 277 countries in this data, that the ISO2 Code column appears to be missing data, and that the M49 Code column is stored as character even though its values look numeric. There are other details here, but two tables are of particular interest: Country Group and Country.

ISO2 Code only appears to be “missing data”: the value NA is the ISO alpha-2 code for Namibia, and read_csv() treats the string NA as a missing value by default.

The M49 Code column is stored as character, which preserves leading zeros such as the 012 for Algeria.
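The Namibia issue can be reproduced and avoided on a tiny inline example. This is a minimal sketch, not the post's original code: the two-row CSV text and the names `csv_text`, `default_read`, and `strict_read` are made up for illustration, and it assumes readr 2.0 or later (for literal data passed through `I()`).

```r
library(readr)

# Tiny inline stand-in for the real file (illustrative rows only)
csv_text <- I(paste(
  "Country Group,Country,M49 Code,ISO2 Code",
  "Africa,Namibia,516,NA",
  "Africa,Algeria,012,DZ",
  sep = "\n"
))

# Default parsing: the string "NA" is read as a missing value
default_read <- read_csv(csv_text, show_col_types = FALSE)

# Treat only empty cells as missing and keep every column as text
strict_read <- read_csv(
  csv_text,
  na = "",
  col_types = cols(.default = col_character())
)

is.na(default_read$`ISO2 Code`[1])  # Namibia's code was turned into a missing value
strict_read$`ISO2 Code`[1]          # Namibia's code survives as a string
strict_read$`M49 Code`[2]           # Algeria's leading zero is preserved
```

Passing `na = ""` to the real `read_csv()` call should have the same effect, provided the file uses empty cells rather than the text NA to mark genuinely missing values.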

The Country Group tibble shows us the 10 Country Groups containing the most Countries.

The Country tibble shows us the 10 Countries that were categorized into different Country Groups the most.
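The two rankings described above can also be computed directly with dplyr, without summarytools. The sketch below runs on a small hand-made table (`groups` is a hypothetical miniature, not the real 1,943-row file), so the counts it produces are illustrative only.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of the data; the real file has 1,943 rows
groups <- tribble(
  ~`Country Group`, ~Country,
  "Africa", "Algeria",
  "Africa", "Angola",
  "World",  "Algeria",
  "World",  "Angola",
  "World",  "Albania",
  "Europe", "Albania"
)

# Country Groups ranked by how many countries they contain
group_sizes <- groups %>%
  count(`Country Group`, sort = TRUE, name = "n_countries")

# Countries ranked by how many Country Groups they belong to
country_memberships <- groups %>%
  count(Country, sort = TRUE, name = "n_groups")

# Number of distinct countries across all groups
n_countries <- n_distinct(groups$Country)

head(group_sizes, 10)
head(country_memberships, 10)
```

Run against the full `data` tibble, the same `count()` calls should give the top 10 of each ranking.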

Code
dfSummary(data)
Error in dfSummary(data): could not find function "dfSummary"

How are the codes different?

Take the United States of America, for example. Filtering for “United States of America” returns 8 rows. The Country Code, M49 Code, ISO2 Code, and ISO3 Code are identical across all 8 rows. What differs, and the reason there are 8 rows, is the Country Group and its Country Group Code, which is simply the number associated with the Country Group. Country Group is a categorical label: it lists the USA as part of the Americas, High-income economies, North and Central America, Annex I countries, and so on.

The Country Group allows quick categorical analysis, such as comparing countries by income level (high-income economies and low-income economies) or by region (from North and Central America up to the Americas).
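As a rough sketch of that kind of analysis, the group memberships can be joined onto another table by Country Code and then summarized per group. The `memberships` rows are copied from outputs in this post, but the `measurements` table and its `value` column are entirely hypothetical placeholders, not FAO data.

```r
library(dplyr)
library(tibble)

# Group memberships copied from the outputs in this post
memberships <- tribble(
  ~`Country Group`,        ~`Country Code`, ~Country,
  "Africa",                4,               "Algeria",
  "Africa",                7,               "Angola",
  "Americas",              231,             "United States of America",
  "High-income economies", 231,             "United States of America"
)

# HYPOTHETICAL measurements: placeholder numbers, not real FAO values
measurements <- tribble(
  ~`Country Code`, ~value,
  4,               10,
  7,               20,
  231,             30
)

# Keep one family of groups (regions), attach the values, total per region
regional_totals <- memberships %>%
  filter(`Country Group` %in% c("Africa", "Americas")) %>%
  inner_join(measurements, by = "Country Code") %>%
  group_by(`Country Group`) %>%
  summarise(total = sum(value), .groups = "drop")

regional_totals
```

Filtering to one family of groups before the join matters: a country belongs to several groups, so joining against every membership row would count its values more than once.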

Code
USA <- filter(data, `Country` == "United States of America")
USA
# A tibble: 8 × 7
  `Country Group Code` `Country Group`         `Country Code` Country `M49 Code`
                 <dbl> <chr>                            <dbl> <chr>   <chr>     
1                 5200 Americas                           231 United… 840       
2                 5848 Annex I countries                  231 United… 840       
3                 9010 High-income economies              231 United… 840       
4                  336 North and Central Amer…            231 United… 840       
5                 5203 Northern America                   231 United… 840       
6                 5208 Northern America and E…            231 United… 840       
7                 5873 OECD                               231 United… 840       
8                 5000 World                              231 United… 840       
# ℹ 2 more variables: `ISO2 Code` <chr>, `ISO3 Code` <chr>
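One way to confirm which columns actually change across those rows is to count the distinct values in each column. The sketch below retypes five of the eight rows from the output above into a small table (`usa` is an illustrative name), so it runs without the CSV file.

```r
library(dplyr)
library(tibble)

# Five of the USA rows from the output above, retyped by hand
usa <- tribble(
  ~`Country Group Code`, ~`Country Group`,        ~`Country Code`, ~Country,                   ~`M49 Code`,
  5200,                  "Americas",              231,             "United States of America", "840",
  5848,                  "Annex I countries",     231,             "United States of America", "840",
  9010,                  "High-income economies", 231,             "United States of America", "840",
  5873,                  "OECD",                  231,             "United States of America", "840",
  5000,                  "World",                 231,             "United States of America", "840"
)

# Distinct values per column: 1 means the column is constant across the rows
distinct_per_column <- usa %>%
  summarise(across(everything(), n_distinct))

distinct_per_column
```

Only the two Country Group columns should report more than one distinct value; the same `summarise()` call can be applied to the full `USA` tibble.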

The “World”

The one Country Group that contains all of the countries is “World”, which consists of 277 countries. If the goal is to analyze all the countries at once, filter for the “World” Country Group.

Code
world <- filter(data, `Country Group` == "World")
world
# A tibble: 277 × 7
   `Country Group Code` `Country Group` `Country Code` Country        `M49 Code`
                  <dbl> <chr>                    <dbl> <chr>          <chr>     
 1                 5000 World                        2 Afghanistan    004       
 2                 5000 World                      284 Åland Islands  248       
 3                 5000 World                        3 Albania        008       
 4                 5000 World                        4 Algeria        012       
 5                 5000 World                        5 American Samoa 016       
 6                 5000 World                        6 Andorra        020       
 7                 5000 World                        7 Angola         024       
 8                 5000 World                      258 Anguilla       660       
 9                 5000 World                       30 Antarctica     010       
10                 5000 World                        8 Antigua and B… 028       
# ℹ 267 more rows
# ℹ 2 more variables: `ISO2 Code` <chr>, `ISO3 Code` <chr>
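The claim that “World” covers every country can be checked rather than assumed. Here is a minimal sketch on a hypothetical miniature table (`groups` and its rows are illustrative); the same steps should work on the full `data` tibble.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of the data
groups <- tribble(
  ~`Country Group`, ~Country,
  "Africa", "Algeria",
  "Africa", "Angola",
  "World",  "Algeria",
  "World",  "Angola",
  "World",  "Albania"
)

# Countries listed under the "World" group
world_countries <- groups %>%
  filter(`Country Group` == "World") %>%
  pull(Country)

# Countries that appear somewhere in the data but not under "World"
missing_from_world <- setdiff(unique(groups$Country), world_countries)

# TRUE if any country is listed more than once under "World"
has_duplicates <- anyDuplicated(world_countries) > 0

missing_from_world
has_duplicates
```

An empty `missing_from_world` together with `has_duplicates` being `FALSE` would mean the “World” rows list each country exactly once.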

Anything Else Interesting?

The M49 Codes with the suffix “.01” are labeled “unspecified (population)”. I’m not entirely sure what that means, so it would be interesting to look into this further. Here is one example:

Code
data %>%
  filter(`M49 Code` == "155.01") %>%
  select("Country Group", "Country", "M49 Code")
# A tibble: 3 × 3
  `Country Group` Country                                  `M49 Code`
  <chr>           <chr>                                    <chr>     
1 Europe          Western Europe, unspecified (population) 155.01    
2 Western Europe  Western Europe, unspecified (population) 155.01    
3 World           Western Europe, unspecified (population) 155.01
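To look beyond this single example, every entry whose M49 Code carries the “.01” suffix could be listed at once. The sketch below uses the three rows shown above plus one ordinary row for contrast; `codes` and `unspecified` are illustrative names, and it relies on M49 Code being stored as text.

```r
library(dplyr)
library(tibble)

# The rows shown above, plus one ordinary code for contrast
codes <- tribble(
  ~`Country Group`, ~Country,                                   ~`M49 Code`,
  "Europe",         "Western Europe, unspecified (population)", "155.01",
  "Western Europe", "Western Europe, unspecified (population)", "155.01",
  "World",          "Western Europe, unspecified (population)", "155.01",
  "World",          "Algeria",                                  "012"
)

# Keep codes ending in ".01", one row per distinct entry
unspecified <- codes %>%
  filter(endsWith(`M49 Code`, ".01")) %>%
  distinct(Country, `M49 Code`)

unspecified
```

Because the comparison is done on strings, this is another reason to leave M49 Code as character rather than converting it to a number.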