DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Final Project

  • Final materials
    • Fall 2022 posts
    • final Posts

On this page

  • Introduction
  • Import and transform into tidy data
    • Read the primary dataset
      • The variables in the primary dataset
    • Initial analysis
    • Merging Commodity Import & Export Price Index obtained by web scraping with the main dataset
    • Creating new variables
  • Exploratory Data Analysis
    • Summary statistics of the columns
    • Univariate analysis
      • Year
      • Country
      • Exchange Rate
      • Currency Crises
      • GDP weighted default
      • Independence
      • Inflation Rate
    • Bivariate and multivarite analysis
      • Economic crises
      • Domestic & Sovereign external debt in defaults
      • Export & Import Price Index and Inflation Rate
      • Correlation Between Economic Indicators
  • Conclusion and Considerations for Further Analysis
    • References:

Final Project

  • Show All Code
  • Hide All Code

  • View Source
An Explanation to African Systemic Crises
Author

Said Arslan

Published

December 23, 2022

Code
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
Warning: package 'ggplot2' was built under R version 4.2.2
Warning: package 'tibble' was built under R version 4.2.2
Warning: package 'stringr' was built under R version 4.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(countrycode)
Warning: package 'countrycode' was built under R version 4.2.2
Code
library(summarytools)
Warning: package 'summarytools' was built under R version 4.2.2

Attaching package: 'summarytools'

The following object is masked from 'package:tibble':

    view
Code
library(htmlTable)
library(lessR)
Warning: package 'lessR' was built under R version 4.2.2

lessR 4.2.4                         feedback: gerbing@pdx.edu 
--------------------------------------------------------------
> d <- Read("")   Read text, Excel, SPSS, SAS, or R data file
  d is default data frame, data= in analysis routines optional

Learn about reading, writing, and manipulating data, graphics,
testing means and proportions, regression, factor analysis,
customization, and descriptive statistics from pivot tables.
  Enter:  browseVignettes("lessR")

View changes in this and recent versions of lessR.
  Enter: news(package="lessR")

**New Feature**: Interactive analysis of your data
  Enter: interact()


Attaching package: 'lessR'

The following object is masked from 'package:summarytools':

    label

The following objects are masked from 'package:dplyr':

    recode, rename
Code
library(RColorBrewer)
library(corrplot)
corrplot 0.92 loaded
Code
load("Final_project- Said Arslan.RData")

knitr::opts_chunk$set(echo = TRUE, warning=FALSE)

Introduction

The primary data I used for this project is extracted from Kaggle: https://www.kaggle.com/chirin/africa-economic-banking-and-systemic-crisis-data. The dataset is a derivative of Reinhart et al’s Global Crises dataset. It was downloaded from: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx Non-African countries were filtered out from the dataset. And, variables containing notes or sparse data were removed. Years with no data were also filtered out.

The dataset specifically focuses on the Banking, Debt, Financial, Inflation and Systemic Crises that occurred, from 1860 to 2014, in 13 African countries, including: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia and Zimbabwe. Each country-year observation (row) includes macroeconomic indicators of the country. I also web scraped additional import and export price index data for these countries, from Commodity of Terms of Trade Database of IMF: https://data.imf.org/?sk=2CDDCCB8-0B59-43E9-B6A0-59210D5605D2&sId=1434492942851, which I think will play essential role in my analysis.

The main purpose of the project is to understand what patterns and trends are associated with mainly systemic crises and other type of economic crises as well as how they differ by country.

Import and transform into tidy data

Read the primary dataset

Code
# ac <-
#    read_csv("C:/Users/saida/Documents/african_crises.csv",
#     col_types = cols(
#         case = col_integer(),
#         cc3 = col_factor(ordered = TRUE),
#         country = col_character(),
#         year = col_date(format = "%Y"),
#         exch_usd = col_double(),
#         gdp_weighted_default = col_double(),
#         inflation_annual_cpi = col_double(),
#         systemic_crisis = col_integer(),
#         domestic_debt_in_default = col_integer(),
#         sovereign_external_debt_default = col_integer(),
#         independence = col_integer(),
#         currency_crises = col_integer(),
#         inflation_crises = col_integer(),
#         banking_crisis = col_character()
#     )
#   )
# 
# sample_n(ac, 10) %>% htmlTable()

In the main dataset, each variable has its own column, each observation has its own row, and each value has its own cell.

The variables in the primary dataset

  1. case is a factor variable that identifies the index for each country from the original, global dataset.

  2. cc3 is a factor variable that identifies each country with a three letter code.

  3. country is a character variable that names each country.

  4. year is a date variable that identifies the year of the observation.

  5. systemic_crisis is a binary variable that identifies whether a systemic crisis is observed in that year or not. Systemic crisis means the nationwide banking crisis which its impact spreads to the other sectors.

  6. exch_usd is a numeric variable that gives exchange rate of USD to the country’s currency. For a country, the lower the exchange rate the more valuable its currency.

  7. domestic_debt_in_default is a binary variable that shows if the government failed in repaying its domestic debts or not.

  8. sovereign_external_debt_default is a binary variable that identifies if sovereign external debt default occurred or not. It is a failure of a government to honor some or all of its debt obligations to the external lenders.

  9. gdp_weighted_default is a numeric variable which gives the ratio of debt to gdp when debt default is observed.

  10. inflation_annual_cpi is numeric variable that shows inflation rate based on consumer price index.

  11. independence is a binary variable that indicates if the country is independent or not as of observation year.

  12. currency_crises is a binary variable that identifies currency crisis occured or not in the observation year.

  13. inflation_crises is a binary variable that identifies inflation crisis occured or not in the observation year.

  14. banking_crisis is a binary variable that identifies banking crisis occured or not in the observation year.

For consistency I will first change column names with “crises” to “crisis” as well as “inflation_annual_cpi” to “inflation_rate. Second, I will remove case number that is not needed in my analysis. Then, I will recode”banking_crisis” variable which is differently coded in the dataset than the other crisis indicator variables.

Code
cat(unique(ac$banking_crisis), sep = "\n")
1
0
Code
# ac <- ac %>% dplyr::rename("inflation_crisis" = "inflation_crises", 
#                            "currency_crisis" = "currency_crises",
#                            "inflation_rate" = "inflation_annual_cpi") %>% 
#   select(-case) %>% 
#   mutate(banking_crisis = case_when(
#          banking_crisis == "crisis" ~ 1L,
#          banking_crisis == "no_crisis" ~ 0L)
#   )

Initial analysis

I would like to supplement different economic indicators to the original dataset to better analyze the economic crises, however, there is no financial and economic data available for African countries that goes back to the 1800s or early 1900s.

That’s why, I wanted to first examine timeline of crisis in these African countries along with dates of their gaining independency.

Code
#I'll create a timeline that shows independence of each country and include point for years of crises in the plot

filt_crs <- ac %>% 
  filter(if_any(ends_with("crisis"), ~ .x == 1)) %>%
  pivot_longer(cols = ends_with("crisis"), 
               names_to = "type_of_crisis",
               values_to = "crisis_or_not") %>% 
  filter(crisis_or_not == 1)


ac %>% ggplot(aes(year, independence)) + 
  geom_line() +
  geom_area(fill = "#FFDB6D", alpha = 0.3) +
  geom_point(data = filt_crs, 
             mapping = aes(year, crisis_or_not, color=type_of_crisis)) +
  guides(color=guide_legend("Types of Crisis", override.aes = list(size = 3))) +
  scale_y_continuous(name = "Independence", breaks = c(0,1)) +
  scale_x_date(name = "Year", 
               date_breaks = "20 years", 
               date_labels = "%Y", 
               limits = c(as.Date("1860-01-01"), NA), 
               expand=c(0,0)) +
  facet_wrap(vars(country), ncol = 2) +
  theme_light()

The plots above reveal crucial situation for my further analysis. As we can see, most of the economic crisis in African countries actually took place after they gained independence and most of the countries got their independency around 1960s. Especially, all the systemic crises are after1960s. Therefore, I decided to focus on the data after 60s in analysis. Also, it appears that including additional economic indicators, which are usually start from 60s, to the main dataset would comply with this perspective

Merging Commodity Import & Export Price Index obtained by web scraping with the main dataset

In order to expand the features of the main dataset, I scrape additional data provided by the Commodity of Terms of Trade Database of International Monetary Fund (IMF). While the IMF provides many formats including excel and csv file formats but I instead wanted to invest time on scraping the rendered html using their online table viewer: https://data.imf.org/?sk=2CDDCCB8-0B59-43E9-B6A0-59210D5605D2&sId=1434492942851.

Using the viewer I selected the desired data and used inspect in Chrome to view the rendered html for the current page. Then copied the div representing the entire table in html and pasted the text into local files.

First I wrote a function to convert the html, into a data frame matching the online table viewer.

Code
imf_html_to_df <-  function (html_filename){

  # getting column labels by looking at the custom filename
  filename_pieces = str_split(html_filename, "_")[[1]]

  # read all the html in the file
  html_str = readLines(html_filename)

  # function returning the elements matched by the regex
  html_to_byRowList = function(full_file_str){
    regex_exp = "<div class=\"PPTSCellConText.*?>(.*?)<"
    result = stringr::str_match_all(full_file_str, regex_exp)
    return(result[[1]][, 2])
  }

  # all elements in the df by row
  elements = html_to_byRowList(html_str)

  columns = c("country", seq(as.integer(filename_pieces[3]), as.integer(filename_pieces[4]), by=1))
  num_columns = length(columns)
  schema = matrix(ncol = num_columns, nrow = 0)
  df = data.frame(schema)
  colnames(df) = columns

  # grab elements, ignoring column headers
  elements = elements[num_columns:length(elements)]

  # populate df iterating through element by df row
  start = 1
  while(start + num_columns - 1 <= length(elements)){
    curr = elements[start:(start+num_columns-1)]
    curr = do.call(c, list(list(curr[1]), as.double(curr[2:(length(curr))])))
    df[nrow(df) + 1,] = curr
    start = start + num_columns
  }

  # add cc3 labels used in joining later
  df$cc3 = as_factor(countrycode(df$country, origin = 'country.name', destination = 'iso3c'))
  return(df)
}

Applying the transformer function on html export price index files. They are in 20 year grouping since at most 20 columns render.

Joining them together.

Code
export_df <-  merge(x=export_df_1, y=export_df_2, by=c("cc3", "country")) %>% 
  as_tibble()
htmlTable(export_df)
cc3 country 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
1 AGO Angola 40.83 40.36 40.41 40.57 40.36 39.65 40.86 40.79 40.92 61.84 58.33 60.86 62.37 59.86 78.01 81.05 77.91 76.25 73.29 71.88 71.08 52.01 55.29 48.22 51.89 57.39 52.33 52.04 49.36 48.07 48.8 52.2 51.85 44.63 51.52 62.81 59.69 60.81 64.12 70.06 80.31 86.37 89.05 99.55 82.15 90.83 100.7 100 100.87 97.7
2 CAF Central African Republic 99.9 99.88 99.96 100.01 99.71 100.03 99.6 99.4 100.37 100.08 99.24 100.93 101.43 100.54 101.13 100.69 99.51 99.64 99.68 100.08 99.3 99.45 99.65 99.15 99.01 98.67 98.51 98.22 99.45 100.46 100.2 99.5 99.78 98.94 98.79 98.48 97.65 97.79 98.33 98.29 98.61 99.03 99.36 99.6 99.26 99.88 100.99 100 99.65 99.79
3 CIV Côte d'Ivoire 92.84 94.73 95.44 96.71 98.11 95.79 92.49 93.04 98.01 102.28 97.72 104.17 110.29 106.54 108.21 104.6 100.47 99.06 100.58 101.51 100.17 96.25 95.4 91.42 89.6 89.87 88.03 87.2 87.5 90.65 90.77 90.29 91.5 89.98 87.91 87.44 87.49 91.67 92.44 92.11 93.66 95.28 97.34 100.54 99.01 101.95 103.6 100 100.09 101.64
4 DZA Algeria 59.16 58.8 58.99 59.06 58.87 58.13 58.65 58.55 58.94 73.33 71.72 73.73 76.05 75.29 85.38 87.85 87.27 87.6 86.12 85.24 84.64 72.86 72.43 67.65 69.39 72.97 69.81 69.07 68.21 66.69 66.99 70.31 69.96 64.81 69.02 79.31 77.66 76.94 81.18 84.43 92.25 94.6 95.7 102.89 90.26 94.47 100.26 100 100.79 99.54
5 EGY Egypt 94 93.9 94.01 94.04 93.97 93.8 93.9 93.87 94.12 96.92 96.34 96.73 97.02 96.77 98.5 98.79 98.43 98.18 98.05 97.87 97.61 95.37 95.83 94.96 95.4 96.05 95.29 95.1 94.74 94.72 94.93 95.31 95.24 94.19 94.97 96.5 96.11 96.17 96.73 97.34 98.36 98.98 99.21 100.11 98.43 99.43 100.37 100 100.06 99.81
6 KEN Kenya 100.22 100.12 100.01 99.71 99.57 99.79 99.41 99.39 99.36 100.01 99.56 100.36 101.6 100.63 100.85 100.43 99.81 99.82 100.11 100.75 99.84 99.34 98.7 98.38 98.39 98.31 97.82 97.8 97.72 98.28 98.09 97.98 98.65 98.39 98.37 98.53 97.9 97.87 98.13 98.24 98.78 99.18 99.11 99.63 99.39 99.8 100.28 100 99.56 99.51
7 MAR Morocco 100.54 100.44 100.51 100.57 100.86 100.78 100.35 100.67 101.5 101.16 100.19 100.81 101.3 101 101.24 100.72 100.51 100.65 100.48 100.45 100.35 99.87 100.1 99.92 99.6 99.66 99.15 99.22 98.96 99.06 99.11 98.96 98.85 98.88 98.81 99 98.82 98.87 98.9 98.96 99.46 100.23 100.17 99.94 99.51 100.27 100.52 100 100.46 100.61
8 MUS Mauritius 97.15 95.56 95.72 95.51 100.29 100.92 101 105.49 107.96 116.83 110.86 105.49 101.99 100.17 100.79 109.86 103.87 97.64 97.6 93.12 91.15 94.14 94.32 96.74 98.14 97.87 94.32 94.63 95.26 96.82 97.15 95.86 95.74 94.01 91.03 93.27 93.42 91.26 91.92 92.19 94.63 98.24 94.24 95.79 99.35 100.86 102.57 100 99.07 98.73
9 NGA Nigeria 66.78 66.47 66.48 66.61 66.52 65.89 66.69 66.67 66.91 80.46 78.3 79.9 80.88 79.48 89.46 90.94 89.31 88.41 86.96 86.22 85.73 74.55 76.57 72.01 74.34 77.75 74.58 74.38 72.67 71.82 72.35 74.6 74.33 69.47 73.99 80.97 79.2 79.92 81.96 85.25 90.65 93.67 94.98 99.91 91.51 95.86 100.42 100 100.4 98.94
10 TUN Tunisia 90.58 90.5 90.48 90.55 90.56 90.34 90.58 90.64 90.88 95.07 94.2 94.78 95.07 94.58 97.53 97.78 97.24 97.06 96.65 96.38 96.23 92.85 93.51 92.11 92.82 93.83 92.77 92.74 92.18 91.94 92.13 92.77 92.73 91.14 92.6 94.74 94.18 94.36 94.95 95.92 97.45 98.43 98.75 99.96 97.69 98.96 100.19 100 100.21 99.89
11 ZAF South Africa 96.79 96.63 96.54 96.57 96.78 96.89 96.58 96.6 96.93 97.68 97.73 98.01 98.18 97.8 98.14 98.06 97.76 97.81 97.45 97 96.96 96.25 96.14 96.43 96.54 96.66 96.16 96.11 95.64 95.77 96.13 96.03 95.95 95.48 95.4 95.87 96.05 95.92 96.11 97.18 97.73 98.46 98.86 99.93 98.82 100.22 100.85 100 99.97 99.46
12 ZMB Zambia 98.46 101.68 94.67 96.79 99.69 97.81 89.51 87.97 96.39 95.77 83.13 84.89 82.63 81.15 85.89 84.96 79.64 77.2 78.65 75.88 76.44 74.76 77.5 82.54 84 82.38 79.05 78.69 75.66 78.83 82.53 77.82 78.37 73.48 72.71 75.25 73.43 73.47 75.33 82.71 86.84 99.46 100.03 98.29 92.1 100.23 103.24 100 98.75 97.31
13 ZWE Zimbabwe 98.88 98.72 98.98 99.17 99.6 100.04 99.61 99.89 100.64 100.71 99.94 100.32 100.15 99.23 99.68 99.76 98.68 97.72 97.86 97.59 97.12 96.15 96.75 99.2 99.35 98.24 97.37 96.69 96.01 96.97 97.76 97.12 97.02 95.71 95.91 97.15 96.15 96.37 97.56 98.48 98.69 100.53 101.47 100.06 98.92 100.96 101.71 100 99.59 99.8

Lastly I need to join IMF dataframes to the main dataset, ac. There are two complexities:

  1. reformatting the year columns as a single new ‘year’ column
  2. join on year & cc3 code, since Ivory Coast has different names in ac and export_df.

Wrote a function to transform the IMF data frame into a joinable format:

Code
imf_df_transform <- function(df, col_name){
  
  df = df %>% arrange(cc3)

  # select each grouping of data I need
  just_cpi_df = subset(df, select = -c(cc3, country))
  cpi = unlist(just_cpi_df, use.names=FALSE)
  cc3 = unlist(subset(df, select = c(cc3)), use.names=FALSE)
  country = unlist(subset(df, select = c(country)), use.names=FALSE)
  year = colnames(just_cpi_df)

  row_repeats = length(cpi) / length(cc3)
  col_repeats = length(cpi) / length(year)
  as.date=function(x){
    year_starts = paste(as.character(x), rep("01", length(x)), rep("01", length(x)), sep="-" )
    return(as.Date(year_starts, format="%Y-%m-%d"))
  }

  # reconstruct the joinable df
  df_joinable = data.frame(
    country=rep(country, times=row_repeats),
    cc3=rep(cc3, times=row_repeats),
    year=rep(as.date(year),each=col_repeats)
  )
  df_joinable[col_name] = cpi

  # remove the country column as not needed for join
  df_joinable = subset (df_joinable, select = -country)
  return(df_joinable)
}

Defining new object to distinguish original dataset from the merged and truncated one.

Lastly, I use this function and inner join the export price data to main dataset.

Code
export_df_joinable <- imf_df_transform(export_df, "exp_pi")
htmlTable(head(export_df_joinable, n=10))
cc3 year exp_pi
1 DZA 1965-01-01 59.16
2 AGO 1965-01-01 40.83
3 CAF 1965-01-01 99.9
4 CIV 1965-01-01 92.84
5 EGY 1965-01-01 94
6 KEN 1965-01-01 100.22
7 MUS 1965-01-01 97.15
8 MAR 1965-01-01 100.54
9 NGA 1965-01-01 66.78
10 ZAF 1965-01-01 96.79
Code
ac_imf <- inner_join(ac_imf, export_df_joinable, by = c("cc3", "year"))

NOw repeating the same scraping process for the import price index data:

Merging two import data and then converting into a joinable format.

Code
import_df <- merge(x=import_df_1, y=import_df_2, by=c("cc3", "country"))
htmlTable(import_df)
cc3 country 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
1 AGO Angola 99.53 99.5 99.47 99.35 99.47 99.49 99.34 99.39 100.11 100.8 100.07 99.89 99.83 99.78 100.04 100.11 99.83 99.47 99.52 99.43 99.22 98.67 98.63 98.73 98.78 98.76 98.48 98.51 98.46 98.59 98.63 98.67 98.56 98.34 98.19 98.36 98.31 98.48 98.59 98.78 98.93 99.26 99.44 99.93 99.48 99.74 100.15 100 99.98 99.81 99.28 99.35
2 CAF Central African Republic 98.57 98.52 98.6 98.55 98.58 98.5 98.43 98.53 99.19 100.24 99.55 99.48 99.32 99.22 99.81 100.06 99.64 99.27 99.23 99.06 98.76 98.1 98.31 98.22 98.44 98.45 98.1 98.17 98.19 98.32 98.42 98.47 98.33 97.8 97.85 98.24 98.08 98.22 98.4 98.58 98.88 99.34 99.48 99.97 99.4 99.73 100.27 100 99.94 99.78 99.02 99.02
3 CIV Côte d'Ivoire 92.85 92.79 92.88 92.75 92.75 92.35 92.18 92.43 93.76 97.7 96.06 96.35 96.63 96.29 98.61 99.09 98.54 97.77 97.28 96.88 96.61 93.45 94.08 92.9 93.42 94.2 93.19 93.19 92.45 92.35 92.59 93.14 92.9 91.49 92.53 94.27 93.56 93.96 94.47 95.5 97.04 98.09 98.45 100.29 98.06 99.15 100.31 100 100.28 99.74 96.74 96.52
4 DZA Algeria 99.56 99.53 99.49 99.37 99.47 99.5 99.25 99.38 100.81 101.62 100.5 100.31 99.92 99.7 100.05 100.08 99.45 98.99 99.12 98.84 98.36 97.75 97.45 97.87 98.12 97.78 97.27 97.43 97.45 97.86 98.09 98.18 97.86 97.16 96.75 97 97.06 97.4 97.54 97.78 97.93 98.67 99.11 99.87 98.92 99.36 100.33 100 99.82 99.53 98.65 98.59
5 EGY Egypt 97.98 97.94 97.81 97.64 97.7 97.69 97.33 97.31 98.94 100.08 98.78 98.84 98.65 98.46 99.09 98.92 98.32 98.07 98.16 97.91 97.38 96.08 95.82 95.95 96.26 96.17 95.59 95.79 95.7 95.84 96.01 96.35 96.01 95.17 95.02 95.71 95.75 96.13 96.45 96.93 97.32 98.14 98.72 99.77 98.41 99.19 100.35 100 99.94 99.52 98.09 97.86
6 KEN Kenya 93.97 93.81 93.8 93.66 93.7 93.67 93.73 93.66 94.24 97.49 96.51 96.6 96.75 96.46 98.32 98.56 98.09 97.62 97.48 97.41 97.01 94.46 94.9 94.23 94.67 95.15 94.44 94.51 94.09 94.15 94.42 94.82 94.73 93.68 94.31 95.54 95.17 95.54 95.99 96.66 97.55 98.37 98.89 99.93 98.25 99.2 100.23 100 99.91 99.55 97.39 97.14
7 MAR Morocco 90.21 90.09 89.98 89.88 89.9 89.76 89.72 89.76 91.14 95.89 94.32 94.62 94.76 94.21 97.1 97.45 96.52 96.03 95.79 95.42 94.77 91.16 91.32 90.47 91.29 92 90.62 90.68 90.27 90.3 90.66 91.42 91.1 89.06 89.95 92.37 91.97 92.27 93.18 94.38 95.87 97.32 98.11 100.12 96.86 98.44 100.5 100 99.91 99.2 95.59 94.92
8 MUS Mauritius 91.84 91.73 91.58 91.34 91.48 91.19 90.77 91.31 93.8 97.96 95.44 96.18 96.53 95.96 98.87 99.19 98.23 97.35 96.81 96.27 95.74 91.79 92.5 91.25 91.69 92.56 90.99 90.97 90.19 90.28 90.5 91.04 90.75 88.78 89.78 92.05 91.32 91.89 92.74 94.05 95.72 97.16 97.91 100.27 96.99 98.91 100.9 100 100.43 99.76 95.76 95.63
9 NGA Nigeria 99.52 99.47 99.44 99.39 99.49 99.41 99.29 99.45 100.01 100.52 100.01 100.02 99.98 99.89 100.16 100.31 100.06 99.81 99.77 99.61 99.49 99.12 99.18 99.13 99.16 99.15 98.88 98.94 98.8 98.9 98.99 99 98.9 98.62 98.61 98.86 98.75 98.85 98.94 99.08 99.34 99.69 99.73 100.05 99.68 99.92 100.19 100 100.07 99.96 99.45 99.5
10 TUN Tunisia 91.69 91.69 91.43 91.33 91.41 91.2 90.98 91.13 92.82 96.68 94.86 95.32 95.48 95.05 97.59 97.6 96.65 96.17 96.19 95.89 95.19 91.99 92.1 91.47 92.13 92.69 91.38 91.41 91.14 91.24 91.6 92.31 91.98 90.05 90.7 92.99 92.66 92.96 93.9 94.92 96.22 97.64 98.44 100.14 96.95 98.53 100.43 100 100.02 99.41 96.02 95.32
11 ZAF South Africa 94.08 94.02 93.98 93.98 93.99 93.88 93.94 93.94 94.31 97.02 96.36 96.67 96.87 96.55 98.33 98.49 98.1 97.82 97.6 97.45 97.25 95.06 95.47 94.75 95.2 95.75 95.05 95.02 94.64 94.58 94.76 95.16 95.09 94.01 94.82 96.14 95.78 95.99 96.42 97.16 98.08 98.74 99.1 100.03 98.47 99.36 100.2 100 100.03 99.74 97.76 97.45
12 ZMB Zambia 94.39 94.44 94.09 94.12 94.24 94.08 93.75 93.6 94.47 97.18 95.84 96.11 96.12 95.77 97.71 97.82 97.19 96.75 96.68 96.4 96.21 93.88 94.37 93.97 94.47 94.92 94.09 94.07 93.52 93.64 94.03 94.23 94.09 92.83 93.48 94.81 94.4 94.68 95.17 96.24 97.26 98.52 98.98 99.9 98.04 99.23 100.3 100 99.93 99.49 97.38 97.02
13 ZWE Zimbabwe 91.59 91.51 91.33 91.25 91.39 91.35 91.24 91.09 92.06 96.07 94.79 95.04 95.02 94.44 97.06 97.28 96.58 95.95 95.87 95.58 95.21 91.83 92.23 91.85 92.52 93.12 92.01 91.92 91.29 91.3 91.72 92.4 92 90.26 91.34 93.32 92.76 93.2 93.94 95.12 96.37 97.78 98.58 99.87 97.24 98.74 100.38 100 99.87 99.2 96.18 95.65
Code
import_df_joinable <- imf_df_transform(import_df, "imp_pi")
htmlTable(head(import_df_joinable, n=10))
cc3 year imp_pi
1 DZA 1965-01-01 99.56
2 AGO 1965-01-01 99.53
3 CAF 1965-01-01 98.57
4 CIV 1965-01-01 92.85
5 EGY 1965-01-01 97.98
6 KEN 1965-01-01 93.97
7 MUS 1965-01-01 91.84
8 MAR 1965-01-01 90.21
9 NGA 1965-01-01 99.52
10 ZAF 1965-01-01 94.08

Attaching import price data to the main data frame.

Code
ac_imf <- inner_join(ac_imf, import_df_joinable, by = c("cc3", "year"))
sample_n(ac_imf, 10) %>% htmlTable()
cc3 country year systemic_crisis exch_usd domestic_debt_in_default sovereign_external_debt_default gdp_weighted_default inflation_rate independence currency_crisis inflation_crisis banking_crisis exp_pi imp_pi
1 EGY Egypt 1987-01-01 0 0.7 0 0 0 25.185 1 0 1 0 95.83 95.82
2 CIV Ivory Coast 1984-01-01 0 486.5 0 1 0 4.281 1 0 0 0 101.51 96.88
3 DZA Algeria 1982-01-01 0 4.6355 0 0 0 6.593 1 0 0 0 87.6 98.99
4 ZMB Zambia 1975-01-01 0 0.0006429 0 0 0 10.16949153 1 0 0 0 83.13 95.84
5 CAF Central African Republic 2009-01-01 0 455.3359711 0 1 0 3.522 1 0 0 0 99.26 99.4
6 KEN Kenya 1995-01-01 1 55.9389 0 1 0 1.554 1 0 0 1 98.09 94.42
7 NGA Nigeria 1996-01-01 0 21.8861 0 0 0 29.292 1 0 1 0 74.6 99
8 MUS Mauritius 1966-01-01 0 4.77862 0 0 0 2.982014118 1 0 0 0 95.56 91.73
9 NGA Nigeria 2007-01-01 0 117.968 0 0 0 5.413 1 0 0 0 94.98 99.73
10 EGY Egypt 1999-01-01 0 3.405 0 0 0 3.745 1 0 0 0 94.97 95.02

Creating new variables

I have now export price index and import price index data included in the data set. I will also create yearly change rates for export&import indices so that I could compare them with inflation rate for my analysis. I will use lag function to calculate change rates.

Code
ac_imf <- ac_imf %>% group_by(cc3) %>% arrange(year) %>%
  mutate(change_epi= round((exp_pi-lag(exp_pi))/lag(exp_pi)*100, digits=2)) %>%
  mutate(change_ipi=round((imp_pi-lag(imp_pi))/lag(imp_pi)*100, digits=2)) %>% 
  ungroup()


ac_imf %>% group_by(country) %>% sample_n(2) %>% arrange(year) %>% htmlTable()
cc3 country year systemic_crisis exch_usd domestic_debt_in_default sovereign_external_debt_default gdp_weighted_default inflation_rate independence currency_crisis inflation_crisis banking_crisis exp_pi imp_pi change_epi change_ipi
1 MAR Morocco 1967-01-01 0 5.0276 0 0 0 -1.126519599 1 0 0 0 100.51 89.98 0.07 -0.12
2 TUN Tunisia 1969-01-01 0 0.52 0 0 0 2.804694188 1 0 0 0 90.56 91.41 0.01 0.09
3 ZWE Zimbabwe 1969-01-01 0 0.00000000000000000000000000714 1 1 0 0.364644862 1 0 0 0 99.6 91.39 0.43 0.15
4 MUS Mauritius 1971-01-01 0 5.22362 0 0 0 1.886196037 1 0 0 0 101 90.77 0.08 -0.46
5 MAR Morocco 1973-01-01 0 3.8851 0 0 0 7.193348119 1 0 0 0 101.5 91.14 0.82 1.54
6 KEN Kenya 1974-01-01 0 7.1429 0 0 0 16.41376144 1 0 0 0 100.01 97.49 0.65 3.45
7 KEN Kenya 1976-01-01 0 8.27 0 0 0 9.527607326 1 1 0 0 100.36 96.6 0.8 0.09
8 DZA Algeria 1979-01-01 0 3.7555 0 0 0 14.60957788 1 0 0 0 85.38 100.05 13.4 0.35
9 CAF Central African Republic 1979-01-01 1 200.9998578 0 0 0 9.210526316 1 0 0 1 101.13 99.81 0.59 0.59
10 CIV Ivory Coast 1980-01-01 0 221 0 0 0 8.81 1 0 0 0 104.6 99.09 -3.34 0.49
11 NGA Nigeria 1980-01-01 0 0.544454729 0 0 0 9.97 1 0 0 0 90.94 100.31 1.65 0.15
12 CIV Ivory Coast 1984-01-01 0 486.5 0 1 0 4.281 1 0 0 0 101.51 96.88 0.92 -0.41
13 CAF Central African Republic 1986-01-01 0 322.7497717 0 1 0 2.411 1 0 0 0 99.45 98.1 0.15 -0.67
14 NGA Nigeria 1986-01-01 0 3.316749585 0 0 0 6.25 1 1 0 0 74.55 99.12 -13.04 -0.37
15 ZMB Zambia 1987-01-01 0 0.0077 0 1 0 47.028 1 0 1 0 77.5 94.37 3.67 0.52
16 ZWE Zimbabwe 1989-01-01 0 0.0000000000000000000000000222 0 0 0 12.8 1 1 0 0 99.35 92.52 0.15 0.73
17 DZA Algeria 1991-01-01 1 21.3919 0 1 0.23 25.9 1 1 1 1 69.81 97.27 -4.33 -0.52
18 EGY Egypt 1992-01-01 0 3.33861 0 0 0 21.142 1 0 1 1 95.1 95.79 -0.2 0.21
19 ZMB Zambia 1994-01-01 0 0.699 0 1 0 54.614 1 1 1 0 78.83 93.64 4.19 0.13
20 ZAF South Africa 1996-01-01 0 4.6825 0 0 0 7.32 1 1 0 0 96.03 95.16 -0.1 0.42
21 AGO Angola 2000-01-01 0 16.81784 1 1 0 325.029 1 1 1 0 62.81 98.36 21.91 0.17
22 ZAF South Africa 2005-01-01 0 6.325 0 0 0 3.393 1 0 0 0 97.73 98.08 0.57 0.95
23 AGO Angola 2007-01-01 0 75.023 0 0 0 12.249 1 0 0 0 89.05 99.44 3.1 0.18
24 MUS Mauritius 2007-01-01 0 28.2162 0 0 0 8.827 1 0 0 0 94.24 97.91 -4.07 0.77
25 TUN Tunisia 2008-01-01 0 1.3099 0 0 0 4.913 1 0 0 0 99.96 100.14 1.23 1.73
26 EGY Egypt 2013-01-01 0 6.94 0 0 0 6.914 1 0 0 0 100.06 99.94 0.06 -0.06

Exploratory Data Analysis

Before producing summary tables, I would like to convert inflation rate and exchange rate values to two-decimal format for consistency with values of other variables in the dataset. Also, it the tables will look more tidy with this way.

Code
ac_imf <- ac_imf %>% 
  mutate(exch_usd = round(exch_usd, digits=2),
         inflation_rate = round(inflation_rate, digits=2)
         )

Summary statistics of the columns

Code
print(dfSummary(ac_imf, 
                varnumbers= FALSE, 
                plain.ascii= FALSE, 
                style= "grid", 
                graph.magnif= 0.80, 
                valid.col= TRUE),
      method= 'render', 
      table.classes= 'table-condensed')

Data Frame Summary

ac_imf

Dimensions: 629 x 17
Duplicates: 0
Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
cc3 [factor]
1. DZA
2. AGO
3. CAF
4. CIV
5. EGY
6. KEN
7. MUS
8. MAR
9. NGA
10. ZAF
[ 3 others ]
47(7.5%)
35(5.6%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
49(7.8%)
49(7.8%)
149(23.7%)
629 (100.0%) 0 (0.0%)
country [character]
1. Central African Republic
2. Egypt
3. Ivory Coast
4. Kenya
5. Mauritius
6. Morocco
7. Tunisia
8. Zambia
9. Nigeria
10. South Africa
[ 3 others ]
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
50(7.9%)
49(7.8%)
49(7.8%)
131(20.8%)
629 (100.0%) 0 (0.0%)
year [Date]
min : 1965-01-01
med : 1990-01-01
max : 2014-01-01
range : 49y 0m 0d
50 distinct values 629 (100.0%) 0 (0.0%)
systemic_crisis [integer]
Min : 0
Mean : 0.1
Max : 1
0:550(87.4%)
1:79(12.6%)
629 (100.0%) 0 (0.0%)
exch_usd [numeric]
Mean (sd) : 63 (132.7)
min ≤ med ≤ max:
0 ≤ 5.2 ≤ 744.3
IQR (CV) : 33.7 (2.1)
420 distinct values 629 (100.0%) 0 (0.0%)
domestic_debt_in_default [integer]
Min : 0
Mean : 0.1
Max : 1
0:587(93.3%)
1:42(6.7%)
629 (100.0%) 0 (0.0%)
sovereign_external_debt_default [integer]
Min : 0
Mean : 0.2
Max : 1
0:475(75.5%)
1:154(24.5%)
629 (100.0%) 0 (0.0%)
gdp_weighted_default [numeric]
Mean (sd) : 0 (0)
min ≤ med ≤ max:
0 ≤ 0 ≤ 0.4
IQR (CV) : 0 (6)
0.00:607(96.5%)
0.06:4(0.6%)
0.13:6(1.0%)
0.23:6(1.0%)
0.36:5(0.8%)
0.40:1(0.2%)
629 (100.0%) 0 (0.0%)
inflation_rate [numeric]
Mean (sd) : 35098 (876785.1)
min ≤ med ≤ max:
-8.8 ≤ 8 ≤ 21989695
IQR (CV) : 10.1 (25)
561 distinct values 629 (100.0%) 0 (0.0%)
independence [integer]
Min : 0
Mean : 1
Max : 1
0:5(0.8%)
1:624(99.2%)
629 (100.0%) 0 (0.0%)
currency_crisis [integer]
Mean (sd) : 0.2 (0.4)
min ≤ med ≤ max:
0 ≤ 0 ≤ 2
IQR (CV) : 0 (2.2)
0:516(82.0%)
1:110(17.5%)
2:3(0.5%)
629 (100.0%) 0 (0.0%)
inflation_crisis [integer]
Min : 0
Mean : 0.2
Max : 1
0:531(84.4%)
1:98(15.6%)
629 (100.0%) 0 (0.0%)
banking_crisis [integer]
Min : 0
Mean : 0.1
Max : 1
0:538(85.5%)
1:91(14.5%)
629 (100.0%) 0 (0.0%)
exp_pi [numeric]
Mean (sd) : 92.3 (12.1)
min ≤ med ≤ max:
39.6 ≤ 97.1 ≤ 116.8
IQR (CV) : 9 (0.1)
512 distinct values 629 (100.0%) 0 (0.0%)
imp_pi [numeric]
Mean (sd) : 96.4 (3)
min ≤ med ≤ max:
88.8 ≤ 97.1 ≤ 101.6
IQR (CV) : 4.9 (0)
460 distinct values 629 (100.0%) 0 (0.0%)
change_epi [numeric]
Mean (sd) : 0.4 (4.5)
min ≤ med ≤ max:
-35.4 ≤ 0 ≤ 51.1
IQR (CV) : 1.4 (10.2)
375 distinct values 616 (97.9%) 13 (2.1%)
change_ipi [numeric]
Mean (sd) : 0.1 (1)
min ≤ med ≤ max:
-4.1 ≤ 0 ≤ 5.2
IQR (CV) : 0.8 (9.4)
265 distinct values 616 (97.9%) 13 (2.1%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-12-23

My last form of the dataset is composed from 629 observations and 17 variables. Each observation includes country-year economic indicators. 17 variables consists of: - 2 categorical variable - 1 date variable - 7 numeric variable - 7 binary variable

Univariate analysis

Year

Dataset covers 50 years from 1965 to 2014. Since we have 13 countries, each year should have frequency of 13.

Code
ac_imf %>% mutate(Year= format(year, format="%Y")) %>% 
  group_by(Year) %>% 
  summarise(number_of_obs=n()) %>% 
  arrange(number_of_obs) %>% 
  filter(number_of_obs < 13) %>% 
  htmlTable()
Year number_of_obs
1 1965 11
2 1966 11
3 1967 11
4 2014 11
5 1968 12
6 1969 12
7 1981 12
8 1982 12
9 1983 12
10 1984 12
11 1985 12
12 1986 12
13 1987 12
14 1988 12
15 1989 12
16 1990 12
17 1991 12

So, only for 4 years there is no observation for 2 countries and 13 years have missing information for only 1 country. All other 37 years include data from all countries, which is good.

Country

According to the summary table, 8 of 13 countries have frequency of 50 which is as expected, however 5 countries have less than 50 frequency, which means that there is no observation for those countries for some years.

Code
ac_imf %>% ggplot(aes(x= fct_rev(fct_infreq(country)))) + 
  geom_bar(stat= 'count', alpha= 0.6, width= 0.5) + 
  xlab("Country") +
  ylab("Number of observations") +
  coord_flip() +
  theme_bw()

It looks that Angola has significant amount of missing data with no observation for 15 years. Other countries look good.

Exchange Rate

Let me compare and analyze exchange rate of the countries with box plots.

Code
ac_imf %>% ggplot() + 
  geom_boxplot(aes(x= reorder(country, exch_usd, FUN= median, decreasing= F), 
                   y= exch_usd, fill= country), show.legend= F) +
  xlab("Country") +
  ylab("Exchange Rate") +
  coord_flip() +
  theme_bw()

Median exchange rate of Central African Republic and Ivory Coast are significantly higher than exchange rates of other countries. HOwever, their box plots are not highly right skewed, they disperse evenly over their values. On the other hand, Nigeria and Angola have significantly right skewed box plots, which might be a sign that these countries experienced more currency crises than other countries. Similarly. Zimbabwe have the lowest median exchange rate but it has huge outlier points, which poses a bad evidence for Zimbabwe currency.

Currency Crises

currency_crisis variable should only consist of 0 and 1 values but When we look at the summary table, we see that there are three cells with value of 2. I think they are wrong entries and their true value is 1. So let me fix them first.

Code
ac_imf$currency_crisis[ac_imf$currency_crisis > 1] <- 1L

Now I will create a table that shows how many currency crisis observed by each country over 50 years.

Code
ac_imf %>% group_by(country) %>% 
  summarise(total_currency_crises= sum(currency_crisis)) %>% 
  arrange(desc(total_currency_crises)) %>% 
  htmlTable()
country total_currency_crises
1 Zimbabwe 21
2 Angola 19
3 Zambia 18
4 South Africa 12
5 Nigeria 10
6 Kenya 9
7 Algeria 6
8 Egypt 6
9 Mauritius 5
10 Tunisia 4
11 Central African Republic 1
12 Ivory Coast 1
13 Morocco 1

As I expected in previous part, Zimbabwe and Angola had currency crises more often than other countries. Even though Central African Republic and Ivory Coast have higher exchange rate generally, they experienced a currency crisis only once.

GDP weighted default

According to the summary table, gdp weighted default is 0 majority of the time. When it is not 0, it is between 0 and 1 because it gives debt default and gdp ratio of the country. I will rescale gdp weighted default values so it shows percentage of gdp and then visualize the distribution by histogram.

Code
ac_imf <- ac_imf %>% 
  mutate(gdp_weighted_default= gdp_weighted_default*100)

Instead of distribution for single gdp weighted default values, I would like to know distribution of aggregate values of 13 countries for each year.

Code
ac_imf %>% group_by(year) %>% 
  summarise(cum_gdp_wght_def=sum(gdp_weighted_default)) %>% 
  ggplot(aes(x=cum_gdp_wght_def, y = after_stat(density))) + 
  geom_histogram(bins = 10, fill="blue", alpha=0.6) + 
  scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) +
  xlab("Yearly cumulative gdp weighted default") +
  theme_bw()

Also let me find out which countries and how many times had positive gdp weighted default as well as their average.

Code
ac_imf %>% filter(gdp_weighted_default > 0) %>% 
  group_by(country) %>% 
  summarise(
    number_of_years=n(), 
    mean_percentage=mean(gdp_weighted_default)) %>% 
  arrange(desc(number_of_years), desc(mean_percentage)) %>% 
  htmlTable()
country number_of_years mean_percentage
1 Algeria 6 23
2 Morocco 6 13
3 South Africa 5 36
4 Tunisia 4 6
5 Egypt 1 40

The worst case is obviously for South Africa with 5 years and with average default of 36% of their gdp.

Independence

Code
ac_imf %>% filter(independence < 1) %>% 
  select(country, year, independence) %>% 
  mutate(year=format(year, format="%Y"))
# A tibble: 5 × 3
  country year  independence
  <chr>   <chr>        <int>
1 Angola  1970             0
2 Angola  1971             0
3 Angola  1972             0
4 Angola  1973             0
5 Angola  1974             0

Over the years that covered by transformed dataset, only Angola was not independent from 1965 to 1975. So, it doesn’t enable us to evaluate the effect of independence on economic crisis.

Inflation Rate

According to the summary table, inflation rate have some extreme values. Let me first find them.

Code
ac_imf %>% select(country, inflation_rate) %>% 
  group_by(country) %>% 
  summarise_at(.vars="inflation_rate", 
               .funs= list(minimum=min, 
                           median=median, 
                           maximum=max)) %>% 
  htmlTable()
country minimum median maximum
1 Algeria 0.3 6.31 31.7
2 Angola 5.78 46.71 4146.01
3 Central African Republic -8.82 3.725 27.7
4 Egypt 0 10 25.18
5 Ivory Coast -1.07 4.25 25.96
6 Kenya -0.13 8.62 45.98
7 Mauritius 0.01 6.47 33.04
8 Morocco -1.13 3.065 15.98
9 Nigeria -4.55 11.77 72.73
10 South Africa 1.39 8.71 18.75
11 Tunisia 1.44 4.66 13.67
12 Zambia 2.52 17.08 183.26
13 Zimbabwe -7.67 12.55 21989695.22

As we can see from the table that all countries’ median inflation rate is around normal/acceptable values, however, Angola has a very extreme maximum value (4146%). In addition to Angola, Zimbabwe has an astronomic maximum inflation rate with 21989695%. It is either a recording error or there is a very very exceptional thing happened in Zimbabwe for a year because Zimbabwe has median inflation rate of only 12%.

Code
ac_imf %>% select(year, country, inflation_rate) %>% 
  filter(inflation_rate>1000) %>% 
  htmlTable()
year country inflation_rate
1 1993-01-01 Angola 1379.48
2 1995-01-01 Angola 2672.23
3 1996-01-01 Angola 4146.01
4 2006-01-01 Zimbabwe 1281.11
5 2007-01-01 Zimbabwe 66279.89
6 2008-01-01 Zimbabwe 21989695.22

So there are 6 exceptional observations with higher than 1000% inflation rate. To deal with these cases, I will fix them to 1000% for health of my further analysis.

Code
ac_imf$inflation_rate[ac_imf$inflation_rate > 1000] <- 1000

Now I can plot the distribution of inflation rates over countries.

Code
ac_imf %>% filter(inflation_rate < 100) %>% ggplot() + 
  geom_violin(aes(x= reorder(country, inflation_rate, FUN= median, decreasing= F), 
                   y= inflation_rate, fill= country), show.legend= F, alpha=0.8) +
  xlab("Country") +
  ylab("Inflation Rate") +
  coord_flip() +
  theme_bw()

Bivariate and multivarite analysis

Economic crises

My main focus of analysis is systemic crisis. According to the summary table, 79 times systemic crisis were observed over 50 years. In this perspective, the first thing I would like to know is how the distribution of systemic crises is between countries in terms of percentage of occurrence. Also I expect that the countries that had systemic crises also experienced banking crises. According to the summary table, 91 times banking crises observed. To verify my hypothesis I will visualise countries’ systemic crises and banking crises experiences over pie charts

Code
systemic_pie <- ac_imf %>% filter(systemic_crisis==1) %>% select(country)
PieChart(country, 
         hole = 0, 
         values = "%", 
         data = systemic_pie, 
         fill = brewer.pal(n = 12, name = "Paired"), 
         main = "Systemic Crises", 
         quiet = TRUE)

Code
banking_pie <- ac_imf %>% filter(banking_crisis==1) %>% select(country)
PieChart(country, 
         hole = 0, 
         values = "%", 
         data = banking_pie, 
         fill = "rainbow", 
         main = "Banking Crises", 
         quiet = TRUE)

As I expected, countries which had systemic crises more often experienced banking crises more often as well such as Central African Republic, Zimbabwe and Nigeria, These pie charts above also revealed another thing that all of the 13 countries observed banking crises at least once whereas only 10 countries observed systemic crises. So, there are 3 countries that had banking crisis but no systemic crisis.

Let me find out which countries they are.

Code
ac_imf %>% 
  group_by(country) %>% 
  summarize(total_sys_crs=sum(systemic_crisis)) %>% 
  arrange(total_sys_crs) %>% 
  ungroup() %>% 
  slice_head(n=3) %>% 
  htmlTable()
country total_sys_crs
1 Angola 0
2 Mauritius 0
3 South Africa 0

These countries did not have systemic crises and also they had banking crises much less frequent than other countries, which makes me think that banking crises are related to systemic crises. To verify that, I will plot timelines for systemic crises and banking crises for each country.

Code
ac_imf %>% select(country, year, systemic_crisis, banking_crisis) %>% 
  pivot_longer(cols= ends_with("crisis"), 
               names_to= "Type_of_crisis", 
               values_to= "occurance") %>% 
  ggplot() +
  geom_jitter(aes(year, occurance, color= Type_of_crisis, shape=Type_of_crisis), width = 0.2, height = 0.2) + 
  geom_line(aes(year, occurance, color= Type_of_crisis)) +
  scale_y_continuous(name = "Occurance of Crisis", breaks = c(0,1)) +
  facet_wrap(vars(country), ncol = 2) +
  theme_minimal()

Plots for Central African Republic, Zimbabwe, Kenya, Algeria, Ivory Coast, Tunusia, Nigeria, Zambia and Morocca pose big evidence that there is very close relationship between banking crises and systemic crises. Indeed, it looks that whenever a country had systemic crisis it had banking crisis as well. The reverse of that not be true though. From the plots above, we see that Angola and South Africa had banking crises for few times but they did not experience systemic crises meanwhile. However,

Besides that, the plots above show that banking and sytemic crises became more frequent after 1980s. I will now investigate it and also include currency and inflation crises to my analysis.

Code
ac_imf %>% select(year, systemic_crisis, banking_crisis, inflation_crisis, currency_crisis) %>% 
  pivot_longer(cols= ends_with("crisis"), 
               names_to= "Type_of_crisis", 
               values_to= "occurance") %>% 
  group_by(year, Type_of_crisis) %>%
  summarize(Number_of_Occurance=sum(occurance), .groups= "drop") %>%
  ggplot(aes(year, Number_of_Occurance)) +
  geom_col(aes(fill=Type_of_crisis), color="white", width= 700, alpha=0.8)

So, the graph tells us that beginning with late 1980s more African countries started to experience crises more often. This trend lasted until 2000s. 1995 is the peak year in terms of overall number of economics crises. Also, systemic crises took place more often during this period.

Domestic & Sovereign external debt in defaults

Now I would like to visualize domestic debt in default and sovereign external debt in default cases varied over 50 year in African countries.

Code
ac_imf %>% mutate(Year= as.numeric(format(year, format="%Y"))) %>% 
  select(Year, contains("debt")) %>% 
  pivot_longer(cols= contains("debt"), 
               names_to= "Debts_in_Deafult", 
               values_to= "occurance") %>% 
  filter(occurance==1) %>% ggplot(aes(x= Year, fill= Debts_in_Deafult)) + 
  geom_histogram(position= "dodge2", alpha = 0.8, bins=15) +
  geom_freqpoly(aes(Year, color=Debts_in_Deafult), size=1, binwidth = 3.5) +
  scale_x_continuous(breaks=c(1960,1970,1980,1990,2000,2010)) +
  theme_bw()

The graph shows that domestic debt in default cases have almost the same trend as economic crises we analyzed in previous part. They started to become more often after mid 80s but they lasted in 2000s as well. On the other hand, there is no specific trend for domestic debt in default cases. So, we can result that in years of systemic crises, debts in default are prevalent as well.

Export & Import Price Index and Inflation Rate

Let me first visualize how export and import price indices change in African countries over 50 years.

Code
ac_imf %>%  dplyr::rename("Export_Price_Index"="exp_pi", "Import_Price_Index"="imp_pi") %>% 
  pivot_longer(cols= c("Export_Price_Index", "Import_Price_Index"), 
                        names_to= "Price_Index",
                        values_to= "Rate") %>% 
  ggplot(aes(x=year, y=Rate)) +
  geom_point(aes(color=Price_Index)) +
  geom_line(aes(color=Price_Index)) +
  guides(color= guide_legend(title = NULL, override.aes= list(size= 3))) +
  scale_y_continuous(name= "Export & Import Price Indices") +
  facet_wrap(vars(country), ncol = 2) +
  theme_minimal()

From the plots, we can see that Algeria, Angola, Nigeria, Zambia had significant drop, Ivory Coast and Tunusia had moderate drop in their export price indices during 80s and 90s, which might cause trade deficits in these countries and trigger other macroeconomic issues.

I also expect that when the gap between Import and Export price indices of a country gets larger, it is more probable that the inflation rate in that country goes up. I would like to test my hypothesis in these African countries. To get better a fit and visualization, I will put an upper limit on inflation rate.

Code
ac_imf %>% filter(inflation_rate <500) %>% mutate(Import_Export= imp_pi/exp_pi*100) %>% filter(Import_Export <200) %>% 
  ggplot(aes(x=Import_Export, y=inflation_rate)) +
  xlab("Import and Export Price Index Ratio") +
  ylab("Inflation Rate") +
  geom_point() +
  geom_smooth(se= FALSE) + 
  theme_bw()

Correlation Between Economic Indicators

Lastly I would like to explore the correlation between the economic indicators of African countries I got in the dataset.

Code
ac_imf %>% select(systemic_crisis, 
                  banking_crisis, 
                  inflation_crisis, 
                  currency_crisis, 
                  inflation_rate, 
                  exch_usd, 
                  domestic_debt_in_default, 
                  sovereign_external_debt_default, 
                  exp_pi, 
                  imp_pi,
                  gdp_weighted_default) %>% 
  dplyr::rename("Systemic Crisis"="systemic_crisis",
                "Banking Crisis"="banking_crisis",
                "Inflation Crisis"="inflation_crisis",
                "Currency Crisis"="currency_crisis",
                "Inflation Rate"="inflation_rate",
                "Exchange Rate"="exch_usd",
                "Domestic Debt"="domestic_debt_in_default",
                "External Debt"="sovereign_external_debt_default",
                "GDP Weighted Default"="gdp_weighted_default",
                "Export Price Index"="exp_pi",
                "Import Price Index"="imp_pi"
                ) %>% 
  cor() %>% 
  corrplot(method="circle", tl.srt= 60)

Thus, the biggest positive correlation is between the banking crises and systemic crises , which is I already pointed in my previous analysis. There is significant positive relationship between inflation rate and domestic debt in default. A new exploration I got form the correlation plot is that there is significant negative correlation between export price index and inflation crisis. It makes sense because these African countries depend on their exports in terms of revenue. If export price index gets lower, they face trade deficits and, as a result of that, their economies get into trouble.

Conclusion and Considerations for Further Analysis

Economic crises in 13 African countries covered in the dataset basically started after 1960s. Additionally, there is not much data collected before 1960s regarding these countries. Since it would not be reliable to interpret the scarce economic indicators older than 50-60 years old, I focused on the data collected after 1960s in my project. The core finding of the project is the strong relationsihp between the systemic crisis and banking crisis. If a country experienced a banking crisis, then their economy got usually through a systemic crisis as well. Most often, they happened at the same time. Besides that, these African countries were exposed to economic crises more often in the period between mid-1980s and 2000s. A possible explanation for that is as following: decline in global commodity prices such as Oil and petroleum products and other agricultural commodities in 1980s impacted the trade/external accounts of these countries because they are heavily depended on import revenues of natural resources. Due to limited diversification of their exports and having low capital inflows, fluctuations in global commodity prices had strong negative impact on their domestic economy.

My considerations for further studies would be:

  • More variables should be included such as interest rates, trade deficit, government deficit, growth rate etc.
  • More countries should be included to enable to make more reliable interpretations.

References:

  • Kaggle dataset: https://www.kaggle.com/datasets/chirin/africa-economic-banking-and-systemic-crisis-data
  • Commodity of Terms of Trade: Commodity Import/Export Price Index, Individual Commodities Weighted by Ratio of Imports to GDP Historical, Annual (1965-present), Fixed Weights, Index (2012=100): https://data.imf.org/?sk=2CDDCCB8-0B59-43E9-B6A0-59210D5605D2&sId=1434492942851
  • R for Data Science: https://r4ds.had.co.nz/index.html
  • https://r-graph-gallery.com/index.html
  • ggplot2: Elegant Graphics for Data Analysis: https://ggplot2-book.org/
  • DACSS 601 Class resources: https://github.com/DACSS/601_Fall_2022
Source Code
---
title: "Final Project"
description: "An Explanation to African Systemic Crises"
author: "Said Arslan"
date: "`r Sys.Date()`"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
output: distill::distill_article
---

```{r}
library(tidyverse)
library(countrycode)
library(summarytools)
library(htmlTable)
library(lessR)
library(RColorBrewer)
library(corrplot)
load("Final_project- Said Arslan.RData")

knitr::opts_chunk$set(echo = TRUE, warning=FALSE)
```


# Introduction

The primary data I used for this project is extracted from Kaggle: https://www.kaggle.com/chirin/africa-economic-banking-and-systemic-crisis-data. The dataset is a derivative of Reinhart et al’s Global Crises dataset. It was downloaded from: https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx 
Non-African countries were filtered out from the dataset. And, variables containing notes or sparse data were removed. Years with no data were also filtered out. 

The dataset specifically focuses on the Banking, Debt, Financial, Inflation and Systemic Crises that occurred, from 1860 to 2014, in 13 African countries, including: Algeria, Angola, Central African Republic, Ivory Coast, Egypt, Kenya, Mauritius, Morocco, Nigeria, South Africa, Tunisia, Zambia and Zimbabwe. Each country-year observation (row) includes macroeconomic indicators of the country.
I also web scraped additional import and export price index data for these countries, from Commodity of Terms of Trade Database of IMF: https://data.imf.org/?sk=2CDDCCB8-0B59-43E9-B6A0-59210D5605D2&sId=1434492942851, which I think will play essential role in my analysis.

The main purpose of the project is to understand what patterns and trends are associated with mainly systemic crises and other type of economic crises as well as how they differ by country.


# Import and transform into tidy data


## Read the primary dataset

```{r}
# ac <-
#    read_csv("C:/Users/saida/Documents/african_crises.csv",
#     col_types = cols(
#         case = col_integer(),
#         cc3 = col_factor(ordered = TRUE),
#         country = col_character(),
#         year = col_date(format = "%Y"),
#         exch_usd = col_double(),
#         gdp_weighted_default = col_double(),
#         inflation_annual_cpi = col_double(),
#         systemic_crisis = col_integer(),
#         domestic_debt_in_default = col_integer(),
#         sovereign_external_debt_default = col_integer(),
#         independence = col_integer(),
#         currency_crises = col_integer(),
#         inflation_crises = col_integer(),
#         banking_crisis = col_character()
#     )
#   )
# 
# sample_n(ac, 10) %>% htmlTable()

```

In the main dataset, each variable has its own column, each observation has its own row, and each value has its own cell.


### The variables in the primary dataset

1. `case` is a factor variable that identifies the index for each country from the original, global dataset.  

2. `cc3` is a factor variable that identifies each country with a three letter code.

3. `country` is a character variable that names each country.

4. `year` is a date variable that identifies the year of the observation.

5. `systemic_crisis` is a binary variable that identifies whether a systemic crisis is observed in that year or not. Systemic crisis means the nationwide banking crisis which its impact spreads to the other sectors.

6. `exch_usd` is a numeric variable that gives exchange rate of USD to the country’s currency. For a country, the lower the exchange rate the more valuable its currency.  

7. `domestic_debt_in_default` is a binary variable that shows if the government failed in repaying its domestic debts or not.

8. `sovereign_external_debt_default` is a binary variable that identifies if sovereign external debt default occurred or not. It is a failure of a government to honor some or all of its debt obligations to the external lenders.

9. `gdp_weighted_default` is a numeric variable which gives the ratio of debt to gdp when debt default is observed.

10. `inflation_annual_cpi` is numeric variable that shows inflation rate based on consumer price index.

11. `independence` is a binary variable that indicates if the country is independent or not as of observation year.

12. `currency_crises` is a binary variable that identifies currency crisis occured or not in the observation year.

13. `inflation_crises` is a binary variable that identifies inflation crisis occured or not in the observation year.

14. `banking_crisis` is a binary variable that identifies banking crisis occured or not in the observation year.



For consistency I will first change column names with "crises" to "crisis" as well as "inflation_annual_cpi" to "inflation_rate. Second, I will remove case number that is not needed in my analysis. 
Then, I will recode "banking_crisis" variable which is differently coded in the dataset than the other crisis indicator variables.

```{r}
cat(unique(ac$banking_crisis), sep = "\n")

```

```{r}
# ac <- ac %>% dplyr::rename("inflation_crisis" = "inflation_crises", 
#                            "currency_crisis" = "currency_crises",
#                            "inflation_rate" = "inflation_annual_cpi") %>% 
#   select(-case) %>% 
#   mutate(banking_crisis = case_when(
#          banking_crisis == "crisis" ~ 1L,
#          banking_crisis == "no_crisis" ~ 0L)
#   )

```



## Initial analysis

I would like to supplement different economic indicators to the original dataset to better analyze the economic crises, however, there is no financial and economic data available for African countries that goes back to the 1800s or early 1900s.

That's why, I wanted to first examine timeline of crisis in these African countries along with dates of their gaining independency.


```{r}
#I'll create a timeline that shows independence of each country and include point for years of crises in the plot

filt_crs <- ac %>% 
  filter(if_any(ends_with("crisis"), ~ .x == 1)) %>%
  pivot_longer(cols = ends_with("crisis"), 
               names_to = "type_of_crisis",
               values_to = "crisis_or_not") %>% 
  filter(crisis_or_not == 1)


ac %>% ggplot(aes(year, independence)) + 
  geom_line() +
  geom_area(fill = "#FFDB6D", alpha = 0.3) +
  geom_point(data = filt_crs, 
             mapping = aes(year, crisis_or_not, color=type_of_crisis)) +
  guides(color=guide_legend("Types of Crisis", override.aes = list(size = 3))) +
  scale_y_continuous(name = "Independence", breaks = c(0,1)) +
  scale_x_date(name = "Year", 
               date_breaks = "20 years", 
               date_labels = "%Y", 
               limits = c(as.Date("1860-01-01"), NA), 
               expand=c(0,0)) +
  facet_wrap(vars(country), ncol = 2) +
  theme_light()

```

The plots above reveal crucial situation for my further analysis. As we can see, most of the economic crisis in African countries actually took place after they gained independence and most of the countries got their independency around 1960s. Especially, all the systemic crises are after1960s. 
Therefore, I decided to focus on the data after 60s in analysis. Also, it appears that including additional economic indicators, which are usually start from 60s, to the main dataset would comply with this perspective 


## Merging Commodity Import & Export Price Index obtained by web scraping with the main dataset

In order to expand the features of the main dataset, I scrape additional data provided by the Commodity of Terms of Trade Database of International Monetary Fund (IMF). While the IMF provides many formats including  excel and csv file formats but I instead wanted to invest time on scraping the rendered html using their online table viewer: https://data.imf.org/?sk=2CDDCCB8-0B59-43E9-B6A0-59210D5605D2&sId=1434492942851. 

Using the viewer I selected the desired data and used inspect in Chrome to view the rendered html for the current page. Then copied the div representing the entire table in html and pasted the text into local files.

First I wrote a function to convert the html, into a data frame matching the online table viewer.
```{r}
imf_html_to_df <-  function (html_filename){

  # getting column labels by looking at the custom filename
  filename_pieces = str_split(html_filename, "_")[[1]]

  # read all the html in the file
  html_str = readLines(html_filename)

  # function returning the elements matched by the regex
  html_to_byRowList = function(full_file_str){
    regex_exp = "<div class=\"PPTSCellConText.*?>(.*?)<"
    result = stringr::str_match_all(full_file_str, regex_exp)
    return(result[[1]][, 2])
  }

  # all elements in the df by row
  elements = html_to_byRowList(html_str)

  columns = c("country", seq(as.integer(filename_pieces[3]), as.integer(filename_pieces[4]), by=1))
  num_columns = length(columns)
  schema = matrix(ncol = num_columns, nrow = 0)
  df = data.frame(schema)
  colnames(df) = columns

  # grab elements, ignoring column headers
  elements = elements[num_columns:length(elements)]

  # populate df iterating through element by df row
  start = 1
  while(start + num_columns - 1 <= length(elements)){
    curr = elements[start:(start+num_columns-1)]
    curr = do.call(c, list(list(curr[1]), as.double(curr[2:(length(curr))])))
    df[nrow(df) + 1,] = curr
    start = start + num_columns
  }

  # add cc3 labels used in joining later
  df$cc3 = as_factor(countrycode(df$country, origin = 'country.name', destination = 'iso3c'))
  return(df)
}
```


Applying the transformer function on html export price index files. They are in 20 year grouping since at most 20 columns render.
```{r, include=FALSE}

# export_df_1 <- imf_html_to_df("C:/Users/saida/Documents/IMF_export_1965_1988_.html")
# export_df_2 <- imf_html_to_df("C:/Users/saida/Documents/IMF_export_1989_2014_.html")

```


Joining them together.
```{r}
export_df <-  merge(x=export_df_1, y=export_df_2, by=c("cc3", "country")) %>% 
  as_tibble()
htmlTable(export_df)

```

Lastly I need to join IMF dataframes to the main dataset, `ac.` There are two complexities:

1. reformatting the year columns as a single new 'year' column
2. join on year & cc3 code, since Ivory Coast has different names in `ac` and `export_df`.

Wrote a function to transform the IMF data frame into a joinable format:
```{r}
imf_df_transform <- function(df, col_name){
  
  df = df %>% arrange(cc3)

  # select each grouping of data I need
  just_cpi_df = subset(df, select = -c(cc3, country))
  cpi = unlist(just_cpi_df, use.names=FALSE)
  cc3 = unlist(subset(df, select = c(cc3)), use.names=FALSE)
  country = unlist(subset(df, select = c(country)), use.names=FALSE)
  year = colnames(just_cpi_df)

  row_repeats = length(cpi) / length(cc3)
  col_repeats = length(cpi) / length(year)
  as.date=function(x){
    year_starts = paste(as.character(x), rep("01", length(x)), rep("01", length(x)), sep="-" )
    return(as.Date(year_starts, format="%Y-%m-%d"))
  }

  # reconstruct the joinable df
  df_joinable = data.frame(
    country=rep(country, times=row_repeats),
    cc3=rep(cc3, times=row_repeats),
    year=rep(as.date(year),each=col_repeats)
  )
  df_joinable[col_name] = cpi

  # remove the country column as not needed for join
  df_joinable = subset (df_joinable, select = -country)
  return(df_joinable)
}

```


Defining new object to distinguish original dataset from the merged and truncated one. 
```{r, include=FALSE}

ac_imf <- ac
```


Lastly, I use this function and inner join the export price data to main dataset.
```{r}
export_df_joinable <- imf_df_transform(export_df, "exp_pi")
htmlTable(head(export_df_joinable, n=10))

```


```{r}
ac_imf <- inner_join(ac_imf, export_df_joinable, by = c("cc3", "year"))

```


NOw repeating the same scraping process for the import price index data:
```{r, include=FALSE}
# 
# import_df_1 <- imf_html_to_df("C:/Users/saida/Documents/IMF_import_1965_1990_.html")
# import_df_2 <- imf_html_to_df("C:/Users/saida/Documents/IMF_import_1991_2016_.html")

```


Merging two import data and then converting into a joinable format.
```{r}
import_df <- merge(x=import_df_1, y=import_df_2, by=c("cc3", "country"))
htmlTable(import_df)

import_df_joinable <- imf_df_transform(import_df, "imp_pi")
htmlTable(head(import_df_joinable, n=10))

```

Attaching import price data to the main data frame.
```{r}
ac_imf <- inner_join(ac_imf, import_df_joinable, by = c("cc3", "year"))
sample_n(ac_imf, 10) %>% htmlTable()

```



## Creating new variables

I have now export price index and import price index data included in the data set. I will also create yearly change rates for export&import indices so that I could compare them with inflation rate for my analysis. I will use `lag` function to calculate change rates.

```{r}
ac_imf <- ac_imf %>% group_by(cc3) %>% arrange(year) %>%
  mutate(change_epi= round((exp_pi-lag(exp_pi))/lag(exp_pi)*100, digits=2)) %>%
  mutate(change_ipi=round((imp_pi-lag(imp_pi))/lag(imp_pi)*100, digits=2)) %>% 
  ungroup()


ac_imf %>% group_by(country) %>% sample_n(2) %>% arrange(year) %>% htmlTable()

```



# Exploratory Data Analysis

Before producing summary tables, I would like to convert inflation rate and exchange rate values to two-decimal format for consistency with values of other variables in the dataset. Also, it the tables will look more tidy with this way.
```{r}
ac_imf <- ac_imf %>% 
  mutate(exch_usd = round(exch_usd, digits=2),
         inflation_rate = round(inflation_rate, digits=2)
         )

```

## Summary statistics of the columns
```{r}
print(dfSummary(ac_imf, 
                varnumbers= FALSE, 
                plain.ascii= FALSE, 
                style= "grid", 
                graph.magnif= 0.80, 
                valid.col= TRUE),
      method= 'render', 
      table.classes= 'table-condensed')

```

My last form of the dataset is composed from  629 observations and 17 variables. Each observation includes country-year economic indicators. 17 variables consists of: 
- 2 categorical variable
- 1 date variable
- 7 numeric variable
- 7 binary variable

## Univariate analysis

### Year

Dataset covers 50 years from 1965 to 2014. Since we have 13 countries, each year should have frequency of 13.

```{r}
ac_imf %>% mutate(Year= format(year, format="%Y")) %>% 
  group_by(Year) %>% 
  summarise(number_of_obs=n()) %>% 
  arrange(number_of_obs) %>% 
  filter(number_of_obs < 13) %>% 
  htmlTable()

```
So, only for 4 years there is no observation for 2 countries and 13 years have missing information for only 1 country. All other 37 years include data from all countries, which is good.


### Country

According to the summary table, 8 of 13 countries have frequency of 50 which is as expected, however 5 countries have less than 50 frequency, which means that there is no observation for those countries for some years.

```{r}
ac_imf %>% ggplot(aes(x= fct_rev(fct_infreq(country)))) + 
  geom_bar(stat= 'count', alpha= 0.6, width= 0.5) + 
  xlab("Country") +
  ylab("Number of observations") +
  coord_flip() +
  theme_bw()

```

It looks that Angola has significant amount of missing data with no observation for 15 years. Other countries look good.


### Exchange Rate

Let me compare and analyze exchange rate of the countries with box plots.
```{r}
ac_imf %>% ggplot() + 
  geom_boxplot(aes(x= reorder(country, exch_usd, FUN= median, decreasing= F), 
                   y= exch_usd, fill= country), show.legend= F) +
  xlab("Country") +
  ylab("Exchange Rate") +
  coord_flip() +
  theme_bw()

```
Median exchange rate of Central African Republic and Ivory Coast are significantly higher than exchange rates of other countries. HOwever, their box plots are not highly right skewed, they disperse evenly over their values. On the other hand, Nigeria and Angola have significantly right skewed box plots, which might be a sign that these countries experienced more currency crises than other countries. Similarly. Zimbabwe have the lowest median exchange rate but it has huge outlier points, which poses a bad evidence for Zimbabwe currency.


### Currency Crises

`currency_crisis` variable should only consist of 0 and 1 values but When we look at the summary table, we see that there are three cells with value of 2. I think they are wrong entries and their true value is 1. So let me fix them first.

```{r}
ac_imf$currency_crisis[ac_imf$currency_crisis > 1] <- 1L

```


Now I will create a table that shows how many currency crisis observed by each country over 50 years.
```{r}
ac_imf %>% group_by(country) %>% 
  summarise(total_currency_crises= sum(currency_crisis)) %>% 
  arrange(desc(total_currency_crises)) %>% 
  htmlTable()

```

As I expected in previous part, Zimbabwe and Angola had currency crises more often than other countries. Even though  Central African Republic and Ivory Coast have higher exchange rate generally, they experienced a currency crisis only once.


### GDP weighted default

According to the summary table, gdp weighted default is 0 majority of the time. When it is not 0, it is between  0 and 1 because it gives debt default and gdp ratio of the country. I will rescale gdp weighted default values so it shows percentage of gdp and then visualize the distribution by histogram. 

```{r}
ac_imf <- ac_imf %>% 
  mutate(gdp_weighted_default= gdp_weighted_default*100)
  
```


Instead of distribution for single gdp weighted default values, I would like to know distribution of aggregate values of 13 countries for each year.
```{r}
ac_imf %>% group_by(year) %>% 
  summarise(cum_gdp_wght_def=sum(gdp_weighted_default)) %>% 
  ggplot(aes(x=cum_gdp_wght_def, y = after_stat(density))) + 
  geom_histogram(bins = 10, fill="blue", alpha=0.6) + 
  scale_x_continuous(breaks=c(0,10,20,30,40,50,60)) +
  xlab("Yearly cumulative gdp weighted default") +
  theme_bw()

```

Also let me find out which countries and how many times had positive gdp weighted default as well as their average.
```{r}
ac_imf %>% filter(gdp_weighted_default > 0) %>% 
  group_by(country) %>% 
  summarise(
    number_of_years=n(), 
    mean_percentage=mean(gdp_weighted_default)) %>% 
  arrange(desc(number_of_years), desc(mean_percentage)) %>% 
  htmlTable()
```

The worst case is obviously for South Africa with 5 years and with average default of 36% of their gdp.

### Independence

```{r}
ac_imf %>% filter(independence < 1) %>% 
  select(country, year, independence) %>% 
  mutate(year=format(year, format="%Y"))

```
Over the years that covered by transformed dataset, only Angola was not independent from 1965 to 1975. So, it doesn't enable us to evaluate the effect of independence on economic crisis.


### Inflation Rate

According to the summary table, inflation rate have some extreme values. Let me first find them.

```{r}
ac_imf %>% select(country, inflation_rate) %>% 
  group_by(country) %>% 
  summarise_at(.vars="inflation_rate", 
               .funs= list(minimum=min, 
                           median=median, 
                           maximum=max)) %>% 
  htmlTable()

```

As we can see from the table that all countries' median inflation rate is around normal/acceptable values, however, Angola has a very extreme maximum value (4146%). In addition to Angola, Zimbabwe has an astronomic maximum inflation rate with 21989695%. It is either a recording error or there is a very very exceptional thing happened in Zimbabwe for a year because Zimbabwe has median inflation rate of only 12%.

```{r}
ac_imf %>% select(year, country, inflation_rate) %>% 
  filter(inflation_rate>1000) %>% 
  htmlTable()

```

So there are 6 exceptional observations with higher than 1000% inflation rate. To deal with these cases, I will fix them to 1000% for health of my further analysis.

```{r}
ac_imf$inflation_rate[ac_imf$inflation_rate > 1000] <- 1000

```


Now I can plot the distribution of inflation rates over countries.
```{r}
ac_imf %>% filter(inflation_rate < 100) %>% ggplot() + 
  geom_violin(aes(x= reorder(country, inflation_rate, FUN= median, decreasing= F), 
                   y= inflation_rate, fill= country), show.legend= F, alpha=0.8) +
  xlab("Country") +
  ylab("Inflation Rate") +
  coord_flip() +
  theme_bw()

```


## Bivariate and multivarite analysis


### Economic crises

My main focus of analysis is **systemic crisis**. According to the summary table, 79 times systemic crisis were observed over 50 years. In this perspective, the first thing I would like to know is how the distribution of systemic crises is between countries in terms of percentage of occurrence. Also I expect that the countries that had systemic crises also experienced banking crises. According to the summary table, 91 times banking crises observed. To verify my hypothesis I will visualise countries' systemic crises and banking crises experiences over pie charts 

```{r}
systemic_pie <- ac_imf %>% filter(systemic_crisis==1) %>% select(country)
PieChart(country, 
         hole = 0, 
         values = "%", 
         data = systemic_pie, 
         fill = brewer.pal(n = 12, name = "Paired"), 
         main = "Systemic Crises", 
         quiet = TRUE)

```


```{r}
banking_pie <- ac_imf %>% filter(banking_crisis==1) %>% select(country)
PieChart(country, 
         hole = 0, 
         values = "%", 
         data = banking_pie, 
         fill = "rainbow", 
         main = "Banking Crises", 
         quiet = TRUE)

```

As I expected, countries which had systemic crises more often experienced banking crises more often as well such as Central African Republic, Zimbabwe and Nigeria, These pie charts above also revealed another thing that all of the 13 countries observed banking crises at least once whereas only 10 countries observed systemic crises. So, there are 3 countries that had banking crisis but no systemic crisis.

Let me find out which countries they are.
```{r}
ac_imf %>% 
  group_by(country) %>% 
  summarize(total_sys_crs=sum(systemic_crisis)) %>% 
  arrange(total_sys_crs) %>% 
  ungroup() %>% 
  slice_head(n=3) %>% 
  htmlTable()

```


These countries did not have systemic crises and also they had banking crises much less frequent than other countries, which makes me think that banking crises are related to systemic crises. To verify that, I will plot timelines for systemic crises and banking crises for each country.


```{r}
ac_imf %>% select(country, year, systemic_crisis, banking_crisis) %>% 
  pivot_longer(cols= ends_with("crisis"), 
               names_to= "Type_of_crisis", 
               values_to= "occurance") %>% 
  ggplot() +
  geom_jitter(aes(year, occurance, color= Type_of_crisis, shape=Type_of_crisis), width = 0.2, height = 0.2) + 
  geom_line(aes(year, occurance, color= Type_of_crisis)) +
  scale_y_continuous(name = "Occurance of Crisis", breaks = c(0,1)) +
  facet_wrap(vars(country), ncol = 2) +
  theme_minimal()

```

Plots for Central African Republic, Zimbabwe, Kenya, Algeria, Ivory Coast, Tunusia, Nigeria, Zambia and Morocca pose big evidence that there is very close relationship between banking crises and systemic crises. Indeed, it looks that whenever a country had systemic crisis it had banking crisis as well. The reverse of that not be true though. From the plots above, we see that Angola and South Africa had banking crises for few times but they did not experience systemic crises meanwhile. However, 

Besides that, the plots above show that banking and sytemic crises became more frequent after 1980s. I will now investigate it and also include currency and inflation crises to my analysis.

```{r warning=FALSE}

ac_imf %>% select(year, systemic_crisis, banking_crisis, inflation_crisis, currency_crisis) %>% 
  pivot_longer(cols= ends_with("crisis"), 
               names_to= "Type_of_crisis", 
               values_to= "occurance") %>% 
  group_by(year, Type_of_crisis) %>%
  summarize(Number_of_Occurance=sum(occurance), .groups= "drop") %>%
  ggplot(aes(year, Number_of_Occurance)) +
  geom_col(aes(fill=Type_of_crisis), color="white", width= 700, alpha=0.8)


```

So, the graph tells us that beginning with late 1980s more African countries started to experience crises more often. This trend lasted until 2000s. 1995 is the peak year in terms of overall number of economics crises. Also, systemic crises took place more often during this period.


### Domestic & Sovereign external debt in defaults

Now I would like to visualize domestic debt in default and sovereign external debt in default cases varied over 50 year in African countries.

```{r}
ac_imf %>% mutate(Year= as.numeric(format(year, format="%Y"))) %>% 
  select(Year, contains("debt")) %>% 
  pivot_longer(cols= contains("debt"), 
               names_to= "Debts_in_Deafult", 
               values_to= "occurance") %>% 
  filter(occurance==1) %>% ggplot(aes(x= Year, fill= Debts_in_Deafult)) + 
  geom_histogram(position= "dodge2", alpha = 0.8, bins=15) +
  geom_freqpoly(aes(Year, color=Debts_in_Deafult), size=1, binwidth = 3.5) +
  scale_x_continuous(breaks=c(1960,1970,1980,1990,2000,2010)) +
  theme_bw()

```
The graph shows that domestic debt in default cases have almost the same trend as economic crises we analyzed in previous part. They started to become more often after mid 80s but they lasted in 2000s as well. On the other hand, there is no specific trend for domestic debt in default cases. So, we can result that in years of systemic crises, debts in default are prevalent as well.


### Export & Import Price Index and Inflation Rate

Let me first visualize how export and import price indices change in African countries over 50 years.
```{r}
ac_imf %>%  dplyr::rename("Export_Price_Index"="exp_pi", "Import_Price_Index"="imp_pi") %>% 
  pivot_longer(cols= c("Export_Price_Index", "Import_Price_Index"), 
                        names_to= "Price_Index",
                        values_to= "Rate") %>% 
  ggplot(aes(x=year, y=Rate)) +
  geom_point(aes(color=Price_Index)) +
  geom_line(aes(color=Price_Index)) +
  guides(color= guide_legend(title = NULL, override.aes= list(size= 3))) +
  scale_y_continuous(name= "Export & Import Price Indices") +
  facet_wrap(vars(country), ncol = 2) +
  theme_minimal()

```

From the plots, we can see that Algeria, Angola, Nigeria, Zambia had significant drop, Ivory Coast and Tunusia had moderate drop in their export price indices during 80s and 90s, which might cause trade deficits in these countries and trigger other macroeconomic issues.

I also expect that when the gap between Import and Export price indices of a country gets larger, it is more probable that the inflation rate in that country goes up. I would like to test my hypothesis in these African countries. To get better a fit and visualization, I will put an upper limit on inflation rate.
```{r message=FALSE}

ac_imf %>% filter(inflation_rate <500) %>% mutate(Import_Export= imp_pi/exp_pi*100) %>% filter(Import_Export <200) %>% 
  ggplot(aes(x=Import_Export, y=inflation_rate)) +
  xlab("Import and Export Price Index Ratio") +
  ylab("Inflation Rate") +
  geom_point() +
  geom_smooth(se= FALSE) + 
  theme_bw()


```


### Correlation Between Economic Indicators

Lastly I would like to explore the correlation between the economic indicators of African countries I got in the dataset.


```{r}
ac_imf %>% select(systemic_crisis, 
                  banking_crisis, 
                  inflation_crisis, 
                  currency_crisis, 
                  inflation_rate, 
                  exch_usd, 
                  domestic_debt_in_default, 
                  sovereign_external_debt_default, 
                  exp_pi, 
                  imp_pi,
                  gdp_weighted_default) %>% 
  dplyr::rename("Systemic Crisis"="systemic_crisis",
                "Banking Crisis"="banking_crisis",
                "Inflation Crisis"="inflation_crisis",
                "Currency Crisis"="currency_crisis",
                "Inflation Rate"="inflation_rate",
                "Exchange Rate"="exch_usd",
                "Domestic Debt"="domestic_debt_in_default",
                "External Debt"="sovereign_external_debt_default",
                "GDP Weighted Default"="gdp_weighted_default",
                "Export Price Index"="exp_pi",
                "Import Price Index"="imp_pi"
                ) %>% 
  cor() %>% 
  corrplot(method="circle", tl.srt= 60)

```
Thus, the biggest positive correlation is between the banking crises and systemic crises , which is I already pointed in my previous analysis. There is significant positive relationship between inflation rate and domestic debt in default.
A new exploration I got form the correlation plot is that there is significant negative correlation between export price index and inflation crisis. It makes sense because these African countries depend on their exports in terms of revenue. If export price index gets lower, they face trade deficits and, as a result of that, their economies get into trouble.


# Conclusion and Considerations for Further Analysis

Economic crises in 13 African countries covered in the dataset basically started after 1960s. Additionally, there is not much data collected before 1960s regarding these countries. Since it would not be reliable to interpret the scarce economic indicators older than 50-60 years old, I focused on the data collected after 1960s in my project.
The core finding of the project is the strong relationsihp between the systemic crisis and banking crisis. If a country experienced a banking crisis, then their economy got usually through a systemic crisis as well. Most often, they happened at the same time. 
Besides that, these African countries were exposed to economic crises more often in the period between mid-1980s and 2000s. A possible explanation for that is as following: decline in global commodity prices such as Oil and petroleum products and other agricultural commodities in 1980s impacted the trade/external accounts of these countries because they are heavily depended on import revenues of natural resources. Due to limited diversification of their exports and having low capital inflows, fluctuations in global commodity prices had strong negative impact on their domestic economy.

My considerations for further studies would be:

- More variables should be included such as interest rates, trade deficit, government deficit, growth rate etc.
- More countries should be included to enable to make more reliable interpretations.


## References:

 - Kaggle dataset: https://www.kaggle.com/datasets/chirin/africa-economic-banking-and-systemic-crisis-data
 - Commodity of Terms of Trade: Commodity Import/Export Price Index, Individual Commodities Weighted by Ratio of Imports to GDP Historical, Annual (1965-present), Fixed Weights, Index (2012=100): https://data.imf.org/?sk=2CDDCCB8-0B59-43E9-B6A0-59210D5605D2&sId=1434492942851
 - R for Data Science: https://r4ds.had.co.nz/index.html
 - https://r-graph-gallery.com/index.html
 - ggplot2: Elegant Graphics for Data Analysis: https://ggplot2-book.org/ 
 - DACSS 601 Class resources: https://github.com/DACSS/601_Fall_2022