final
This project provides a high-level overview of the data set I will be using for my final project. The data set, titled ‘Income and Democracy,’ was published by the American Economic Consortium in 2008.
Author

Caitlin Rowley

Published

May 25, 2023

Background and Research Question:

My research question will focus on the cross-country correlation between income and democracy. A 2008 study titled “Income and Democracy,” published in the American Economic Review, argues that existing studies that establish a strong cross-country correlation between income and democracy do not control for factors that simultaneously affect both variables. Accordingly, this study, which focuses on changes in democracy over the past 500 years, controls for certain country-fixed effects—such as date of independence, constraints on the executive, and religious affiliation—thereby voiding any causal relationship between income per capita and various measures of democracy. This study is the source for my data set.

The authors of “Income and Democracy” included a wide range of variables in their study; specifically, they tested the significance of constraint on the executive; year of independence; settler mortality; population density; Catholic population; Muslim population; Protestant population; shift in per capita income; and shift in democracy. As noted, the authors asserted that after controlling for all the aforementioned country-fixed effects, there is no evidence of a causal relationship between income and democracy over time. Interestingly, the study also indicated that education was determined to be statistically insignificant as an independent, country-fixed effect on the correlation between income and democracy, and it was thus excluded from the study.

In contrast to “Income and Democracy,” a 1999 study titled “Determinants of Democracy,” authored by Robert J. Barro, indicates that additional improvements to the standard of living do, in fact, predict an increase in democracy. Barro suggests not only that education is an indicator of the correlation between income democracy, but that the pervasiveness of select religious affiliations is, as well. Specifically, Barro claimed that the negative effects from Muslim and non‐religious affiliations remain statistically significant in this association regardless of external control factors.

This incongruity led me to wonder whether we would see a causal relationship between income and democracy if the economic variables used in the two studies were more aligned. Specifically, I would like to examine how this change would affect both studies’ claims on the role of religious affiliation; in other words, will adding explanatory variables related to religious affiliation—specifically, non-religious affiliation—cause a shift in the lack of correlation between income and democracy? Additionally, given “Income and Democracy’s” declaration on the statistical insignificance of education as a country-fixed effect, I am also interested in seeing whether adding an interaction term—specifically, government expenditure on education—will have an effect on this relationship.

Research question: How will adding non-religious affiliation as an additional control variable and education as an interaction term impact the lack of causal relationship between income and democracy?

Hypothesis:

After reviewing “Income and Democracy,” it does not appear that non-religious affiliation was integrated into the report. Further, as the authors noted its lack of statistical significance related to its causal effect on the relationship between income democracy, neither is education. However, I am curious about how the inclusion of these two variables would affect Barro’s conclusion relative to the correlation between religious affiliation and democracy. As such, by adding these variables, I will be revisiting a previously-tested hypothesis.

I hypothesize that adding non-religious affiliation and education as additional control variables, and education*shift in per capita income as an interaction term will lead to increased statistical significance between income and democracy.

Descriptive Statistics:

Data for this study was collected from the Freedom House Political Rights Index, the Polity Composite Democracy Index, and data from other studies conducted by Barro and Kenneth A. Bollen.

Variables I will be focusing on include:

  • Country;
  • Constraint on the executive;
  • Year of independence;
  • Settler mortality;
  • Population density;
  • Catholic population;
  • Muslim population;
  • Protestant population;
  • Religiously-unaffiliated population;
  • Education funding;
  • Shift in per capita income; and
  • Shift in democracy.
Code
# load libraries:

library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.2.2
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.4      ✔ forcats 0.5.2 
Warning: package 'readr' was built under R version 4.2.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(readxl)
Warning: package 'readxl' was built under R version 4.2.2
Code
library(dplyr)
Code
# read in data file:

data_file <- read_excel("C:/Users/caitr/OneDrive/Documents/DACSS/DACSS 603/603_Spring_2023/posts/_data/Income-Democracy.xls", sheet = "500 Year Panel")
head(data_file)
# A tibble: 6 × 15
  code  country     consf…¹ indcent indyear logem4 lpd15…² madid rel_c…³ rel_m…⁴
  <chr> <chr>         <dbl>   <dbl>   <dbl>  <dbl>   <dbl> <dbl>   <dbl>   <dbl>
1 ADO   Andorra      NA        18      1800  NA     NA      1001  NA      NA    
2 AFG   Afghanistan   0        19.2    1919   4.54   2.12   3002   0       0.993
3 AGO   Angola        0.333    19.8    1975   5.63   0.405  2011   0.687   0    
4 ALB   Albania       0.667    19.1    1912  NA      1.99   2009  NA      NA    
5 ARE   United Ara…   0.333    19.7    1971  NA      0      3002   0.004   0.949
6 ARG   Argentina     0        18.2    1816   4.23  -2.21   5001   0.916   0.002
# … with 5 more variables: rel_protmg80 <dbl>, growth <dbl>, democ <dbl>,
#   world <dbl>, colony <dbl>, and abbreviated variable names ¹​consfirstaug,
#   ²​lpd1500s, ³​rel_catho80, ⁴​rel_muslim80
Code
# remove dummy/unnecessary variables (as identified in study's variable key):

data_cln = subset(data_file, select = -c(code, world, colony, indcent, madid))
head(data_cln)
# A tibble: 6 × 10
  country   consf…¹ indyear logem4 lpd15…² rel_c…³ rel_m…⁴ rel_p…⁵  growth democ
  <chr>       <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra    NA        1800  NA     NA      NA      NA      NA      3.46   NA   
2 Afghanis…   0        1919   4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola      0.333    1975   5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania     0.667    1912  NA      1.99   NA      NA      NA      1.68    0.75
5 United A…   0.333    1971  NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argentina   0        1816   4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with abbreviated variable names ¹​consfirstaug, ²​lpd1500s, ³​rel_catho80,
#   ⁴​rel_muslim80, ⁵​rel_protmg80
Code
# rename variables

data_cln2 <- data_cln %>% rename("exec_constraint"="consfirstaug", "year_ind"="indyear", "log_settler_mort"="logem4", "log_pop_dens"="lpd1500s")
head(data_cln2)
# A tibble: 6 × 10
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with abbreviated variable names ¹​exec_constraint, ²​year_ind,
#   ³​log_settler_mort, ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80,
#   ⁷​rel_protmg80
Code
# remove duplicates:

duplicates <- duplicated(data_cln2)
duplicates["TRUE"]
[1] NA
Code
# remove blank observations (observations with some NAs are not removed):

data_blank <- data_cln2[rowSums(is.na(data_cln2)) != ncol(data_cln2), ]
head(data_blank)
# A tibble: 6 × 10
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with abbreviated variable names ¹​exec_constraint, ²​year_ind,
#   ³​log_settler_mort, ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80,
#   ⁷​rel_protmg80
Code
# remove some NAs for description but not analysis:

data_NA <- data_blank[rowSums(is.na(data_cln2)) == 0, ]
dim(data_NA)
[1] 76 10
Code
# confirm data frame size of clean data set:

dim(data_cln2)
[1] 173  10

We can see that this data set has 10 variables and 173 observations (though there will be 12 variables once I collect and add data related to education and non-religious affiliation). There are no duplicate observations, nor are there any blank observations. However, in the case that we remove observations with any missing values, the data set would only have 76 observations. Nonetheless, because the study’s authors elected to utilize incomplete observations, I will do the same.

Code
# summary of data (remove categorical variables):

library(summarytools)
Warning: package 'summarytools' was built under R version 4.2.2

Attaching package: 'summarytools'
The following object is masked from 'package:tibble':

    view
Code
summary <- subset(data_cln2, select = -c(country))
summary(summary)
 exec_constraint     year_ind    log_settler_mort  log_pop_dens    
 Min.   :0.0000   Min.   :1800   Min.   :0.9361   Min.   :-3.8309  
 1st Qu.:0.0000   1st Qu.:1830   1st Qu.:4.2327   1st Qu.: 0.1911  
 Median :0.3333   Median :1947   Median :4.5099   Median : 1.0986  
 Mean   :0.3942   Mean   :1912   Mean   :4.6269   Mean   : 1.1317  
 3rd Qu.:0.6667   3rd Qu.:1964   3rd Qu.:5.5978   3rd Qu.: 2.4407  
 Max.   :1.0000   Max.   :1984   Max.   :7.9862   Max.   : 5.6426  
 NA's   :23                      NA's   :85       NA's   :22       
  rel_catho80       rel_muslim80       rel_protmg80        growth       
 Min.   :0.00000   Min.   :0.000000   Min.   :0.0000   Min.   :-0.6421  
 1st Qu.:0.00875   1st Qu.:0.000075   1st Qu.:0.0020   1st Qu.: 1.1432  
 Median :0.12600   Median :0.014500   Median :0.0230   Median : 1.8700  
 Mean   :0.31126   Mean   :0.243986   Mean   :0.1253   Mean   : 1.9514  
 3rd Qu.:0.56575   3rd Qu.:0.385750   3rd Qu.:0.1805   3rd Qu.: 2.7023  
 Max.   :0.97300   Max.   :0.998000   Max.   :0.9780   Max.   : 4.2531  
 NA's   :21        NA's   :21         NA's   :22       NA's   :1        
     democ       
 Min.   :0.0000  
 1st Qu.:0.3500  
 Median :0.8000  
 Mean   :0.6556  
 3rd Qu.:0.9250  
 Max.   :1.0000  
 NA's   :38      

We can see here a data frame containing summary statistics for the 9 variables with numeric data, as the categorical variable simply indicates the country name. These statistics will be more meaningful upon following the addition of data related to education and non-religious affiliation.

Data Merge:

Code
# import new religious affiliation variables

relig_1 <- read_csv("C:/Users/caitr/OneDrive/Documents/DACSS/DACSS 603/603_Spring_2023/posts/_data/DACSS603_religious_aff.csv")
Rows: 1205 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Region, Country
dbl (9): Year, Buddhists, Christians, Folk Religions, Hindus, Jews, Muslims,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
head(relig_1)
# A tibble: 6 × 11
   Year Region      Country Buddh…¹ Chris…² Folk …³ Hindus  Jews Muslims Other…⁴
  <dbl> <chr>       <chr>     <dbl>   <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>
1  2010 World       All Co…     7.1    31.4     5.9     15   1      23.2       1
2  2010 North Amer… All Co…     1.1    77.4     1        1   1.8     1         1
3  2010 Latin Amer… All Co…     1      90       1.7      1   1       1         1
4  2010 Europe      All Co…     1      74.5     1        1   1       5.9       1
5  2010 Middle Eas… All Co…     1       3.7     1        1   1.6    93         1
6  2010 Sub-Sahara… All Co…     1      62.9     3.3      1   1      30.2       1
# … with 1 more variable: Unaffiliated <dbl>, and abbreviated variable names
#   ¹​Buddhists, ²​Christians, ³​`Folk Religions`, ⁴​`Other Religions`
Code
# convert percentages to decimals

relig_dec <- transform(relig_1, rel_unaffiliated = (Unaffiliated/100), Jewish = (Jews/100), Hindu = (Hindus/100), Buddhist = (Buddhists/100))
head(relig_dec)
  Year                   Region       Country Buddhists Christians
1 2010                    World All Countries       7.1       31.4
2 2010            North America All Countries       1.1       77.4
3 2010  Latin America-Caribbean All Countries       1.0       90.0
4 2010                   Europe All Countries       1.0       74.5
5 2010 Middle East-North Africa All Countries       1.0        3.7
6 2010       Sub-Saharan Africa All Countries       1.0       62.9
  Folk.Religions Hindus Jews Muslims Other.Religions Unaffiliated
1            5.9     15  1.0    23.2               1         16.4
2            1.0      1  1.8     1.0               1         17.1
3            1.7      1  1.0     1.0               1          7.7
4            1.0      1  1.0     5.9               1         18.8
5            1.0      1  1.6    93.0               1          1.0
6            3.3      1  1.0    30.2               1          3.2
  rel_unaffiliated Jewish Hindu Buddhist
1            0.164  0.010  0.15    0.071
2            0.171  0.018  0.01    0.011
3            0.077  0.010  0.01    0.010
4            0.188  0.010  0.01    0.010
5            0.010  0.016  0.01    0.010
6            0.032  0.010  0.01    0.010
Code
# remove "all countries" from 'country' variable

relig_country <- relig_dec[!(relig_dec$Country=="All Countries"),]
head(relig_country)
   Year                   Region        Country Buddhists Christians
8  2010             Asia-Pacific    Afghanistan         1        1.0
9  2010                   Europe        Albania         1       18.0
10 2010 Middle East-North Africa        Algeria         1        1.0
11 2010             Asia-Pacific American Samoa         1       98.3
12 2010                   Europe        Andorra         1       89.5
13 2010       Sub-Saharan Africa         Angola         1       90.5
   Folk.Religions Hindus Jews Muslims Other.Religions Unaffiliated
8             1.0      1    1    99.0               1          1.0
9             1.0      1    1    80.3               1          1.4
10            1.0      1    1    97.9               1          1.8
11            1.0      1    1     1.0               1          1.0
12            1.0      1    1     1.0               1          8.8
13            4.2      1    1     1.0               1          5.1
   rel_unaffiliated Jewish Hindu Buddhist
8             0.010   0.01  0.01     0.01
9             0.014   0.01  0.01     0.01
10            0.018   0.01  0.01     0.01
11            0.010   0.01  0.01     0.01
12            0.088   0.01  0.01     0.01
13            0.051   0.01  0.01     0.01
Code
# remove unneeded columns

relig_vars <- subset(relig_country, select = -c(Region, Buddhists, Christians, Folk.Religions, Hindus, Hindu, Jews, Jewish, Buddhist, Muslims, Other.Religions, Unaffiliated))
head(relig_vars)
   Year        Country rel_unaffiliated
8  2010    Afghanistan            0.010
9  2010        Albania            0.014
10 2010        Algeria            0.018
11 2010 American Samoa            0.010
12 2010        Andorra            0.088
13 2010         Angola            0.051
Code
# use data from 2010 (comparable to base data set)

relig_2010 <- relig_vars[!(relig_vars$Year=="2020" | relig_vars$Year=="2030" | relig_vars$Year=="2040" | relig_vars$Year=="2050"),]
head(relig_2010)
   Year        Country rel_unaffiliated
8  2010    Afghanistan            0.010
9  2010        Albania            0.014
10 2010        Algeria            0.018
11 2010 American Samoa            0.010
12 2010        Andorra            0.088
13 2010         Angola            0.051
Code
# remove duplicates

rel_duplicates <- duplicated(relig_2010)
rel_duplicates["TRUE"]
[1] NA
Code
# remove "year" column

relig_cln <- subset(relig_2010, select = -c(Year))
head(relig_cln)
          Country rel_unaffiliated
8     Afghanistan            0.010
9         Albania            0.014
10        Algeria            0.018
11 American Samoa            0.010
12        Andorra            0.088
13         Angola            0.051
Code
# merge with data_file

data_combined <- left_join(data_cln2, relig_cln,
                   by = c("country" = "Country"))
head(data_combined)
# A tibble: 6 × 11
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with 1 more variable: rel_unaffiliated <dbl>, and abbreviated variable
#   names ¹​exec_constraint, ²​year_ind, ³​log_settler_mort, ⁴​log_pop_dens,
#   ⁵​rel_catho80, ⁶​rel_muslim80, ⁷​rel_protmg80
Code
# merge additional control: education expenditure (GDP)

edu <- read_csv("C:/Users/caitr/OneDrive/Documents/DACSS/DACSS 603/603_Spring_2023/posts/_data/DACSS603_edu_expend.csv")
Rows: 266 Columns: 66
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
dbl (52): 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, ...
lgl (10): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
head(edu)
# A tibble: 6 × 66
  Country Na…¹ Count…² Indic…³ Indic…⁴ `1960` `1961` `1962` `1963` `1964` `1965`
  <chr>        <chr>   <chr>   <chr>   <lgl>  <lgl>  <lgl>  <lgl>  <lgl>  <lgl> 
1 Aruba        ABW     Govern… SE.XPD… NA     NA     NA     NA     NA     NA    
2 Africa East… AFE     Govern… SE.XPD… NA     NA     NA     NA     NA     NA    
3 Afghanistan  AFG     Govern… SE.XPD… NA     NA     NA     NA     NA     NA    
4 Africa West… AFW     Govern… SE.XPD… NA     NA     NA     NA     NA     NA    
5 Angola       AGO     Govern… SE.XPD… NA     NA     NA     NA     NA     NA    
6 Albania      ALB     Govern… SE.XPD… NA     NA     NA     NA     NA     NA    
# … with 56 more variables: `1966` <lgl>, `1967` <lgl>, `1968` <lgl>,
#   `1969` <lgl>, `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>,
#   `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>,
#   `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>,
#   `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>,
#   `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>,
#   `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>, …
Code
# remove unneeded columns

edu_vars <- subset(edu, select = c("Country Name", "2010"))
head(edu_vars)
# A tibble: 6 × 2
  `Country Name`              `2010`
  <chr>                        <dbl>
1 Aruba                         6.93
2 Africa Eastern and Southern   4.50
3 Afghanistan                   3.48
4 Africa Western and Central    2.96
5 Angola                        3.42
6 Albania                       3.41
Code
# remove duplicates

edu_duplicates <- duplicated(edu_vars)
edu_duplicates["TRUE"]
[1] NA
Code
# merge with combined data

data_merge <- left_join(data_combined, edu_vars,
                   by = c("country" = "Country Name"))
data_final <- rename(data_merge, "education"="2010")
head(data_final)
# A tibble: 6 × 12
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with 2 more variables: rel_unaffiliated <dbl>, education <dbl>, and
#   abbreviated variable names ¹​exec_constraint, ²​year_ind, ³​log_settler_mort,
#   ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80, ⁷​rel_protmg80
Code
# summary of final data frame

summary(data_final)
   country          exec_constraint     year_ind    log_settler_mort
 Length:173         Min.   :0.0000   Min.   :1800   Min.   :0.9361  
 Class :character   1st Qu.:0.0000   1st Qu.:1830   1st Qu.:4.2327  
 Mode  :character   Median :0.3333   Median :1947   Median :4.5099  
                    Mean   :0.3942   Mean   :1912   Mean   :4.6269  
                    3rd Qu.:0.6667   3rd Qu.:1964   3rd Qu.:5.5978  
                    Max.   :1.0000   Max.   :1984   Max.   :7.9862  
                    NA's   :23                      NA's   :85      
  log_pop_dens      rel_catho80       rel_muslim80       rel_protmg80   
 Min.   :-3.8309   Min.   :0.00000   Min.   :0.000000   Min.   :0.0000  
 1st Qu.: 0.1911   1st Qu.:0.00875   1st Qu.:0.000075   1st Qu.:0.0020  
 Median : 1.0986   Median :0.12600   Median :0.014500   Median :0.0230  
 Mean   : 1.1317   Mean   :0.31126   Mean   :0.243986   Mean   :0.1253  
 3rd Qu.: 2.4407   3rd Qu.:0.56575   3rd Qu.:0.385750   3rd Qu.:0.1805  
 Max.   : 5.6426   Max.   :0.97300   Max.   :0.998000   Max.   :0.9780  
 NA's   :22        NA's   :21        NA's   :21         NA's   :22      
     growth            democ        rel_unaffiliated    education      
 Min.   :-0.6421   Min.   :0.0000   Min.   :0.01000   Min.   : 0.4664  
 1st Qu.: 1.1432   1st Qu.:0.3500   1st Qu.:0.01000   1st Qu.: 3.1675  
 Median : 1.8700   Median :0.8000   Median :0.03300   Median : 4.4872  
 Mean   : 1.9514   Mean   :0.6556   Mean   :0.08615   Mean   : 4.4160  
 3rd Qu.: 2.7023   3rd Qu.:0.9250   3rd Qu.:0.10925   3rd Qu.: 5.5250  
 Max.   : 4.2531   Max.   :1.0000   Max.   :0.59600   Max.   :12.8373  
 NA's   :1         NA's   :38       NA's   :25        NA's   :53       

Visualization:

Code
# add religion as new categorical variable based on highest percentage of each religion:

newdata <- select(data_final, country, rel_catho80, rel_muslim80, rel_protmg80, rel_unaffiliated)
head(newdata)
# A tibble: 6 × 5
  country              rel_catho80 rel_muslim80 rel_protmg80 rel_unaffiliated
  <chr>                      <dbl>        <dbl>        <dbl>            <dbl>
1 Andorra                   NA           NA           NA                0.088
2 Afghanistan                0            0.993        0                0.01 
3 Angola                     0.687        0            0.198            0.051
4 Albania                   NA           NA           NA                0.014
5 United Arab Emirates       0.004        0.949        0.003            0.011
6 Argentina                  0.916        0.002        0.027            0.122
Code
library(reshape2)
Warning: package 'reshape2' was built under R version 4.2.2

Attaching package: 'reshape2'
The following object is masked from 'package:tidyr':

    smiths
Code
data_final$religion <- apply(data_final[, c("rel_protmg80", "rel_muslim80", "rel_catho80", "rel_unaffiliated")], 1, function(row) {
  religions <- c("rel_protmg80", "rel_muslim80", "rel_catho80", "rel_unaffiliated")
  max_percentage <- max(row)
  religion <- religions[row == max_percentage]
  paste(religion, collapse = ", ")
})
head(data_final)
# A tibble: 6 × 13
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with 3 more variables: rel_unaffiliated <dbl>, education <dbl>,
#   religion <chr>, and abbreviated variable names ¹​exec_constraint, ²​year_ind,
#   ³​log_settler_mort, ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80,
#   ⁷​rel_protmg80
Code
# pivot data to add variable with all religions

data_pivot <- data_final %>% 
  pivot_longer(cols=c('rel_catho80', 'rel_muslim80', 'rel_protmg80', 'rel_unaffiliated'),
                    names_to='relig_affiliation',
                    values_to='relig_pop')
head(data_pivot)
# A tibble: 6 × 11
  country  exec_…¹ year_…² log_s…³ log_p…⁴  growth democ educa…⁵ relig…⁶ relig…⁷
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl> <chr>   <chr>  
1 Andorra       NA    1800   NA      NA     3.46   NA       2.98 NA, NA… rel_ca…
2 Andorra       NA    1800   NA      NA     3.46   NA       2.98 NA, NA… rel_mu…
3 Andorra       NA    1800   NA      NA     3.46   NA       2.98 NA, NA… rel_pr…
4 Andorra       NA    1800   NA      NA     3.46   NA       2.98 NA, NA… rel_un…
5 Afghani…       0    1919    4.54    2.12 -0.0849  0.15    3.48 rel_mu… rel_ca…
6 Afghani…       0    1919    4.54    2.12 -0.0849  0.15    3.48 rel_mu… rel_mu…
# … with 1 more variable: relig_pop <dbl>, and abbreviated variable names
#   ¹​exec_constraint, ²​year_ind, ³​log_settler_mort, ⁴​log_pop_dens, ⁵​education,
#   ⁶​religion, ⁷​relig_affiliation
Code
# remove NAs

data_NA <- data_pivot%>%
  na.omit(data_pivot)%>%
  select(c(growth, democ, education, relig_affiliation, relig_pop))
head(data_NA)
# A tibble: 6 × 5
   growth democ education relig_affiliation relig_pop
    <dbl> <dbl>     <dbl> <chr>                 <dbl>
1 -0.0849  0.15      3.48 rel_catho80           0    
2 -0.0849  0.15      3.48 rel_muslim80          0.993
3 -0.0849  0.15      3.48 rel_protmg80          0    
4 -0.0849  0.15      3.48 rel_unaffiliated      0.01 
5  0.644   0.35      3.42 rel_catho80           0.687
6  0.644   0.35      3.42 rel_muslim80          0    
Code
# remove observations where values are equal

data_final_vis <- subset(data_final, religion!="rel_protmg80, rel_catho80")
head(data_final_vis)
# A tibble: 6 × 13
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with 3 more variables: rel_unaffiliated <dbl>, education <dbl>,
#   religion <chr>, and abbreviated variable names ¹​exec_constraint, ²​year_ind,
#   ³​log_settler_mort, ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80,
#   ⁷​rel_protmg80
Code
# scatterplot for income x democray by religion

relig_vis3 <- data_final_vis%>%
  na.omit(data_final_vis)%>%
  ggplot(aes(democ, growth))+
  facet_wrap("religion")+
  labs(x="Shift in Democracy",y="Shift in Per Capita Income", title = "Shift in Per Capita Income and Democracy by Religious Affiliation") + geom_point(mapping = aes(color = religion))
print(relig_vis3)

Code
# scaterplot by median pop percentages by religion

dem_inc_relig <- data_final_vis%>%
  na.omit(data_final_vis)%>%
  ggplot(aes(democ, growth))+
  geom_point(aes(color=religion, shape=religion))+  
  labs(x="Shift in Democracy", y="Shift in Income Per Capita", title = "Shift in Per Capita Income and Democracy by Religious Affiliation")
print(dem_inc_relig)

Code
# stacked bar chart

inc_relig <- data_final_vis%>%
  na.omit(data_final_vis)%>%
ggplot(aes(x=democ, y=education, fill=religion)) +
geom_bar(stat="identity") +
scale_fill_hue() +
  theme_classic() +
  labs(x="Shift in Democracy", y="Education Funding", title="Shift in Education Funding and Democracy by Religious Affiliation")
print(inc_relig)

Code
# line chart

line_chart <- data_final_vis%>%
  na.omit(data_final_vis)%>%
  ggplot(aes(x=democ, y=education, colour=religion, group=religion)) + labs(x="Shift in Democracy", y="Education Funding", title="Shift in Education Funding and Democracy by Religious Affiliation") +
  geom_line()
print(line_chart)

Code
# scatterplot 

relig_income2 <- data_final_vis%>%
  na.omit(data_final_vis)%>%
  ggplot(aes(education, growth))+
  facet_wrap("religion")+
  labs(x="Education Funding",y="Shift in Income Per Capita", title = "Shift in Per Capita Income and Education Funding by Religious Affiliation") + geom_point(mapping = aes(color = religion))
print(relig_income2)

Analysis:

Hypothesis Testing:

Define variables:

  1. Response (dependent) variable: Shift in democracy over time (“democ”).
  2. Explanatory (independent) variable: Shift in per capita income over time (“growth”).
  3. Control variables: country (“country”); constraint on the executive at the time of independence (“exec_constraint”); year of independence (“year_ind”); settler mortality rate (“log_settler_mort”); population density (“log_pop_dens”); Catholic population (“rel_catho80”); Muslim population (“rel_muslim80”); Protestant population (“rel_protmg80”); religiously-unaffiliated population (“rel_unaffiliated”); and education funding (“education”).
  4. Interaction terms: Education funding (“education”)*shift in per capita income (“growth”).

I specifically added religiously-unaffiliated population (by percentage of population by country) and education funding (by expenditure as a percentage of GDP by country) as additional control variables because of the role they played in Barro’s 1999 study. Although Barro did not control for the same country-specific effects as the authors of “Income and Democracy,” he nonetheless suggested that both education and religious affiliation are statistically significant indicators of the correlation between income and democracy. So, I elected to add religiously-unaffiliated population to my replication data set because it was included in Barro’s study but excluded from “Income and Democracy.” Additionally, even though education was deemed to be statistically insignificant in “Income and Democracy,” I thought it would be interesting to see if adding religiously-unaffiliated population would affect the role of education a country-fixed effect.

Moreover, as I initially speculated that education would be statistically significant prior to reviewing “Income and Democracy,” I was curious to see if adding education funding*shift as an interaction term would perhaps provide additional insight on the reach of education as a country-fixed effect.

Transform Variables

Code
# transform education from continuous to ordinal (quartiles based on range)

range_edu <- range(data_final$education, na.rm=TRUE)

# ID quantiles

quantile_edu <- quantile(range_edu)
quantile_edu
        0%        25%        50%        75%       100% 
 0.4663837  3.5591152  6.6518468  9.7445783 12.8373098 
Code
library(gtools)
Warning: package 'gtools' was built under R version 4.2.3
Code
# add column for quantiles

edu_quant <- quantcut(data_final$education, q = 4, na.rm = TRUE)
data_final_edu <- mutate(data_final, edu_quant=edu_quant)
head(data_final_edu)
# A tibble: 6 × 14
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with 4 more variables: rel_unaffiliated <dbl>, education <dbl>,
#   religion <chr>, edu_quant <fct>, and abbreviated variable names
#   ¹​exec_constraint, ²​year_ind, ³​log_settler_mort, ⁴​log_pop_dens,
#   ⁵​rel_catho80, ⁶​rel_muslim80, ⁷​rel_protmg80
Code
newdata <- select(data_final, country, rel_catho80, rel_muslim80, rel_protmg80, rel_unaffiliated)
head(newdata)
# A tibble: 6 × 5
  country              rel_catho80 rel_muslim80 rel_protmg80 rel_unaffiliated
  <chr>                      <dbl>        <dbl>        <dbl>            <dbl>
1 Andorra                   NA           NA           NA                0.088
2 Afghanistan                0            0.993        0                0.01 
3 Angola                     0.687        0            0.198            0.051
4 Albania                   NA           NA           NA                0.014
5 United Arab Emirates       0.004        0.949        0.003            0.011
6 Argentina                  0.916        0.002        0.027            0.122
Code
library(reshape2)

data_final$religion <- apply(data_final[, c("rel_protmg80", "rel_muslim80", "rel_catho80", "rel_unaffiliated")], 1, function(row) {
  religions <- c("rel_protmg80", "rel_muslim80", "rel_catho80", "rel_unaffiliated")
  max_percentage <- max(row)
  religion <- religions[row == max_percentage]
  paste(religion, collapse = ", ")
})
head(data_final)
# A tibble: 6 × 13
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra   NA        1800   NA     NA      NA      NA      NA      3.46   NA   
2 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania    0.667    1912   NA      1.99   NA      NA      NA      1.68    0.75
5 United …   0.333    1971   NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with 3 more variables: rel_unaffiliated <dbl>, education <dbl>,
#   religion <chr>, and abbreviated variable names ¹​exec_constraint, ²​year_ind,
#   ³​log_settler_mort, ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80,
#   ⁷​rel_protmg80

Correlation Tests

Code
# remove observations with all NAs

sans_NA <- na.omit(data_final)
head(sans_NA)
# A tibble: 6 × 13
  country  exec_…¹ year_…² log_s…³ log_p…⁴ rel_c…⁵ rel_m…⁶ rel_p…⁷  growth democ
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Afghani…   0        1919    4.54   2.12    0       0.993   0     -0.0849  0.15
2 Angola     0.333    1975    5.63   0.405   0.687   0       0.198  0.644   0.35
3 Argenti…   0        1816    4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
4 Austral…   1        1901    2.15  -3.65    0.296   0.002   0.235  3.99    1   
5 Burundi    0.148    1962    5.63   3.22    0.783   0.009   0.049  0.328   0.45
6 Benin      0.125    1960    5.59   1.44    0.185   0.152   0.028  1.16    0.8 
# … with 3 more variables: rel_unaffiliated <dbl>, education <dbl>,
#   religion <chr>, and abbreviated variable names ¹​exec_constraint, ²​year_ind,
#   ³​log_settler_mort, ⁴​log_pop_dens, ⁵​rel_catho80, ⁶​rel_muslim80,
#   ⁷​rel_protmg80
Code
# correlation between income growth and education

cor(sans_NA[c("growth", "education")])
             growth education
growth    1.0000000 0.4441714
education 0.4441714 1.0000000

There is a positive but low linear correlation (0.4) between shift in per capita income and education funding.

Code
# correlation between democracy and religious unaffiliation

cor(sans_NA[c("democ", "rel_unaffiliated")])
                      democ rel_unaffiliated
democ            1.00000000       0.08577673
rel_unaffiliated 0.08577673       1.00000000

There is a positive but very low linear correlation (0.1) between shift in democracy and religiously-unaffiliated population.

Code
# correlation between democracy and education

cor(sans_NA[c("democ", "education")])
              democ education
democ     1.0000000 0.2991782
education 0.2991782 1.0000000

There is a positive but low linear correlation (0.3) between shift in democracy and education funding.

Multiple regression models:

First, I will check to see if either of my added control variables need to be logged.

Education

Code
# education

hist(data_final$education)

This distribution is moderately right-skewed, so I will log the education variable.

Code
# log() education

log_education <- log(data_final$education)
hist(log_education)

Next, I will add this logged variable to my final data set to incorporate into my regression model.

Code
# transform new log(eduation) variable

data_final <- data_final %>%
  transform(log_education=log(education))
head(data_final)
               country exec_constraint year_ind log_settler_mort log_pop_dens
1              Andorra              NA     1800               NA           NA
2          Afghanistan       0.0000000     1919         4.540098    2.1202640
3               Angola       0.3333333     1975         5.634789    0.4054651
4              Albania       0.6666667     1912               NA    1.9877740
5 United Arab Emirates       0.3333333     1971               NA    0.0000000
6            Argentina       0.0000000     1816         4.232656   -2.2107220
  rel_catho80 rel_muslim80 rel_protmg80     growth democ rel_unaffiliated
1          NA           NA           NA  3.4558480    NA            0.088
2       0.000        0.993        0.000 -0.0848947  0.15            0.010
3       0.687        0.000        0.198  0.6444693  0.35            0.051
4          NA           NA           NA  1.6759930  0.75            0.014
5       0.004        0.949        0.003  3.3726190  0.10            0.011
6       0.916        0.002        0.027  2.7137030  0.90            0.122
  education       religion log_education
1  2.976630 NA, NA, NA, NA      1.090792
2  3.479450   rel_muslim80      1.246874
3  3.421320    rel_catho80      1.230026
4  3.413075 NA, NA, NA, NA      1.227614
5        NA   rel_muslim80            NA
6  5.019710    rel_catho80      1.613372

Religiously-unaffiliated population

Code
# religiously-unaffiliated population

hist(data_final$rel_unaffiliated)

This distribution is very right-skewed, so I will log the religiously-unaffiliated population variable.

Code
# log() rel_unaffiliated

log_rel_unaff <- log(data_final$rel_unaffiliated)
hist(log_rel_unaff)

Next, I will add this logged variable to my final data set to incorporate into my regression model.

Code
# transform new log(rel_unaffiliated) variable

data_final <- data_final %>%
  transform(log_rel_unaff=log(rel_unaffiliated))
head(data_final)
               country exec_constraint year_ind log_settler_mort log_pop_dens
1              Andorra              NA     1800               NA           NA
2          Afghanistan       0.0000000     1919         4.540098    2.1202640
3               Angola       0.3333333     1975         5.634789    0.4054651
4              Albania       0.6666667     1912               NA    1.9877740
5 United Arab Emirates       0.3333333     1971               NA    0.0000000
6            Argentina       0.0000000     1816         4.232656   -2.2107220
  rel_catho80 rel_muslim80 rel_protmg80     growth democ rel_unaffiliated
1          NA           NA           NA  3.4558480    NA            0.088
2       0.000        0.993        0.000 -0.0848947  0.15            0.010
3       0.687        0.000        0.198  0.6444693  0.35            0.051
4          NA           NA           NA  1.6759930  0.75            0.014
5       0.004        0.949        0.003  3.3726190  0.10            0.011
6       0.916        0.002        0.027  2.7137030  0.90            0.122
  education       religion log_education log_rel_unaff
1  2.976630 NA, NA, NA, NA      1.090792     -2.430418
2  3.479450   rel_muslim80      1.246874     -4.605170
3  3.421320    rel_catho80      1.230026     -2.975930
4  3.413075 NA, NA, NA, NA      1.227614     -4.268698
5        NA   rel_muslim80            NA     -4.509860
6  5.019710    rel_catho80      1.613372     -2.103734

Original model:

Code
# original study multiple regression model

reg_1 <- lm(democ ~ growth + exec_constraint + year_ind + log_settler_mort + log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80, data = data_final)
summary(reg_1)

Call:
lm(formula = democ ~ growth + exec_constraint + year_ind + log_settler_mort + 
    log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80, 
    data = data_final)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5123 -0.2113  0.0192  0.1510  0.5190 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)  
(Intercept)       2.259348   1.255158   1.800   0.0764 .
growth            0.108020   0.041738   2.588   0.0118 *
exec_constraint   0.132189   0.101750   1.299   0.1983  
year_ind         -0.001112   0.000660  -1.686   0.0965 .
log_settler_mort  0.041026   0.033013   1.243   0.2183  
log_pop_dens     -0.015336   0.020631  -0.743   0.4599  
rel_catho80       0.166250   0.124138   1.339   0.1850  
rel_muslim80     -0.001560   0.120866  -0.013   0.9897  
rel_protmg80      0.375138   0.257481   1.457   0.1498  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2381 on 67 degrees of freedom
  (97 observations deleted due to missingness)
Multiple R-squared:  0.4445,    Adjusted R-squared:  0.3782 
F-statistic: 6.703 on 8 and 67 DF,  p-value: 2.04e-06

Interestingly, we see in the original model that the relationship between shift in per capita income (“growth”) and shift in democracy is statistically significant at the 0.05 significance level (p-value=0.01). It is important to note that while the authors of “Income and Democracy” did conclude that there is no causal relationship between per capita shift in income and shift in democracy, this is not to say that there is no level of statistical significance. I will next add my additional control variables and interaction term to see if these terms cause an increase in statistical significance.

Original model plus two control variables:

Code
# multiple regression model with all variables

reg_2 <- lm(democ ~ growth + exec_constraint + year_ind + log_settler_mort + log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80 + log_rel_unaff + log_education, data = data_final)
summary(reg_2)

Call:
lm(formula = democ ~ growth + exec_constraint + year_ind + log_settler_mort + 
    log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80 + 
    log_rel_unaff + log_education, data = data_final)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40440 -0.11454  0.04055  0.12198  0.39307 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)   
(Intercept)       4.1044593  1.3841245   2.965  0.00482 **
growth            0.0604059  0.0512635   1.178  0.24485   
exec_constraint   0.2771048  0.1134864   2.442  0.01861 * 
year_ind         -0.0021348  0.0007169  -2.978  0.00466 **
log_settler_mort  0.0314009  0.0381540   0.823  0.41485   
log_pop_dens     -0.0186064  0.0228161  -0.815  0.41909   
rel_catho80       0.1287520  0.1321166   0.975  0.33500   
rel_muslim80      0.0914617  0.1673132   0.547  0.58732   
rel_protmg80      0.4323199  0.2728918   1.584  0.12015   
log_rel_unaff    -0.0531038  0.0359359  -1.478  0.14644   
log_education     0.0186680  0.0676827   0.276  0.78395   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2086 on 45 degrees of freedom
  (117 observations deleted due to missingness)
Multiple R-squared:  0.5209,    Adjusted R-squared:  0.4144 
F-statistic: 4.892 on 10 and 45 DF,  p-value: 9.054e-05

We can see based on this multiple regression model that, when controlling for shift in per capita income, constraint on the executive at the time of independence, year of independence, (log) settler mortality rate, (log) population density, Catholic population, Muslim population, Protestant population, religiously-unaffiliated population, and education funding, there is no statistical significance between shift in per capita income (growth) and shift in democracy (democ) when the significance level is 0.05.

We can, however, see that year of independence (p-value=0.001) and constraint on the executive (p-value=0.02) are statistically significant on shift in democracy. Even more interesting is that religiously-unaffiliated population, a control variable that was added to this replication study, is also statistically significant in terms of its relationship with shift in democracy (p-value = 0.02).

Original model plus two control variables and an interaction term:

Code
# multiple regression with education*growth interaction term

reg_3 <- lm(democ ~ growth*log_education + exec_constraint + year_ind + log_settler_mort + log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80 + log_rel_unaff, data = data_final)
summary(reg_3)

Call:
lm(formula = democ ~ growth * log_education + exec_constraint + 
    year_ind + log_settler_mort + log_pop_dens + rel_catho80 + 
    rel_muslim80 + rel_protmg80 + log_rel_unaff, data = data_final)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.40025 -0.11755  0.03737  0.11915  0.39458 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)   
(Intercept)           4.1354717  1.4401179   2.872  0.00626 **
growth                0.0708509  0.1254142   0.565  0.57499   
log_education         0.0277360  0.1204740   0.230  0.81898   
exec_constraint       0.2777775  0.1149933   2.416  0.01993 * 
year_ind             -0.0021562  0.0007618  -2.830  0.00698 **
log_settler_mort      0.0309511  0.0388936   0.796  0.43043   
log_pop_dens         -0.0187166  0.0231032  -0.810  0.42223   
rel_catho80           0.1288971  0.1336063   0.965  0.33994   
rel_muslim80          0.0984125  0.1854725   0.531  0.59836   
rel_protmg80          0.4480402  0.3250997   1.378  0.17512   
log_rel_unaff        -0.0525775  0.0367914  -1.429  0.16005   
growth:log_education -0.0078169  0.0854657  -0.091  0.92754   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2109 on 44 degrees of freedom
  (117 observations deleted due to missingness)
Multiple R-squared:  0.521, Adjusted R-squared:  0.4012 
F-statistic:  4.35 on 11 and 44 DF,  p-value: 0.0002112

In reviewing the third regression model which incorporates education*growth as an interaction term, we do see some minor shifts in p-values, but no shift is large enough to affect statistical significance. So, we can again confirm that, even when adding an interaction term, shift in per capita income is not statistically significant in terms of its relationship with shift in democracy.

I will now combine these three models into a single, more legible table using the ‘stargazer()’ function. We can again see that shift in income per capita is only statistically significant (though not indicative of a causal relationship) in the first model, which does not include the additional control variables or the interaction term.

Code
# use stargazer to create combined table

library(stargazer)

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 
Code
reg_model <- stargazer(reg_1, reg_2, reg_3, type='text')

========================================================================================
                                             Dependent variable:                        
                     -------------------------------------------------------------------
                                                    democ                               
                              (1)                   (2)                    (3)          
----------------------------------------------------------------------------------------
growth                      0.108**                0.060                  0.071         
                            (0.042)               (0.051)                (0.125)        
                                                                                        
exec_constraint              0.132                0.277**                0.278**        
                            (0.102)               (0.113)                (0.115)        
                                                                                        
year_ind                    -0.001*              -0.002***              -0.002***       
                            (0.001)               (0.001)                (0.001)        
                                                                                        
log_settler_mort             0.041                 0.031                  0.031         
                            (0.033)               (0.038)                (0.039)        
                                                                                        
log_pop_dens                -0.015                 -0.019                 -0.019        
                            (0.021)               (0.023)                (0.023)        
                                                                                        
rel_catho80                  0.166                 0.129                  0.129         
                            (0.124)               (0.132)                (0.134)        
                                                                                        
rel_muslim80                -0.002                 0.091                  0.098         
                            (0.121)               (0.167)                (0.185)        
                                                                                        
rel_protmg80                 0.375                 0.432                  0.448         
                            (0.257)               (0.273)                (0.325)        
                                                                                        
log_rel_unaff                                      -0.053                 -0.053        
                                                  (0.036)                (0.037)        
                                                                                        
growth:log_education                                                      -0.008        
                                                                         (0.085)        
                                                                                        
log_education                                      0.019                  0.028         
                                                  (0.068)                (0.120)        
                                                                                        
Constant                    2.259*                4.104***               4.135***       
                            (1.255)               (1.384)                (1.440)        
                                                                                        
----------------------------------------------------------------------------------------
Observations                  76                     56                     56          
R2                           0.445                 0.521                  0.521         
Adjusted R2                  0.378                 0.414                  0.401         
Residual Std. Error     0.238 (df = 67)       0.209 (df = 45)        0.211 (df = 44)    
F Statistic          6.703*** (df = 8; 67) 4.892*** (df = 10; 45) 4.350*** (df = 11; 44)
========================================================================================
Note:                                                        *p<0.1; **p<0.05; ***p<0.01
Code
print(reg_model)
 [1] ""                                                                                        
 [2] "========================================================================================"
 [3] "                                             Dependent variable:                        "
 [4] "                     -------------------------------------------------------------------"
 [5] "                                                    democ                               "
 [6] "                              (1)                   (2)                    (3)          "
 [7] "----------------------------------------------------------------------------------------"
 [8] "growth                      0.108**                0.060                  0.071         "
 [9] "                            (0.042)               (0.051)                (0.125)        "
[10] "                                                                                        "
[11] "exec_constraint              0.132                0.277**                0.278**        "
[12] "                            (0.102)               (0.113)                (0.115)        "
[13] "                                                                                        "
[14] "year_ind                    -0.001*              -0.002***              -0.002***       "
[15] "                            (0.001)               (0.001)                (0.001)        "
[16] "                                                                                        "
[17] "log_settler_mort             0.041                 0.031                  0.031         "
[18] "                            (0.033)               (0.038)                (0.039)        "
[19] "                                                                                        "
[20] "log_pop_dens                -0.015                 -0.019                 -0.019        "
[21] "                            (0.021)               (0.023)                (0.023)        "
[22] "                                                                                        "
[23] "rel_catho80                  0.166                 0.129                  0.129         "
[24] "                            (0.124)               (0.132)                (0.134)        "
[25] "                                                                                        "
[26] "rel_muslim80                -0.002                 0.091                  0.098         "
[27] "                            (0.121)               (0.167)                (0.185)        "
[28] "                                                                                        "
[29] "rel_protmg80                 0.375                 0.432                  0.448         "
[30] "                            (0.257)               (0.273)                (0.325)        "
[31] "                                                                                        "
[32] "log_rel_unaff                                      -0.053                 -0.053        "
[33] "                                                  (0.036)                (0.037)        "
[34] "                                                                                        "
[35] "growth:log_education                                                      -0.008        "
[36] "                                                                         (0.085)        "
[37] "                                                                                        "
[38] "log_education                                      0.019                  0.028         "
[39] "                                                  (0.068)                (0.120)        "
[40] "                                                                                        "
[41] "Constant                    2.259*                4.104***               4.135***       "
[42] "                            (1.255)               (1.384)                (1.440)        "
[43] "                                                                                        "
[44] "----------------------------------------------------------------------------------------"
[45] "Observations                  76                     56                     56          "
[46] "R2                           0.445                 0.521                  0.521         "
[47] "Adjusted R2                  0.378                 0.414                  0.401         "
[48] "Residual Std. Error     0.238 (df = 67)       0.209 (df = 45)        0.211 (df = 44)    "
[49] "F Statistic          6.703*** (df = 8; 67) 4.892*** (df = 10; 45) 4.350*** (df = 11; 44)"
[50] "========================================================================================"
[51] "Note:                                                        *p<0.1; **p<0.05; ***p<0.01"

Results of hypothesis test: We can see based on these regression models that education funding, religiously-unaffiliated population, and education funding*shift in per capita income actually lead to decreased statistical significance between shift in income per capita and shift in democracy. Thus, while these terms do change the causal relationship between shift in per capita income and shift in democracy, the regression outcome ultimately disproves my hypothesis.

Model Comparison:

Adjusted R-Squared

Code
# adjusted R-squared

print(reg_model)
 [1] ""                                                                                        
 [2] "========================================================================================"
 [3] "                                             Dependent variable:                        "
 [4] "                     -------------------------------------------------------------------"
 [5] "                                                    democ                               "
 [6] "                              (1)                   (2)                    (3)          "
 [7] "----------------------------------------------------------------------------------------"
 [8] "growth                      0.108**                0.060                  0.071         "
 [9] "                            (0.042)               (0.051)                (0.125)        "
[10] "                                                                                        "
[11] "exec_constraint              0.132                0.277**                0.278**        "
[12] "                            (0.102)               (0.113)                (0.115)        "
[13] "                                                                                        "
[14] "year_ind                    -0.001*              -0.002***              -0.002***       "
[15] "                            (0.001)               (0.001)                (0.001)        "
[16] "                                                                                        "
[17] "log_settler_mort             0.041                 0.031                  0.031         "
[18] "                            (0.033)               (0.038)                (0.039)        "
[19] "                                                                                        "
[20] "log_pop_dens                -0.015                 -0.019                 -0.019        "
[21] "                            (0.021)               (0.023)                (0.023)        "
[22] "                                                                                        "
[23] "rel_catho80                  0.166                 0.129                  0.129         "
[24] "                            (0.124)               (0.132)                (0.134)        "
[25] "                                                                                        "
[26] "rel_muslim80                -0.002                 0.091                  0.098         "
[27] "                            (0.121)               (0.167)                (0.185)        "
[28] "                                                                                        "
[29] "rel_protmg80                 0.375                 0.432                  0.448         "
[30] "                            (0.257)               (0.273)                (0.325)        "
[31] "                                                                                        "
[32] "log_rel_unaff                                      -0.053                 -0.053        "
[33] "                                                  (0.036)                (0.037)        "
[34] "                                                                                        "
[35] "growth:log_education                                                      -0.008        "
[36] "                                                                         (0.085)        "
[37] "                                                                                        "
[38] "log_education                                      0.019                  0.028         "
[39] "                                                  (0.068)                (0.120)        "
[40] "                                                                                        "
[41] "Constant                    2.259*                4.104***               4.135***       "
[42] "                            (1.255)               (1.384)                (1.440)        "
[43] "                                                                                        "
[44] "----------------------------------------------------------------------------------------"
[45] "Observations                  76                     56                     56          "
[46] "R2                           0.445                 0.521                  0.521         "
[47] "Adjusted R2                  0.378                 0.414                  0.401         "
[48] "Residual Std. Error     0.238 (df = 67)       0.209 (df = 45)        0.211 (df = 44)    "
[49] "F Statistic          6.703*** (df = 8; 67) 4.892*** (df = 10; 45) 4.350*** (df = 11; 44)"
[50] "========================================================================================"
[51] "Note:                                                        *p<0.1; **p<0.05; ***p<0.01"

We can see in the stargazer table above that “reg_2,” or the regression model that includes the additional control variables but not the interaction term, has the highest adjusted R-squared value , which is an indicator of best fit as it represents the model with the smallest residuals.

PRESS

Code
# PRESS

library(MPV)
Warning: package 'MPV' was built under R version 4.2.3
Loading required package: lattice
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Loading required package: randomForest
Warning: package 'randomForest' was built under R version 4.2.2
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.

Attaching package: 'randomForest'
The following object is masked from 'package:dplyr':

    combine
The following object is masked from 'package:ggplot2':

    margin
Code
#reg_1 = 4.999083

PRESS(reg_1)
[1] 4.999083
Code
#reg_2 = 3.258505

PRESS(reg_2)
[1] 3.258505
Code
#reg_3 = 3.460336

PRESS(reg_3)
[1] 3.460336

We can see in the above Predicted Residual Sum of Squares (PRESS) tests that “reg_2,” or the regression model that includes the additional control variables but not the interaction term, has the lowest PRESS value, thus indicating that it is the best-fit model.

AIC

Code
# select number of observations from reg_1 to match reg_2 and reg_3

reg_1_data <- data_final

# set observations to sample

observations <- 136

# set seed

set.seed(14)

# select observations

sampled_indices <- sample(nrow(reg_1_data), observations)
sampled_data <- reg_1_data[sampled_indices, ]

# fit  model with sample data

reg_1_sample <- lm(democ ~ growth + exec_constraint + year_ind + log_settler_mort + log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80, data=sampled_data)
reg_1_sample

Call:
lm(formula = democ ~ growth + exec_constraint + year_ind + log_settler_mort + 
    log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80, 
    data = sampled_data)

Coefficients:
     (Intercept)            growth   exec_constraint          year_ind  
        3.963300          0.057183          0.226591         -0.001926  
log_settler_mort      log_pop_dens       rel_catho80      rel_muslim80  
        0.042039         -0.011438          0.043297         -0.044685  
    rel_protmg80  
        0.214080  
Code
# check number of observations

reg_model <- stargazer(reg_1_sample, reg_2, reg_3, type='text')

========================================================================================
                                             Dependent variable:                        
                     -------------------------------------------------------------------
                                                    democ                               
                              (1)                   (2)                    (3)          
----------------------------------------------------------------------------------------
growth                       0.057                 0.060                  0.071         
                            (0.053)               (0.051)                (0.125)        
                                                                                        
exec_constraint             0.227*                0.277**                0.278**        
                            (0.122)               (0.113)                (0.115)        
                                                                                        
year_ind                   -0.002**              -0.002***              -0.002***       
                            (0.001)               (0.001)                (0.001)        
                                                                                        
log_settler_mort             0.042                 0.031                  0.031         
                            (0.039)               (0.038)                (0.039)        
                                                                                        
log_pop_dens                -0.011                 -0.019                 -0.019        
                            (0.023)               (0.023)                (0.023)        
                                                                                        
rel_catho80                  0.043                 0.129                  0.129         
                            (0.146)               (0.132)                (0.134)        
                                                                                        
rel_muslim80                -0.045                 0.091                  0.098         
                            (0.156)               (0.167)                (0.185)        
                                                                                        
rel_protmg80                 0.214                 0.432                  0.448         
                            (0.304)               (0.273)                (0.325)        
                                                                                        
log_rel_unaff                                      -0.053                 -0.053        
                                                  (0.036)                (0.037)        
                                                                                        
growth:log_education                                                      -0.008        
                                                                         (0.085)        
                                                                                        
log_education                                      0.019                  0.028         
                                                  (0.068)                (0.120)        
                                                                                        
Constant                    3.963**               4.104***               4.135***       
                            (1.565)               (1.384)                (1.440)        
                                                                                        
----------------------------------------------------------------------------------------
Observations                  56                     56                     56          
R2                           0.436                 0.521                  0.521         
Adjusted R2                  0.340                 0.414                  0.401         
Residual Std. Error     0.229 (df = 47)       0.209 (df = 45)        0.211 (df = 44)    
F Statistic          4.541*** (df = 8; 47) 4.892*** (df = 10; 45) 4.350*** (df = 11; 44)
========================================================================================
Note:                                                        *p<0.1; **p<0.05; ***p<0.01
Code
# AIC

library(AICcmodavg)
Warning: package 'AICcmodavg' was built under R version 4.2.3

Attaching package: 'AICcmodavg'
The following object is masked from 'package:randomForest':

    importance
Code
models <- list(reg_1_sample, reg_2, reg_3)
mod.names <- c('reg_1_sample', 'reg_2', 'reg_3')

aictab(cand.set = models, modnames = mod.names)

Model selection based on AICc:

              K AICc Delta_AICc AICcWt Cum.Wt    LL
reg_2        12  2.4       0.00   0.82   0.82 14.43
reg_3        13  5.8       3.40   0.15   0.97 14.44
reg_1_sample 10  9.0       6.61   0.03   1.00  7.94

We can see that based on the Akaike Information Criterion (AIC) test, “reg_2,” or the regression model that includes the additional control variables but not the interaction term, has the lowest AICc value. Therefore, because minimizing the AICc value equates to minimizing the number of unnecessary model parameters, “reg_2” is the best fit.

Diagnostics:

Based on the Adjusted R-Squared, PRESS, and AIC tests, “reg_2,” or the regression model that includes the additional control variables but not the interaction term, is universally considered the model of best fit.

Diagnostic plots for the final model (reg_2):

Code
# plots for reg_2

par(mfrow=c(2,3))
plot(reg_2, which=1:6)

  • Residuals vs. Fitted: In applying the Residuals vs. Fitted plot, we can observe a violation due to the curvature of the red line. We also see that the distribution of data points does not “bounce randomly” about the 0 line. This plot is therefore a bad fit for this regression model.

  • Normal Q-Q: In applying the Normal Q-Q plot, we observe that the data points do generally fall along the 0 line, which is indicative of a good fit for this regression model.

  • Scale-Location: In applying the Scale-Location plot, we observe what appears to be heteroscedasticity across the data points as well as a non-horizontal red line. Because see indications of heteroscedasticity, we should run a Breusch-Pagan test to confirm.

    Code
    library(lmtest)
    Warning: package 'lmtest' was built under R version 4.2.3
    Loading required package: zoo
    Warning: package 'zoo' was built under R version 4.2.3
    
    Attaching package: 'zoo'
    The following objects are masked from 'package:base':
    
        as.Date, as.Date.numeric
    Code
    bp_test <- lm(democ ~ growth + exec_constraint + year_ind + log_settler_mort + log_pop_dens + rel_catho80 + rel_muslim80 + rel_protmg80 + log_rel_unaff + log_education, data = data_final)
    bptest(bp_test)
    
        studentized Breusch-Pagan test
    
    data:  bp_test
    BP = 9.8732, df = 10, p-value = 0.4517

    The results of the Breusch-Pagan test show that we cannot reject the null hypothesis (p-value=0.45), which in this case is homoscedasticity. So, there is no violation of the assumption of homoscedasticity. This plot is therefore a good fit for this regression model.

  • Cook’s Distance: In applying the Cook’s Distance plot, we observe a violation due to several data points yielding a Cook’s distance value greater than 1. This plot is therefore a bad fit for this regression model.

  • Residuals vs. Leverage: In applying the Residuals vs. Leverage plot, we can see that there are no data points outside of the dashed lines marked with ‘0.5.’ This plot is therefore a good fit for this regression model.

  • Cook’s Distance vs. Leverage: In applying the Cook’s Distance vs. Leverage plot, we do not really observe any influential data points. This plot does not contribute any value to this regression model.

Sources:

  1. Data set: Acemoglu, Daron, et al. "Replication Data For: Income and Democracy." American Economic Association, vol. 1, 12 Oct. 2019, www.openicpsr.org/openicpsr/project/113251/version/V1/view?path=%2Fopenicpsr%2F113251%2Ffcr%3Aversions%2FV1%2FIncome-and-Democracy-Data-AER-adjustment.xls&type=file, https://doi.org/10.3886/e113251v1.
  2. Study: Acemoglu, D., Johnson, S., Robinson, J. A., & Yared, P. (2008, June 3). Income and democracy. American Economic Review. https://www.aeaweb.org/articles?id=10.1257%2Faer.98.3.808
  3. Comparison study: Barro, Robert J. "Determinants of Democracy." Journal of Political Economy, vol. 107, no. S6, Dec. 1999, pp. S158–S183, https://doi.org/10.1086/250107.und
  4. Additional religious affiliation variables: Datopian. "World Religion Projections." DataHub, 2018, datahub.io/sagargg/world-religion-projections#data.
  5. Education variable: The World Bank. "Government Expenditure on Education, Total (% of GDP) | Data." Worldbank.org, 2019, data.worldbank.org/indicator/SE.XPD.TOTL.GD.ZS.