Final Project - Check-In 1


This post provides a high-level overview of the data set I will be using for my final project. The data set, titled “Income and Democracy,” accompanies the 2008 study of the same name published in the American Economic Review.

Author

Caitlin Rowley

Published

March 21, 2023

Background and Research Question:

My research question will focus on the cross-country correlation between income and democracy. A 2008 study titled “Income and Democracy,” published in the American Economic Review, argues that existing studies finding a strong cross-country correlation between income and democracy do not control for factors that simultaneously affect both variables. Accordingly, the study controls for country fixed effects (capturing factors such as date of independence, constraints on the executive, and religious affiliation), and doing so removes the statistical association between income per capita and various measures of democracy. This study is the source for my data set.

In contrast, a 1999 study by Robert J. Barro asserted that improvements in the standard of living predict increases in democracy. However, similar to the argument in “Income and Democracy,” this study found that controlling for certain economic variables weakens the relationship between democracy and religious affiliation specifically. Nevertheless, Barro claimed that the negative effects of Muslim and non-religious affiliation on democracy persist regardless of these controls.

This incongruity led me to wonder whether we would see a correlation between income and democracy if the economic variables used in the two studies were both updated and more closely aligned. Specifically, I would like to examine how this would affect both studies’ claims about the role of religious affiliation. In other words, will adding control variables such as education shift the correlation between democracy and religious affiliation, limited to Islam and non-religious affiliation?

Research question: How will adding education and non-religious affiliation as control variables affect the correlation between religious affiliation and democracy?

Hypothesis:

After reviewing “Income and Democracy,” I found no indication that non-religious affiliation was included in the analysis. Additionally, the authors indicate that education has no statistically significant effect on democracy once country fixed effects are taken into account. However, I am curious how including these two variables would affect Barro’s conclusion regarding the correlation between religious affiliation and democracy. As such, by adding these variables and updating existing variables with new data, I will be revisiting a previously tested hypothesis.

Despite my curiosity, I hypothesize that adding education and non-religious affiliation as control variables will not reveal a statistically significant relationship between religious affiliation and democracy, even when the scope of religious affiliation is narrowed to Islam and non-religious affiliation.
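One way this hypothesis could eventually be checked is to compare the coefficient on a religion variable (for example `rel_muslim80`) in a regression of `democ` with and without the new controls. The sketch below is an illustration under stated assumptions, not the study's method. It uses hypothetical columns `educ` and `rel_none`, which do not exist in the data yet. It runs on simulated toy data, so its numbers say nothing about the real question. It also fits plain OLS with `lm()`, which is simpler than the fixed-effects models used in the study.

```r
# Hypothetical helper: compare the coefficient on one religion variable
# in models with and without additional control variables (plain OLS).
compare_religion_coef <- function(data, religion_var, controls, added_controls,
                                  outcome = "democ") {
  # Fit both models on the same complete rows so the comparison is like-for-like.
  needed <- c(outcome, religion_var, controls, added_controls)
  data <- data[complete.cases(data[needed]), ]

  base_fit <- lm(reformulate(c(religion_var, controls), response = outcome),
                 data = data)
  full_fit <- lm(reformulate(c(religion_var, controls, added_controls),
                             response = outcome), data = data)

  # Each row holds the estimate, standard error, t value, and p-value
  # for religion_var from one of the two models.
  rbind(
    without_added = coef(summary(base_fit))[religion_var, ],
    with_added    = coef(summary(full_fit))[religion_var, ]
  )
}

# Simulated toy data for illustration only (NOT the study's data);
# educ and rel_none are hypothetical columns.
set.seed(603)
n <- 120
toy <- data.frame(
  rel_muslim80 = runif(n),
  growth       = rnorm(n, mean = 2, sd = 1),
  educ         = rnorm(n),
  rel_none     = runif(n, max = 0.3)
)
toy$democ <- 0.6 + 0.05 * toy$educ + rnorm(n, sd = 0.2)

result <- compare_religion_coef(toy, "rel_muslim80",
                                controls = "growth",
                                added_controls = c("educ", "rel_none"))
round(result, 3)
```

Under the hypothesis, the `Pr(>|t|)` value for the religion variable would stay above the chosen significance level in both rows. Restricting both fits to the same complete cases keeps a shift in the coefficient from reflecting only a change in the sample.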

Descriptive Statistics:

Data for this study were collected from the Freedom House Political Rights Index, the Polity Composite Democracy Index, and other studies conducted by Barro and by Kenneth A. Bollen.

Variables I will be focusing on include:

Country;
Constraint on the executive;
Year of independence;
Settler mortality;
Population density;
Catholic population;
Muslim population;
Protestant population;
Education;
Shift in per capita income; and
Shift in democracy.
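In the data set, the variables above appear under coded column names. The sketch below relabels columns for tables and plots using a named vector. The mapping is an assumption inferred from the column names printed later in this post. The labels for `growth` and `democ` follow this post's wording and should be checked against the study's variable key. Education is omitted because it is not yet in the data.

```r
# Assumed mapping from column codes to the readable names used in this post.
variable_labels <- c(
  consfirstaug = "Constraint on the executive",
  indyear      = "Year of independence",
  logem4       = "Settler mortality (log)",
  lpd1500s     = "Population density (log)",
  rel_catho80  = "Catholic population (share)",
  rel_muslim80 = "Muslim population (share)",
  rel_protmg80 = "Protestant population (share)",
  growth       = "Shift in per capita income",
  democ        = "Shift in democracy"
)

# Replace coded column names with labels; columns not in the mapping are unchanged.
label_columns <- function(df, labels = variable_labels) {
  hits <- names(df) %in% names(labels)
  names(df)[hits] <- unname(labels[names(df)[hits]])
  df
}

# One row copied from the printed head(data_cln) output (Afghanistan).
afg <- data.frame(
  country = "Afghanistan", consfirstaug = 0, indyear = 1919, logem4 = 4.54,
  lpd1500s = 2.12, rel_catho80 = 0, rel_muslim80 = 0.993, rel_protmg80 = 0,
  growth = -0.0849, democ = 0.15
)
labelled_names <- names(label_columns(afg))
labelled_names
```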

Code

# load libraries:

library(tidyverse)


Code

library(readxl)


Code

# read in data file:

data_file <- read_excel("C:/Users/caitr/OneDrive/Documents/DACSS/DACSS 603/603_Spring_2023/posts/Final Project/Income-Democracy.xls", sheet = "500 Year Panel") 
head(data_file)

# A tibble: 6 × 15
  code  country     consf…¹ indcent indyear logem4 lpd15…² madid rel_c…³ rel_m…⁴
  <chr> <chr>         <dbl>   <dbl>   <dbl>  <dbl>   <dbl> <dbl>   <dbl>   <dbl>
1 ADO   Andorra      NA        18      1800  NA     NA      1001  NA      NA    
2 AFG   Afghanistan   0        19.2    1919   4.54   2.12   3002   0       0.993
3 AGO   Angola        0.333    19.8    1975   5.63   0.405  2011   0.687   0    
4 ALB   Albania       0.667    19.1    1912  NA      1.99   2009  NA      NA    
5 ARE   United Ara…   0.333    19.7    1971  NA      0      3002   0.004   0.949
6 ARG   Argentina     0        18.2    1816   4.23  -2.21   5001   0.916   0.002
# … with 5 more variables: rel_protmg80 <dbl>, growth <dbl>, democ <dbl>,
#   world <dbl>, colony <dbl>, and abbreviated variable names ¹consfirstaug,
#   ²lpd1500s, ³rel_catho80, ⁴rel_muslim80

Code

# remove dummy/unnecessary variables (as identified in study's variable key):

data_cln <- subset(data_file, select = -c(code, world, colony, indcent, madid))
head(data_cln)

# A tibble: 6 × 10
  country   consf…¹ indyear logem4 lpd15…² rel_c…³ rel_m…⁴ rel_p…⁵  growth democ
  <chr>       <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra    NA        1800  NA     NA      NA      NA      NA      3.46   NA   
2 Afghanis…   0        1919   4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola      0.333    1975   5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania     0.667    1912  NA      1.99   NA      NA      NA      1.68    0.75
5 United A…   0.333    1971  NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argentina   0        1816   4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with abbreviated variable names ¹consfirstaug, ²lpd1500s, ³rel_catho80,
#   ⁴rel_muslim80, ⁵rel_protmg80

Code

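# check for and remove duplicate rows:

duplicates <- duplicated(data_cln)     # TRUE for each row that repeats an earlier row
sum(duplicates)                        # number of duplicate rows
data_cln <- data_cln[!duplicates, ]    # keep only the first occurrence of each row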
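A common pitfall when checking for duplicates is to index the result of `duplicated()` with the string `"TRUE"`. That looks up an element *named* `"TRUE"`, and the vector has no names, so the result is `NA` rather than the duplicated rows. A small sketch on made-up toy data contrasts this with checks that work:

```r
# Toy data frame (made up) in which row 3 repeats row 1.
toy <- data.frame(country = c("A", "B", "A"), democ = c(0.5, 0.9, 0.5))

dup <- duplicated(toy)      # FALSE FALSE TRUE
dup["TRUE"]                 # NA: indexes by the name "TRUE", which does not exist
sum(dup)                    # 1 duplicate row
any(dup)                    # TRUE
toy_unique <- toy[!dup, ]   # drops the repeated row
nrow(toy_unique)            # 2
```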

Code

# remove blank observations (observations with some NAs are not removed):

data_blank <- data_cln[rowSums(is.na(data_cln)) != ncol(data_cln), ]
head(data_blank)

# A tibble: 6 × 10
  country   consf…¹ indyear logem4 lpd15…² rel_c…³ rel_m…⁴ rel_p…⁵  growth democ
  <chr>       <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1 Andorra    NA        1800  NA     NA      NA      NA      NA      3.46   NA   
2 Afghanis…   0        1919   4.54   2.12    0       0.993   0     -0.0849  0.15
3 Angola      0.333    1975   5.63   0.405   0.687   0       0.198  0.644   0.35
4 Albania     0.667    1912  NA      1.99   NA      NA      NA      1.68    0.75
5 United A…   0.333    1971  NA      0       0.004   0.949   0.003  3.37    0.1 
6 Argentina   0        1816   4.23  -2.21    0.916   0.002   0.027  2.71    0.9 
# … with abbreviated variable names ¹consfirstaug, ²lpd1500s, ³rel_catho80,
#   ⁴rel_muslim80, ⁵rel_protmg80
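The all-NA filter above drops a row only when every column is missing, including `country`. If every row has a country name, that check cannot remove anything. A stricter alternative, which is not the post's method, tests only the measurement columns. The toy data frame below is made up for illustration:

```r
# Made-up toy data: row C has a country name but no measurements.
toy <- data.frame(
  country = c("A", "B", "C"),
  democ   = c(0.5, NA, NA),
  growth  = c(1.2, 0.4, NA)
)
measures <- setdiff(names(toy), "country")

# All-column check: never TRUE here, because country is never NA.
all_blank <- rowSums(is.na(toy)) == ncol(toy)

# Measurement-only check: flags rows with no data besides the country name.
no_data <- rowSums(is.na(toy[measures])) == length(measures)

toy_kept <- toy[!no_data, ]
toy_kept$country   # "A" "B"
```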

Code

# keep only complete observations (for description, not analysis):

data_NA <- data_cln[rowSums(is.na(data_cln)) == 0, ]
dim(data_NA)

[1] 76 10
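The `rowSums(is.na(...)) == 0` filter keeps only complete observations. Base R's `complete.cases()` computes the same thing directly. A small sketch on made-up toy data shows that the two agree. `sum(complete.cases(...))` also gives the count without building a new data frame:

```r
# Made-up toy data: row B is incomplete.
toy <- data.frame(
  country = c("A", "B", "C"),
  democ   = c(0.5, NA, 0.8),
  growth  = c(1.2, 0.4, 2.0)
)

by_rowsums  <- toy[rowSums(is.na(toy)) == 0, ]
by_complete <- toy[complete.cases(toy), ]

identical(by_rowsums, by_complete)  # TRUE
sum(complete.cases(toy))            # 2
```

Applied to `data_cln`, `sum(complete.cases(data_cln))` should match the 76 rows reported above.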

Code

# confirm data frame size of clean data set:

dim(data_cln)

[1] 173  10

We can see that this data set has 10 variables and 173 observations (there will be 12 variables once I collect and add data on education and non-religious affiliation). There are no duplicate observations, nor are there any blank observations. However, if we remove every observation with at least one missing value, only 76 observations remain. Nonetheless, because the study’s authors elected to use incomplete observations, I will do the same.
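Keeping incomplete observations in the data set does not mean every model will use them. With R's default `na.action` (`na.omit`), `lm()` silently drops any row that is missing a variable in that model's formula. The sketch below uses made-up toy data and a hypothetical `educ` column. It shows how adding a control can shrink the estimation sample, and `nobs()` reports how many rows each fit actually used:

```r
# Made-up toy data; educ is a hypothetical education column.
toy <- data.frame(
  democ  = c(0.2, 0.5, 0.9, NA, 0.7, 0.4, 0.6, 0.8),
  growth = c(0.5, 1.5, 3.0, 2.0, NA, 1.0, 2.2, 2.8),
  educ   = c(NA, 2, 8, 5, 6, 3, 4, 7)
)

# Each fit drops rows missing any variable in its own formula.
fit_small <- lm(democ ~ growth, data = toy)
fit_large <- lm(democ ~ growth + educ, data = toy)

nobs(fit_small)  # 6
nobs(fit_large)  # 5
```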

Code

# summary of numeric data (drop the categorical country variable):

library(summarytools)


Code

summary <- subset(data_cln, select = -c(country))
dfSummary(summary)

Data Frame Summary  
summary  
Dimensions: 173 x 9  
Duplicates: 1  

--------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values            Freqs (% of Valid)    Graph                 Valid      Missing  
---- -------------- ------------------------- --------------------- --------------------- ---------- ---------
1    consfirstaug   Mean (sd) : 0.4 (0.4)     38 distinct values    :                     150        23       
     [numeric]      min < med < max:                                :     :           .   (86.7%)    (13.3%)  
                    0 < 0.3 < 1                                     :     :           :                       
                    IQR (CV) : 0.7 (0.9)                            : .   :           :                       
                                                                    : : . : . . .     :                       

2    indyear        Mean (sd) : 1911.8 (67)   65 distinct values                    :     173        0        
     [numeric]      min < med < max:                                .               : :   (100.0%)   (0.0%)   
                    1800 < 1947 < 1984                              :               : :                       
                    IQR (CV) : 134 (0)                              :           . . : :                       
                                                                    : : . .   . : : : :                       

3    logem4         Mean (sd) : 4.6 (1.3)     43 distinct values            :             88         85       
     [numeric]      min < med < max:                                        :             (50.9%)    (49.1%)  
                    0.9 < 4.5 < 8                                           : :                               
                    IQR (CV) : 1.4 (0.3)                                    : :                               
                                                                        : : : : : .                           

4    lpd1500s       Mean (sd) : 1.1 (1.6)     98 distinct values            :             151        22       
     [numeric]      min < med < max:                                        :             (87.3%)    (12.7%)  
                    -3.8 < 1.1 < 5.6                                        : : :                             
                    IQR (CV) : 2.2 (1.4)                                  : : : : :                           
                                                                      .   : : : : : .                         

5    rel_catho80    Mean (sd) : 0.3 (0.4)     107 distinct values   :                     152        21       
     [numeric]      min < med < max:                                :                     (87.9%)    (12.1%)  
                    0 < 0.1 < 1                                     :                                         
                    IQR (CV) : 0.6 (1.1)                            :                 .                       
                                                                    : : . : . .     . :                       

6    rel_muslim80   Mean (sd) : 0.2 (0.4)     85 distinct values    :                     152        21       
     [numeric]      min < med < max:                                :                     (87.9%)    (12.1%)  
                    0 < 0 < 1                                       :                                         
                    IQR (CV) : 0.4 (1.5)                            :                                         
                                                                    : .     .       . :                       

7    rel_protmg80   Mean (sd) : 0.1 (0.2)     80 distinct values    :                     151        22       
     [numeric]      min < med < max:                                :                     (87.3%)    (12.7%)  
                    0 < 0 < 1                                       :                                         
                    IQR (CV) : 0.2 (1.7)                            :                                         
                                                                    : . . . .                                 

8    growth         Mean (sd) : 2 (1.1)       143 distinct values               :         172        1        
     [numeric]      min < med < max:                                    .   :   :   .     (99.4%)    (0.6%)   
                    -0.6 < 1.9 < 4.3                                    : : :   :   :                         
                    IQR (CV) : 1.6 (0.5)                                : : : : : . :                         
                                                                      : : : : : : : : .                       

9    democ          Mean (sd) : 0.7 (0.3)     21 distinct values                    . :   135        38       
     [numeric]      min < med < max:                                                : :   (78.0%)    (22.0%)  
                    0 < 0.8 < 1                                       .             : :                       
                    IQR (CV) : 0.6 (0.5)                            . :   .       : : :                       
                                                                    : : : : . : . : : :                       
--------------------------------------------------------------------------------------------------------------

The table above reports summary statistics for the 9 numeric variables. The categorical country variable is excluded because it only identifies each observation. These statistics will be more meaningful once data on education and non-religious affiliation are added.
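The Missing column above shows which variables account for the drop from 173 to 76 complete observations. For example, `logem4` is missing for 85 of 173 countries. The base-R sketch below ranks columns by percentage missing. It is shown on made-up toy data; on the real data it would be applied to `data_cln` without the `country` column:

```r
# Made-up toy data with different amounts of missingness per column.
toy <- data.frame(
  logem4 = c(4.5, NA, NA, 5.6),
  democ  = c(0.2, NA, 0.9, 0.4),
  growth = c(1, 2, 3, 4)
)

# Share of NA values per column, as a percentage, largest first.
missing_pct <- sort(round(colMeans(is.na(toy)) * 100, 1), decreasing = TRUE)
missing_pct  # logem4 50, democ 25, growth 0
```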

Sources:

URL for data: https://www.openicpsr.org/openicpsr/project/113251/version/V1/view?path=/openicpsr/113251/fcr:versions/V1/Income-and-Democracy-Data-AER-adjustment.xls&type=file
URL for study: http://homepage.ntu.edu.tw/~kslin/macro2009/Acemoglu%20et%20al%202008.pdf
URL for external reference (Barro, 1999): https://www.jstor.org/stable/10.1086/250107