Voter Turnout and Partisan Bias in U.S. Presidential Elections

DACSS 602 Final Project Proposal - Fall 2022

Author

Nicholas Boonstra

Published

October 12, 2022

Code
rm(list=ls())

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Research Question

On the surface, my research question is fairly straightforward: Does higher turnout in U.S. presidential elections benefit Democratic candidates? However, this question can be assessed in a number of ways, particularly when it comes to measurement. For example, “higher” turnout can be measured in absolute terms across states, or in relative terms within states across elections (i.e. whether turnout increased or decreased, and by how much). Similarly, “benefit” to Democratic candidates can be assessed in terms of whether or not an election is won as well as how much an election is won or lost by, which itself can also be further broken down into absolute and relative terms. For the sake of validity and robustness, I would naturally like to be able to assess this question in each of these ways, comparing different IVs, DVs, and models; however, I recognize that it may not prove feasible to take such a deep look at this question as the term proceeds.

These questions have been looked at a number of times in the American political science literature, and yet there is little consensus on the effects of turnout on partisan electoral outcomes. One of the earliest, and perhaps one of the most seminal, works in this area is DeNardo (1980), which uses mostly theoretical arguments to counter the conventional wisdom that higher turnout benefits Democrats. A rebuttal – Tucker, Vedlitz, and DeNardo (1986) – counters DeNardo’s argument while also giving the original author an opportunity to double down on his case. More recently, Shaw and Petrocik’s (2020) book The Turnout Myth takes a deeper dive into these questions, attacking the conventional wisdom with both theoretical arguments and empirical evidence, and coming to a similar conclusion to DeNardo: higher turnout does not benefit Democrats. However, I take issue with some of the theoretical underpinnings of Shaw and Petrocik’s argument, as I expand upon below.

I want to note my excitement to work on this project. I began my work on this question as an undergraduate, using the Stata statistical package for my analysis; I am looking forward to learning to use R and Quarto to perform, develop, and present this analysis. Specifically, because of the time-series nature of the data, I was taught to go beyond standard linear regression and utilize a Panel-Corrected Standard Errors (PCSE) model. I hope to build on this knowledge and further my understanding of statistical modelling with this project.

Hypothesis

Theory

A review of even such a small sample of the literature as the works mentioned above will clearly demonstrate that, beyond disagreement over the presence of partisan turnout bias, there is little consensus on the theoretical aspect of such a phenomenon. Before offering my hypothesis, therefore, I would like to briefly address this theoretical side of the argument.

Shaw and Petrocik (2020) take issue with a notion found in turnout bias literature, the notion being “that turnout is endogenous to candidate preference” (p. 53). They cite Downs’ (1957) famous equation, \(V=(P*B)-C\), as evidence that it is the intensity of one’s political beliefs, and not their direction, that determines the decision to vote or not, and that therefore turnout is not endogenous to candidate preference.

I believe this argument misses a subtle nuance that is key to the turnout bias debate. Suppose that not all individuals in a given polity face the same costs to voting; assume, in other words, that a more accurate rendition of Downs’ equation would be \(V_i=(P*B_i)-C_i\), in which both the cost of voting and the perceived benefit of a preferred candidate’s victory are unique to the individual. For the sake of this argument, the manner in which these costs are distributed is not important; only the fact that there are unequal costs matters. Suppose further that one of the parties in this polity has established itself as being the party that lobbies for a reduction in the cost of voting, particularly for those who face disproportionately high barriers. In a world of rational actors and perfect information, it would follow, ceteris paribus, that an individual who faced disproportionately high costs to voting would support this party, since this party would lobby to improve opportunities for this group. However, the same elevated cost of voting that would motivate this individual to support this party could also prevent them from ultimately voting for that party’s candidate in an election. Thus, it could be said that turnout is endogenous to candidate preference – or, more accurately, that the cost of voting is endogenous to both candidate preference and turnout.
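
As a minimal numerical sketch of the individualized equation, the R snippet below compares two hypothetical voters who value their preferred candidate’s victory equally but face different costs. All values are invented for illustration, and \(P\) is read here as the perceived probability that one’s vote is decisive, which is an assumption about notation rather than anything estimated from data.

```r
# Hypothetical values, for illustration only
p <- 0.01  # assumed shared probability that a single vote is decisive

voters <- data.frame(
  id      = c("A", "B"),
  benefit = c(500, 500),  # B_i: same perceived benefit for both voters
  cost    = c(2, 8)       # C_i: voter B faces a higher cost of voting
)

# V_i = (P * B_i) - C_i
voters$net_value <- p * voters$benefit - voters$cost

# A voter turns out only if the net value of voting is positive
voters$votes <- voters$net_value > 0

print(voters)
```

Under these assumed values, voter B prefers the same candidate just as strongly as voter A but stays home, which is the mechanism the argument above relies on.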

We can apply this theoretical model to the American case. Certain individuals do face higher barriers to voting; unfortunately, unlike in the model, these barriers do tend to be distributed in a certain manner, often inequitably by race and socioeconomic status. Additionally, it would not be difficult to argue that, of the two major parties, the Democrats have placed themselves in the position of the party lobbying for expanded voting access and reduction of barriers to the ballot box, starting with their role in the Civil Rights movement and corresponding legislation, and continuing to the start of the present Congress and the introduction of H.R. 1, a bill explicitly aimed at expanding voting rights. Thus, while our world is not one of completely perfect information or completely rational actors, and while a number of factors contribute to partisan identity and vote choice, there is a reasonable case to be made that individuals who face barriers to voting, ceteris paribus, would be more likely to support the Democratic Party. Once again, these very barriers to voting that would push individuals toward the Democrats also can restrict them from expressing that preference at the ballot box. Thus, we have our situation of endogeneity between partisan preference and turnout.

Hypotheses

With the theoretical argument out of the way, I can proceed to outline some of the hypotheses I would like to test with this project.

\(H_1\): Higher turnout will benefit Democrats in state-level Presidential elections.

\(H_2\): Democrats will perform better in state-level Presidential elections as turnout increases relative to the previous election in that state.

The distinction of state-level elections is an important one; Shaw and Petrocik (2020) tend to aggregate their data, either by assessing elections on the national level or by aggregating county-level data. In the United States, Presidential elections are conducted at the state level, and I believe that this is the appropriate level of analysis for this project.
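
One way the relative-turnout measure in \(H_2\) might be constructed is as the change in a state’s turnout rate from its previous Presidential election. The sketch below uses base R on a toy panel; the state labels, column names, and turnout values are all hypothetical.

```r
# Toy state-year panel (hypothetical values)
panel <- data.frame(
  state   = c("A", "A", "A", "B", "B", "B"),
  year    = c(2008, 2012, 2016, 2008, 2012, 2016),
  turnout = c(0.60, 0.58, 0.61, 0.50, 0.55, 0.54)
)

# Sort so that each state's elections are in chronological order
panel <- panel[order(panel$state, panel$year), ]

# Within each state, subtract the previous election's turnout;
# the first election in each state has no predecessor, so it is NA
panel$turnout_change <- ave(
  panel$turnout, panel$state,
  FUN = function(x) c(NA, diff(x))
)

print(panel)
```

The first observation for each state is necessarily missing, so a model for \(H_2\) would have one fewer election per state than a model for \(H_1\).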

Descriptive Statistics

What follows is a brief summary of the datasets I intend to use for this analysis.

Election Data, 1976-2020

Obtained from the MIT Election Project on 10/10/2022.

Code
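# Load the state-level Presidential election returns
election_full <- read_csv("./_data/mit_election_1976_2020.csv")

election_full <- election_full %>% 
  # Keep the five most relevant party labels; everything else
  # (including missing labels on write-in rows) becomes "OTHER"
  mutate(party_simplified2 = case_when(
    party_detailed == "DEMOCRAT" ~ "DEMOCRAT",
    party_detailed == "REPUBLICAN" ~ "REPUBLICAN",
    party_detailed == "LIBERTARIAN" ~ "LIBERTARIAN",
    party_detailed == "GREEN" ~ "GREEN",
    party_detailed == "INDEPENDENT" ~ "INDEPENDENT",
    TRUE ~ "OTHER"
  )) %>% 
  # Dummy variable: 1 for Democratic candidates, 0 otherwise
  mutate(party_dem = case_when(
    party_detailed == "DEMOCRAT" ~ 1,
    TRUE ~ 0
  ))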

head(election_full, n=20)
# A tibble: 20 × 17
    year state    state…¹ state…² state…³ state…⁴ office candi…⁵ party…⁶ writein
   <dbl> <chr>    <chr>     <dbl>   <dbl>   <dbl> <chr>  <chr>   <chr>   <lgl>  
 1  1976 ALABAMA  AL            1      63      41 US PR… "CARTE… DEMOCR… FALSE  
 2  1976 ALABAMA  AL            1      63      41 US PR… "FORD,… REPUBL… FALSE  
 3  1976 ALABAMA  AL            1      63      41 US PR… "MADDO… AMERIC… FALSE  
 4  1976 ALABAMA  AL            1      63      41 US PR… "BUBAR… PROHIB… FALSE  
 5  1976 ALABAMA  AL            1      63      41 US PR… "HALL,… COMMUN… FALSE  
 6  1976 ALABAMA  AL            1      63      41 US PR… "MACBR… LIBERT… FALSE  
 7  1976 ALABAMA  AL            1      63      41 US PR…  <NA>   <NA>    TRUE   
 8  1976 ALASKA   AK            2      94      81 US PR… "FORD,… REPUBL… FALSE  
 9  1976 ALASKA   AK            2      94      81 US PR… "CARTE… DEMOCR… FALSE  
10  1976 ALASKA   AK            2      94      81 US PR… "MACBR… LIBERT… FALSE  
11  1976 ALASKA   AK            2      94      81 US PR…  <NA>   <NA>    TRUE   
12  1976 ARIZONA  AZ            4      86      61 US PR… "FORD,… REPUBL… FALSE  
13  1976 ARIZONA  AZ            4      86      61 US PR… "CARTE… DEMOCR… FALSE  
14  1976 ARIZONA  AZ            4      86      61 US PR… "MCCAR… INDEPE… FALSE  
15  1976 ARIZONA  AZ            4      86      61 US PR… "MACBR… LIBERT… FALSE  
16  1976 ARIZONA  AZ            4      86      61 US PR… "CAMEJ… SOCIAL… FALSE  
17  1976 ARIZONA  AZ            4      86      61 US PR… "ANDER… AMERIC… FALSE  
18  1976 ARIZONA  AZ            4      86      61 US PR… "MADDO… AMERIC… FALSE  
19  1976 ARIZONA  AZ            4      86      61 US PR…  <NA>   <NA>    TRUE   
20  1976 ARKANSAS AR            5      71      42 US PR… "CARTE… DEMOCR… FALSE  
# … with 7 more variables: candidatevotes <dbl>, totalvotes <dbl>,
#   version <dbl>, notes <lgl>, party_simplified <chr>,
#   party_simplified2 <chr>, party_dem <dbl>, and abbreviated variable names
#   ¹​state_po, ²​state_fips, ³​state_cen, ⁴​state_ic, ⁵​candidate, ⁶​party_detailed
Code
colnames(election_full)
 [1] "year"              "state"             "state_po"         
 [4] "state_fips"        "state_cen"         "state_ic"         
 [7] "office"            "candidate"         "party_detailed"   
[10] "writein"           "candidatevotes"    "totalvotes"       
[13] "version"           "notes"             "party_simplified" 
[16] "party_simplified2" "party_dem"        
Code
summary(election_full)
      year         state             state_po           state_fips   
 Min.   :1976   Length:4287        Length:4287        Min.   : 1.00  
 1st Qu.:1988   Class :character   Class :character   1st Qu.:16.00  
 Median :2000   Mode  :character   Mode  :character   Median :28.00  
 Mean   :1999                                         Mean   :28.62  
 3rd Qu.:2012                                         3rd Qu.:41.00  
 Max.   :2020                                         Max.   :56.00  
   state_cen        state_ic        office           candidate        
 Min.   :11.00   Min.   : 1.00   Length:4287        Length:4287       
 1st Qu.:33.00   1st Qu.:22.00   Class :character   Class :character  
 Median :53.00   Median :42.00   Mode  :character   Mode  :character  
 Mean   :53.67   Mean   :39.75                                        
 3rd Qu.:81.00   3rd Qu.:61.00                                        
 Max.   :95.00   Max.   :82.00                                        
 party_detailed      writein        candidatevotes       totalvotes      
 Length:4287        Mode :logical   Min.   :       0   Min.   :  123574  
 Class :character   FALSE:3807      1st Qu.:    1177   1st Qu.:  652274  
 Mode  :character   TRUE :477       Median :    7499   Median : 1569180  
                    NA's :3         Mean   :  311908   Mean   : 2366924  
                                    3rd Qu.:  199242   3rd Qu.: 3033118  
                                    Max.   :11110250   Max.   :17500881  
    version          notes         party_simplified   party_simplified2 
 Min.   :20210113   Mode:logical   Length:4287        Length:4287       
 1st Qu.:20210113   NA's:4287      Class :character   Class :character  
 Median :20210113                  Mode  :character   Mode  :character  
 Mean   :20210113                                                       
 3rd Qu.:20210113                                                       
 Max.   :20210113                                                       
   party_dem     
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :0.0000  
 Mean   :0.1428  
 3rd Qu.:0.0000  
 Max.   :1.0000  

This dataframe contains state-level election results for all 50 states and the District of Columbia for the twelve Presidential elections from 1976 to 2020. (I am currently not sure that I will use that entire date range, particularly because it does not exactly coincide with the turnout data available, but for now I am including the full data set.) Included in the dataframe are candidate vote totals and party affiliations, which I have used to add an extra column, party_dem, which is a dummy variable recording whether or not a given candidate is a Democrat. The data already come in tidy format, which is a nice touch; a “case” or row is a given candidate’s performance in a given state’s Presidential election in a given year.
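
Because each row is a candidate rather than a state-year, the dependent variable still has to be derived. A minimal sketch of one possible approach follows, in base R on toy data: the column names match those printed above, but the states "X" and "Y" and all vote counts are hypothetical.

```r
# Toy candidate-level rows shaped like election_full (hypothetical values)
results <- data.frame(
  year           = c(2000, 2000, 2000, 2000, 2000),
  state          = c("X", "X", "X", "Y", "Y"),
  party_dem      = c(1, 0, 0, 1, 0),
  candidatevotes = c(450, 500, 50, 600, 400),
  totalvotes     = c(1000, 1000, 1000, 1000, 1000)
)

# Keep only Democratic rows, then sum within state-year in case
# more than one row is flagged as Democratic
dem <- results[results$party_dem == 1, ]
dem_share <- aggregate(
  candidatevotes ~ year + state + totalvotes,
  data = dem, FUN = sum
)

# Democratic share of all votes cast in that state-year
dem_share$dem_share <- dem_share$candidatevotes / dem_share$totalvotes

print(dem_share)
```

This yields one row per state-year, which is the unit of analysis the hypotheses call for; a two-party share or a win/loss indicator could be built the same way.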

Turnout data, 1980-2014

Obtained from the US Elections Project on 10/11/2022.

Code
turnout <- read_excel("./_data/1980-2014 November General Election.xlsx",
                      skip=2,
                      col_types=c(
                        "numeric","skip","skip","text",
                        "numeric","numeric","numeric",
                        "numeric","numeric","numeric","numeric",
                        "numeric","numeric","numeric","numeric","numeric","numeric"
                      ),
                      col_names=c(
                        "year","state",
                        "totballots_vep_rate","highestoff_vep_rate","highestoff_vap_rate",
                        "totalballots_count","highestoff_count","vep_count","vap_count",
                        "noncitizen_percent","prison_count","probation_count",
                        "parole_count","totineligible_count","overseas_count"
                      ))

head(turnout,n=20)
# A tibble: 20 × 15
    year state   totba…¹ highe…² highe…³ total…⁴ highe…⁵ vep_c…⁶ vap_c…⁷ nonci…⁸
   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  2014 United…   0.367   0.36    0.332  8.33e7  8.17e7  2.27e8  2.46e8   0.084
 2  2014 Alabama   0.332   0.329   0.315  1.19e6  1.18e6  3.59e6  3.75e6   0.025
 3  2014 Alaska    0.548   0.542   0.51   2.85e5  2.82e5  5.21e5  5.53e5   0.039
 4  2014 Arizona   0.341   0.334   0.295  1.54e6  1.51e6  4.51e6  5.11e6   0.101
 5  2014 Arkans…   0.403   0.401   0.375  8.53e5  8.49e5  2.12e6  2.26e6   0.04 
 6  2014 Califo…   0.307   0.299   0.247  7.51e6  7.32e6  2.44e7  2.96e7   0.168
 7  2014 Colora…   0.547   0.537   0.494  2.08e6  2.04e6  3.80e6  4.13e6   0.072
 8  2014 Connec…   0.425   0.423   0.385  1.10e6  1.09e6  2.58e6  2.83e6   0.082
 9  2014 Delawa…   0.349   0.343   0.318  2.38e5  2.34e5  6.82e5  7.35e5   0.051
10  2014 Distri…   0.357   0.353   0.32   1.77e5  1.75e5  4.96e5  5.47e5   0.094
11  2014 Florida   0.433   0.428   0.376  6.03e6  5.95e6  1.39e7  1.58e7   0.106
12  2014 Georgia   0.386   0.382   0.338  2.60e6  2.57e6  6.73e6  7.60e6   0.071
13  2014 Hawaii    0.365   0.362   0.329  3.70e5  3.66e5  1.01e6  1.11e6   0.086
14  2014 Idaho     0.398   0.393   0.365  4.45e5  4.40e5  1.12e6  1.21e6   0.046
15  2014 Illino…   0.408   0.402   0.366  3.68e6  3.63e6  9.03e6  9.92e6   0.085
16  2014 Indiana   0.287   0.278   0.267  1.39e6  1.34e6  4.83e6  5.03e6   0.035
17  2014 Iowa      0.503   0.498   0.473  1.14e6  1.13e6  2.27e6  2.39e6   0.036
18  2014 Kansas    0.433   0.425   0.398  8.87e5  8.70e5  2.05e6  2.18e6   0.052
19  2014 Kentuc…   0.449   0.442   0.422  1.46e6  1.44e6  3.25e6  3.41e6   0.027
20  2014 Louisi…   0.449   0.439   0.415  1.50e6  1.47e6  3.35e6  3.55e6   0.03 
# … with 5 more variables: prison_count <dbl>, probation_count <dbl>,
#   parole_count <dbl>, totineligible_count <dbl>, overseas_count <dbl>, and
#   abbreviated variable names ¹​totballots_vep_rate, ²​highestoff_vep_rate,
#   ³​highestoff_vap_rate, ⁴​totalballots_count, ⁵​highestoff_count, ⁶​vep_count,
#   ⁷​vap_count, ⁸​noncitizen_percent
Code
colnames(turnout)
 [1] "year"                "state"               "totballots_vep_rate"
 [4] "highestoff_vep_rate" "highestoff_vap_rate" "totalballots_count" 
 [7] "highestoff_count"    "vep_count"           "vap_count"          
[10] "noncitizen_percent"  "prison_count"        "probation_count"    
[13] "parole_count"        "totineligible_count" "overseas_count"     
Code
summary(turnout)
      year         state           totballots_vep_rate highestoff_vep_rate
 Min.   :1980   Length:936         Min.   :0.0000      Min.   :0.2020     
 1st Qu.:1988   Class :character   1st Qu.:0.4310      1st Qu.:0.4140     
 Median :1997   Mode  :character   Median :0.5200      Median :0.5010     
 Mean   :1997                      Mean   :0.5125      Mean   :0.4993     
 3rd Qu.:2006                      3rd Qu.:0.6040      3rd Qu.:0.5840     
 Max.   :2014                      Max.   :0.7880      Max.   :0.7840     
                                   NA's   :215         NA's   :1          
 highestoff_vap_rate totalballots_count  highestoff_count   
 Min.   :0.1990      Min.   :   122356   Min.   :   117623  
 1st Qu.:0.3895      1st Qu.:   422851   1st Qu.:   488820  
 Median :0.4770      Median :  1170867   Median :  1236230  
 Mean   :0.4733      Mean   :  3074280   Mean   :  3509231  
 3rd Qu.:0.5560      3rd Qu.:  2395791   3rd Qu.:  2336586  
 Max.   :0.7390      Max.   :132609063   Max.   :131304731  
 NA's   :1           NA's   :223         NA's   :1          
   vep_count           vap_count         noncitizen_percent  prison_count    
 Min.   :   270122   Min.   :   277261   Min.   :0.00400    Min.   :      0  
 1st Qu.:   999644   1st Qu.:  1044366   1st Qu.:0.01500    1st Qu.:   3464  
 Median :  2662524   Median :  2778086   Median :0.03100    Median :  10018  
 Mean   :  7277622   Mean   :  7840064   Mean   :0.04344    Mean   :  39257  
 3rd Qu.:  4569632   3rd Qu.:  4898253   3rd Qu.:0.06600    3rd Qu.:  24819  
 Max.   :227157964   Max.   :245712915   Max.   :0.18900    Max.   :1605448  
                                                                             
 probation_count    parole_count    totineligible_count overseas_count   
 Min.   :      0   Min.   :     0   Min.   :      0     Min.   :   6916  
 1st Qu.:      0   1st Qu.:     0   1st Qu.:   6210     1st Qu.:  43108  
 Median :   7982   Median :  1870   Median :  21329     Median :  89605  
 Mean   :  67542   Mean   : 16227   Mean   :  90039     Mean   : 920963  
 3rd Qu.:  38902   3rd Qu.:  6592   3rd Qu.:  52525     3rd Qu.:1803021  
 Max.   :2451708   Max.   :637410   Max.   :3363118     Max.   :5345814  
                                                        NA's   :867      

Additional turnout data are available from the USEP by election from 2000-2020, albeit in their own individual spreadsheets; I may end up merging the 2016 and 2020 spreadsheets into this 1980-2014 set. It is important to note that this dataset includes observations for both Presidential and midterm election years, while I only intend to analyze Presidential elections.
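
Dropping the midterm observations could be as simple as the sketch below, which assumes that Presidential elections fall in years divisible by four (true for the 1980-2020 range discussed here). The toy data frame and its turnout values are hypothetical.

```r
# Toy turnout rows covering both election types (hypothetical rates)
turnout_toy <- data.frame(
  year  = c(2008, 2010, 2012, 2014),
  state = "Alabama",
  rate  = c(0.60, 0.40, 0.58, 0.33)
)

# Presidential elections in this period occur in years divisible by 4
turnout_pres <- turnout_toy[turnout_toy$year %% 4 == 0, ]

print(turnout_pres)
```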

This dataset makes distinctions between turnout based on voting-age population (VAP) and voting-eligible population (VEP). The literature generally agrees that VEP is the more reliable and consistent of the two measures. However, given that one of the main differences between the two is the barrier of felony disenfranchisement, a barrier that is often inequitably distributed by race, I may end up using VAP turnout in my analysis; I have not yet decided as of the time of this submission.
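
To make the difference concrete, the sketch below recomputes both rates for the 2014 Alabama row using the rounded counts shown in the printed output above. It assumes that each rate is simply votes for highest office divided by the relevant population, which the printed rates appear to bear out.

```r
# Rounded counts for Alabama, 2014, as displayed in the printed tibble
highestoff_count <- 1.18e6  # votes cast for highest office
vep_count        <- 3.59e6  # voting-eligible population
vap_count        <- 3.75e6  # voting-age population

# Same numerator, different denominators
rates <- round(c(
  vep = highestoff_count / vep_count,
  vap = highestoff_count / vap_count
), 3)

print(rates)
```

The VAP denominator is larger because it includes residents who cannot legally vote, so the VAP rate is always at or below the VEP rate for the same election.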

Voter ID data, 2000-2020

Obtained from the National Conference of State Legislatures, which kindly provided, via email on 10/11/2022, a spreadsheet version of the data on this webpage.

Code
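# Read the state column plus the six election-year columns,
# skipping the columns in between
voter_id <- read_excel("./_data/voter_id_chronology.xlsx",
                      skip = 2,
                      col_types = c("text","skip","text","skip","text","skip",
                                    "text","skip","text","skip","text","skip",
                                    "text","skip","skip"))

voter_id <- voter_id %>% 
  # Reshape from one column per election year to one row per state-year
  pivot_longer(cols=c(2:7),
               names_to="year",
               values_to="id_text") %>% 
  # Any entry other than "No ID required ..." counts as an ID requirement
  mutate(id_req = case_when(
    grepl("no id", id_text, ignore.case = TRUE) ~ 0,
    TRUE ~ 1
  )) %>% 
  # Case-sensitive on purpose: matches "Strict ..." but not "Non-strict ..."
  mutate(id_strict = case_when(
    grepl("Strict", id_text) ~ 1,
    TRUE ~ 0
  )) %>% 
  # The leading space matches ", photo" but not "non-photo"
  mutate(id_photo = case_when(
    grepl(" photo", id_text, ignore.case = TRUE) ~ 1,
    TRUE ~ 0
  ))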

head(voter_id,n=20)
# A tibble: 20 × 6
   State    year  id_text                 id_req id_strict id_photo
   <chr>    <chr> <chr>                    <dbl>     <dbl>    <dbl>
 1 Alabama  2000  No ID required at polls      0         0        0
 2 Alabama  2004  Non-strict, non-photo        1         0        0
 3 Alabama  2008  Non-strict, non-photo        1         0        0
 4 Alabama  2012  Non-strict, non-photo        1         0        0
 5 Alabama  2016  Non-strict, photo            1         0        1
 6 Alabama  2020  Non-strict, photo            1         0        1
 7 Alaska   2000  Non-strict, non-photo        1         0        0
 8 Alaska   2004  Non-strict, non-photo        1         0        0
 9 Alaska   2008  Non-strict, non-photo        1         0        0
10 Alaska   2012  Non-strict, non-photo        1         0        0
11 Alaska   2016  Non-strict, non-photo        1         0        0
12 Alaska   2020  Non-strict, non-photo        1         0        0
13 Arizona  2000  No ID required at polls      0         0        0
14 Arizona  2004  No ID required at polls      0         0        0
15 Arizona  2008  Strict non-photo             1         1        0
16 Arizona  2012  Strict non-photo             1         1        0
17 Arizona  2016  Strict non-photo             1         1        0
18 Arizona  2020  Strict non-photo             1         1        0
19 Arkansas 2000  Non-strict, non-photo        1         0        0
20 Arkansas 2004  Non-strict, non-photo        1         0        0
Code
colnames(voter_id)
[1] "State"     "year"      "id_text"   "id_req"    "id_strict" "id_photo" 
Code
summary(voter_id)
    State               year             id_text              id_req      
 Length:306         Length:306         Length:306         Min.   :0.0000  
 Class :character   Class :character   Class :character   1st Qu.:0.0000  
 Mode  :character   Mode  :character   Mode  :character   Median :1.0000  
                                                          Mean   :0.5033  
                                                          3rd Qu.:1.0000  
                                                          Max.   :1.0000  
   id_strict          id_photo     
 Min.   :0.00000   Min.   :0.0000  
 1st Qu.:0.00000   1st Qu.:0.0000  
 Median :0.00000   Median :0.0000  
 Mean   :0.09804   Mean   :0.1928  
 3rd Qu.:0.00000   3rd Qu.:0.0000  
 Max.   :1.00000   Max.   :1.0000  

Given that barriers to voting factor into the argument behind my research, I wanted to include data on voter ID laws in my analysis, as a control (or other type of) variable. The data here track voter ID laws across all 50 U.S. states and the District of Columbia from 2000 to 2020.
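
Using these data as a control requires joining them to the election results by state and year. As the printed output shows, the keys do not line up as-is: the election data store state names in upper case in a column called state, while the voter ID data use title case in a column called State and store year as text. A minimal base R sketch of one way to harmonize them follows; the dem_share column and its values are hypothetical placeholders, while the voter ID rows mirror the Alabama entries printed above.

```r
# Toy state-year outcomes (dem_share values are hypothetical placeholders)
election_toy <- data.frame(
  year      = c(2016, 2020),
  state     = c("ALABAMA", "ALABAMA"),
  dem_share = c(0.40, 0.45)
)

# Voter ID rows as printed above for Alabama
voter_id_toy <- data.frame(
  State    = c("Alabama", "Alabama"),
  year     = c("2016", "2020"),
  id_photo = c(1, 1)
)

# Harmonize the join keys: numeric year, upper-case state name
voter_id_toy$year  <- as.numeric(voter_id_toy$year)
voter_id_toy$state <- toupper(voter_id_toy$State)

# Left join so that every election row is kept
merged <- merge(
  election_toy,
  voter_id_toy[, c("state", "year", "id_photo")],
  by = c("state", "year"),
  all.x = TRUE
)

print(merged)
```

Any election row left with a missing id_photo after the join would signal a state name or year that failed to match.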

These data are surprisingly well balanced when it comes to the occurrence of voter ID laws; 50.33 percent of state-level elections were held under voter ID laws of some sort. Cases are also specified by whether or not a voter ID law was strict (i.e. required a voter without acceptable identification to cast a provisional ballot and verify their identity after Election Day), and whether or not the state required a photo on the identification. Strict voter ID laws are the rarest, occurring in only 9.8 percent of elections in the data set; photo requirements are slightly more common, occurring in 19.28 percent of elections.
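
Because these three indicators are derived from free-text labels, it is worth checking the pattern matching on the label values that appear in the printed output. The sketch below applies the same `grepl()` patterns as the chunk above to those four labels in base R; other labels in the full spreadsheet are not covered by this check.

```r
# The four label values visible in the printed output above
id_text <- c(
  "No ID required at polls",
  "Non-strict, non-photo",
  "Non-strict, photo",
  "Strict non-photo"
)

flags <- data.frame(
  id_text   = id_text,
  # 0 only when the label says no ID is required
  id_req    = as.numeric(!grepl("no id", id_text, ignore.case = TRUE)),
  # Case-sensitive, so "Non-strict" is not counted as strict
  id_strict = as.numeric(grepl("Strict", id_text)),
  # Leading space, so "non-photo" is not counted as photo
  id_photo  = as.numeric(grepl(" photo", id_text, ignore.case = TRUE))
)

print(flags)
```

The strict and photo flags depend on capitalization and spacing in the source labels, so a label formatted differently from these four could be misclassified.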