Final Project Part 1

finalpart1

Author

Emma Rasmussen

Published

October 11, 2022

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

library(readxl)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(stringr)
library(googlesheets4)

Research Question:

Does political partisanship correlate with COVID-19 death rates?

The COVID-19 pandemic became a political matter. Behaviors associated with COVID-19 prevention (masking, social distancing, and vaccine uptake) were adopted along partisan lines. Early in the pandemic, mask mandates were protested in some communities. My research question is whether these behaviors have affected COVID-19 death rates along partisan lines. If so, public health interventions could target communities that may be at higher risk of COVID-19 deaths based on political partisanship.

I think death toll makes more sense to measure than infection rates, as infection rates are constantly changing (other studies have looked at infection rates over waves of the pandemic; see this study from the Pew Research Center (Jones 2022)). One way to measure partisanship is the 2020 county-level election results (% voting for Trump). In other words, my research asks whether county-level Trump support correlates with COVID-19 death rates. Both of these variables can be found in county-level data sets, so I can join multiple data sets using county name (or FIPS code) as the “key”.

Other county-level variables to consider as potential confounders: vaccine (and booster) uptake and the average age of the population.

Hypothesis:

While I came up with this research idea on my own, other organizations such as NPR (Wood and Brumfiel 2021) and the Pew Research Center (Jones 2022) have already tested this. For this project, I will use the most recent data I can find. I was hoping to consider the confounding variable of population density: for instance, I am guessing more urban populations will tend to vote Democratic, but these more densely populated places may also have higher infection rates. However, I cannot find any county-level population density data sets, so I may use the “Urban Rural Description” variable in one of my data sets.
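If the “Urban Rural Description” variable is used as a stand-in for population density, it would enter a regression as a categorical control. The following is only a minimal sketch of that idea on simulated data: the column names (`death_rate`, `pct_trump`, `urban_rural`), the category labels, and the built-in slope are all assumptions made so the code runs, and the numbers say nothing about real counties.

```r
set.seed(1)  # simulated data only; not real counties

n <- 300
urban_levels <- c("Large metro", "Medium metro", "Noncore")  # hypothetical category labels

sim <- data.frame(
  urban_rural = factor(sample(urban_levels, n, replace = TRUE), levels = urban_levels),
  pct_trump   = runif(n, min = 20, max = 80)  # hypothetical % voting for Trump
)

# Simulated outcome with an arbitrary built-in slope of 2 on pct_trump
sim$death_rate <- 150 + 2 * sim$pct_trump + rnorm(n, sd = 30)

# Regress death rate on Trump share, controlling for the urban/rural category
fit <- lm(death_rate ~ pct_trump + urban_rural, data = sim)
summary(fit)$coefficients
```

Because `urban_rural` is a factor, `lm()` expands it into indicator terms, with the first level acting as the reference category.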

H0: B1 (and all beta values) is zero. There is no correlation between partisanship and COVID-19 death rates.

Ha: B1 (or any beta value) is not zero. There is a correlation between partisanship and COVID-19 death rates.
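One way to write these hypotheses down, assuming a linear regression of a county's death rate on its Trump vote share plus optional control variables. The symbols here are illustrative labels and not names from the data sets: $i$ indexes counties and $X_{ik}$ stands for the $k$-th control.

```latex
\begin{aligned}
\text{DeathRate}_i &= \beta_0 + \beta_1\,\text{TrumpShare}_i + \sum_{k=2}^{p} \beta_k X_{ik} + \varepsilon_i \\
H_0 &: \beta_1 = \beta_2 = \dots = \beta_p = 0 \\
H_a &: \beta_j \neq 0 \ \text{for at least one } j \in \{1, \dots, p\}
\end{aligned}
```

Under this framing, the coefficient of main interest is $\beta_1$, the slope on Trump vote share.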

Descriptive Statistics:

# Read the data from Google Sheets; gs4_deauth() skips sign-in, which is enough for publicly readable sheets
gs4_deauth()

votedf<-read_sheet("https://docs.google.com/spreadsheets/d/1fmxoA_bibvsxsvgRdVPCgMA7DkmJNZfxiWgLgCLcsOY/edit#gid=937778872")

coviddf<-read_sheet("https://docs.google.com/spreadsheets/d/1Hy2O3HxhZGF_fhu6jgmoC2ibWwJTlI7pQOESBOd4hTU/edit#gid=787918384")

# Convert the FIPS code to character and left-pad with zeros to five characters
coviddf$`FIPS Code` <- as.character(coviddf$`FIPS Code`)
coviddf <- mutate(coviddf, FIPSNEW = str_pad(`FIPS Code`, 5, pad = "0"))
head(coviddf, 12)

# A tibble: 12 × 22
   `Data as of`        `Start Date`        `End Date`          State County Na…¹
   <dttm>              <dttm>              <dttm>              <chr> <chr>      
 1 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Anchorage …
 2 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Anchorage …
 3 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Anchorage …
 4 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Fairbanks …
 5 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Fairbanks …
 6 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Fairbanks …
 7 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Matanuska-…
 8 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Matanuska-…
 9 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AK    Matanuska-…
10 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AL    Autauga Co…
11 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AL    Autauga Co…
12 2022-10-05 00:00:00 2020-01-01 00:00:00 2022-10-01 00:00:00 AL    Autauga Co…
# … with 17 more variables: `Urban Rural Code` <dbl>, `FIPS State` <dbl>,
#   `FIPS County` <dbl>, `FIPS Code` <chr>, Indicator <chr>,
#   `Total deaths` <dbl>, `COVID-19 Deaths` <dbl>, `Non-Hispanic White` <dbl>,
#   `Non-Hispanic Black` <dbl>,
#   `Non-Hispanic American Indian or Alaska Native` <dbl>,
#   `Non-Hispanic Asian` <dbl>,
#   `Non-Hispanic Native Hawaiian or Other Pacific Islander` <dbl>, …

# Same conversion for the election data
votedf$county_fips <- as.character(votedf$county_fips)
votedf <- mutate(votedf, county_fipsNEW = str_pad(county_fips, 5, pad = "0"))
head(votedf, 12)

# A tibble: 12 × 13
    year state   state_po county_…¹ count…² office candi…³ party candi…⁴ total…⁵
   <dbl> <chr>   <chr>    <chr>     <chr>   <chr>  <chr>   <chr>   <dbl>   <dbl>
 1  2000 ALABAMA AL       AUTAUGA   1001    US PR… AL GORE DEMO…    4942   17208
 2  2000 ALABAMA AL       AUTAUGA   1001    US PR… GEORGE… REPU…   11993   17208
 3  2000 ALABAMA AL       AUTAUGA   1001    US PR… RALPH … GREEN     160   17208
 4  2000 ALABAMA AL       AUTAUGA   1001    US PR… OTHER   OTHER     113   17208
 5  2000 ALABAMA AL       BALDWIN   1003    US PR… AL GORE DEMO…   13997   56480
 6  2000 ALABAMA AL       BALDWIN   1003    US PR… GEORGE… REPU…   40872   56480
 7  2000 ALABAMA AL       BALDWIN   1003    US PR… RALPH … GREEN    1033   56480
 8  2000 ALABAMA AL       BALDWIN   1003    US PR… OTHER   OTHER     578   56480
 9  2000 ALABAMA AL       BARBOUR   1005    US PR… AL GORE DEMO…    5188   10395
10  2000 ALABAMA AL       BARBOUR   1005    US PR… GEORGE… REPU…    5096   10395
11  2000 ALABAMA AL       BARBOUR   1005    US PR… RALPH … GREEN      46   10395
12  2000 ALABAMA AL       BARBOUR   1005    US PR… OTHER   OTHER      65   10395
# … with 3 more variables: version <dbl>, mode <chr>, county_fipsNEW <chr>, and
#   abbreviated variable names ¹county_name, ²county_fips, ³candidate,
#   ⁴candidatevotes, ⁵totalvotes
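As a small check on the padding step, the sketch below applies the same `as.character()` plus `str_pad()` conversion to a few toy FIPS values. The vector `fips_raw` is a hypothetical stand-in for a numeric FIPS column, not data read from the sheets.

```r
library(stringr)

# Toy FIPS codes stored as numbers, so any leading zero has been dropped
fips_raw <- c(1001, 2020, 36061)

# Convert to character, then left-pad with "0" to a width of five
fips_padded <- str_pad(as.character(fips_raw), width = 5, side = "left", pad = "0")

fips_padded
# → "01001" "02020" "36061"

# Every padded code should be exactly five characters long
all(nchar(fips_padded) == 5)
```

Running the same `nchar()` check on `FIPSNEW` and `county_fipsNEW` would confirm that both keys share one format before joining.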

summary(votedf)

      year         state             state_po         county_name       
 Min.   :2000   Length:72617       Length:72617       Length:72617      
 1st Qu.:2004   Class :character   Class :character   Class :character  
 Median :2012   Mode  :character   Mode  :character   Mode  :character  
 Mean   :2011                                                           
 3rd Qu.:2020                                                           
 Max.   :2020                                                           
 county_fips           office           candidate            party          
 Length:72617       Length:72617       Length:72617       Length:72617      
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
 candidatevotes      totalvotes         version             mode          
 Min.   :      0   Min.   :      0   Min.   :20220315   Length:72617      
 1st Qu.:    115   1st Qu.:   5175   1st Qu.:20220315   Class :character  
 Median :   1278   Median :  11194   Median :20220315   Mode  :character  
 Mean   :  10782   Mean   :  42514   Mean   :20220315                     
 3rd Qu.:   5848   3rd Qu.:  29855   3rd Qu.:20220315                     
 Max.   :3028885   Max.   :4264365   Max.   :20220315                     
 county_fipsNEW    
 Length:72617      
 Class :character  
 Mode  :character

summary(coviddf)

   Data as of           Start Date            End Date         
 Min.   :2022-10-05   Min.   :2020-01-01   Min.   :2022-10-01  
 1st Qu.:2022-10-05   1st Qu.:2020-01-01   1st Qu.:2022-10-01  
 Median :2022-10-05   Median :2020-01-01   Median :2022-10-01  
 Mean   :2022-10-05   Mean   :2020-01-01   Mean   :2022-10-01  
 3rd Qu.:2022-10-05   3rd Qu.:2020-01-01   3rd Qu.:2022-10-01  
 Max.   :2022-10-05   Max.   :2020-01-01   Max.   :2022-10-01  
                                                               
    State           County Name        Urban Rural Code   FIPS State   
 Length:3495        Length:3495        Min.   :1.000    Min.   : 1.00  
 Class :character   Class :character   1st Qu.:2.000    1st Qu.:18.00  
 Mode  :character   Mode  :character   Median :4.000    Median :33.00  
                                       Mean   :3.645    Mean   :30.47  
                                       3rd Qu.:5.000    3rd Qu.:42.00  
                                       Max.   :6.000    Max.   :56.00  
                                                                       
  FIPS County      FIPS Code          Indicator          Total deaths   
 Min.   :  1.00   Length:3495        Length:3495        Min.   :   621  
 1st Qu.: 31.00   Class :character   Class :character   1st Qu.:  1690  
 Median : 71.00   Mode  :character   Mode  :character   Median :  3284  
 Mean   : 99.37                                         Mean   :  7163  
 3rd Qu.:121.00                                         3rd Qu.:  6990  
 Max.   :840.00                                         Max.   :220829  
                                                                        
 COVID-19 Deaths   Non-Hispanic White Non-Hispanic Black
 Min.   :  101.0   Min.   :0.0270     Min.   :0.0010    
 1st Qu.:  176.0   1st Qu.:0.6677     1st Qu.:0.0230    
 Median :  364.0   Median :0.8300     Median :0.0690    
 Mean   :  852.7   Mean   :0.7742     Mean   :0.1242    
 3rd Qu.:  844.0   3rd Qu.:0.9290     3rd Qu.:0.1800    
 Max.   :31013.0   Max.   :1.0000     Max.   :0.7610    
                   NA's   :3          NA's   :592       
 Non-Hispanic American Indian or Alaska Native Non-Hispanic Asian
 Min.   :0.0000                                Min.   :0.0010    
 1st Qu.:0.0020                                1st Qu.:0.0070    
 Median :0.0040                                Median :0.0130    
 Mean   :0.0214                                Mean   :0.0261    
 3rd Qu.:0.0100                                3rd Qu.:0.0280    
 Max.   :0.8610                                Max.   :0.5170    
 NA's   :1701                                  NA's   :1360      
 Non-Hispanic Native Hawaiian or Other Pacific Islander    Hispanic     
 Min.   :0.0000                                         Min.   :0.0030  
 1st Qu.:0.0000                                         1st Qu.:0.0220  
 Median :0.0010                                         Median :0.0480  
 Mean   :0.0023                                         Mean   :0.0987  
 3rd Qu.:0.0010                                         3rd Qu.:0.1090  
 Max.   :0.2000                                         Max.   :0.9870  
 NA's   :2183                                           NA's   :740     
     Other        Urban Rural Description   Footnote           FIPSNEW         
 Min.   :0.0010   Length:3495             Length:3495        Length:3495       
 1st Qu.:0.0090   Class :character        Class :character   Class :character  
 Median :0.0150   Mode  :character        Mode  :character   Mode  :character  
 Mean   :0.0174                                                                
 3rd Qu.:0.0220                                                                
 Max.   :0.2410                                                                
 NA's   :1633

This data is going to require some tidying before merging. In coviddf, each county is listed 3 times (once per indicator), so I will likely keep only the rows with the indicator “Distribution of COVID-19 deaths (%)” so that each county is listed only once. Similarly, votedf contains extra years. For my research, I am only concerned with 2016 data, so I will filter to the % voting for Trump in 2016 as a measure of political affiliation/partisanship. Then I will merge the two data frames based on county names (which will also require some data tidying).
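The tidying steps described above could look roughly like the following. This is a sketch on toy data, not the final pipeline: the two tibbles are made-up stand-ins for coviddf and votedf, the placeholder indicator labels and vote counts are invented, and the party label `"REPUBLICAN"` is an assumption, since the printed output truncates that column. The election year is kept as a variable so it is easy to change.

```r
library(dplyr)
library(tibble)

# Toy stand-in for coviddf: one row per county per indicator (values are made up)
covid_toy <- tribble(
  ~FIPSNEW, ~Indicator,                            ~`COVID-19 Deaths`,
  "01001",  "Distribution of COVID-19 deaths (%)", 232,
  "01001",  "placeholder indicator A",             232,
  "01001",  "placeholder indicator B",             232,
  "01003",  "Distribution of COVID-19 deaths (%)", 718,
  "01003",  "placeholder indicator A",             718,
  "01003",  "placeholder indicator B",             718
)

# Toy stand-in for votedf (values are made up)
votes_toy <- tribble(
  ~year, ~county_fipsNEW, ~party,       ~candidatevotes, ~totalvotes,
  2016,  "01001",         "REPUBLICAN", 700,             1000,
  2016,  "01001",         "DEMOCRAT",   300,             1000,
  2016,  "01003",         "REPUBLICAN", 300,             1200,
  2016,  "01003",         "DEMOCRAT",   900,             1200,
  2020,  "01001",         "REPUBLICAN", 650,             1000,
  2020,  "01001",         "DEMOCRAT",   350,             1000
)

election_year <- 2016  # year named in the paragraph above; change as needed

# Keep one row per county by selecting a single indicator
covid_tidy <- covid_toy %>%
  filter(Indicator == "Distribution of COVID-19 deaths (%)")

# Republican share of the vote per county for the chosen year
trump_share <- votes_toy %>%
  filter(year == election_year, party == "REPUBLICAN") %>%
  group_by(county_fipsNEW) %>%
  summarise(
    rep_votes  = sum(candidatevotes),  # sums in case a candidate has several rows
    totalvotes = first(totalvotes),
    .groups    = "drop"
  ) %>%
  mutate(pct_trump = 100 * rep_votes / totalvotes)

trump_share
```

Summing `candidatevotes` within each county is a defensive choice: votedf has a `mode` column, so a candidate might appear on more than one row per county.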

The votedf data frame was compiled by the MIT Election Data and Science Lab. It was first published in 2018 and has been updated with the 2020 election. It contains county-level presidential election data from 2000 through the 2020 election. The data has 12 columns and 72,617 rows (many of which I will filter out before conducting analysis). There are 1,892 distinct county names in the data set.
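A count of distinct county names can understate the number of counties, because the same name can occur in more than one state. The toy example below (a hypothetical three-county stand-in for votedf) compares counting by name, by FIPS code, and by state-and-name pair.

```r
library(dplyr)
library(tibble)

# Toy rows: two different counties share the name WASHINGTON
votes_toy <- tribble(
  ~state_po, ~county_name, ~county_fipsNEW,
  "AL",      "AUTAUGA",    "01001",
  "AL",      "WASHINGTON", "01129",
  "AR",      "WASHINGTON", "05143",
  "AL",      "AUTAUGA",    "01001"   # repeated row, as with multiple candidates
)

n_names <- n_distinct(votes_toy$county_name)                  # counts names only
n_fips  <- n_distinct(votes_toy$county_fipsNEW)               # counts FIPS codes
n_pairs <- nrow(distinct(votes_toy, state_po, county_name))   # counts state + name pairs

c(names = n_names, fips = n_fips, pairs = n_pairs)
# → names = 2, fips = 3, pairs = 3
```

The same comparison on the real data would show how many counties the name-only count is missing.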

The coviddf data frame has only 857 unique county names. This may be because not all counties reported COVID-19 death counts. When I join the data sets, I will keep only observations that have information in both data frames. The coviddf data is provisional, meaning that it is regularly updated (I believe on a weekly basis) with current COVID-19 death toll data. It is likely compiled from counties/towns reporting these numbers to the CDC. This data has limitations: not all counties report, and not all report accurately or attribute COVID-19 as the true cause of death in all circumstances. Using the summary function, we can see the “mean” COVID-19 deaths by county is 852.7; however, this isn’t very meaningful given each county has this reported 3 times in the data and the median (364) is much lower. Statistics provided by the summary function will be more meaningful once the data is tidied.
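Keeping only counties present in both data frames corresponds to an inner join. The sketch below uses the padded FIPS columns as the key (county name is the other option mentioned in the text) and toy tibbles with made-up values. The `anti_join()` call is an optional extra that lists which counties the join drops.

```r
library(dplyr)
library(tibble)

# Toy stand-ins, already reduced to one row per county (values are made up)
covid_toy <- tribble(
  ~FIPSNEW, ~`COVID-19 Deaths`,
  "01001",  232,
  "01003",  718
)

votes_toy <- tribble(
  ~county_fipsNEW, ~pct_trump,
  "01001",         70,
  "01003",         25,
  "01005",         55
)

# Inner join: keeps only counties that appear in both data frames
joined <- inner_join(covid_toy, votes_toy, by = c("FIPSNEW" = "county_fipsNEW"))

# Counties with election results but no COVID-19 row
unmatched_votes <- anti_join(votes_toy, covid_toy, by = c("county_fipsNEW" = "FIPSNEW"))

nrow(joined)                    # → 2
unmatched_votes$county_fipsNEW  # → "01005"
```

Looking at the unmatched rows before discarding them is a cheap way to tell counties that are truly missing from keys that simply failed to match.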

References

Jones, B. (2022). The Changing Political Geography of COVID-19 Over the Last Two Years. Pew Research Center. March 3, 2022. https://www.pewresearch.org/politics/2022/03/03/the-changing-political-geography-of-covid-19-over-the-last-two-years/

MIT Election Data and Science Lab. (2021). County Presidential Election Returns 2000-2020. Accessed from the Harvard Dataverse [October 11, 2022]. https://doi.org/10.7910/DVN/VOQCHQ

National Center for Health Statistics. (2022). Provisional COVID-19 Deaths by County, and Race and Hispanic Origin. Accessed from the Centers for Disease Control [October 11, 2022]. https://data.cdc.gov/d/k8wy-p9cg

Wood, D. and Brumfiel, G. (2021). Pro-Trump counties now have far higher COVID death rates. Misinformation is to blame. NPR. December 5, 2021. https://www.npr.org/sections/health-shots/2021/12/05/1059828993/data-vaccine-misinformation-trump-counties-covid-death-rate
