Challenge 4
Pradhakshya Dhanakumar
ABC Poll
Author

Pradhakshya Dhanakumar

Published

April 16, 2023

Code
library(tidyverse)
library(lubridate)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Reading Data

Read the data from a .csv file

Code
data <- read_csv("_data/abc_poll_2021.csv")
head(data)
# A tibble: 6 × 31
       id xspanish complete_status ppage ppeduc5        ppeducat ppgender ppethm
    <dbl> <chr>    <chr>           <dbl> <chr>          <chr>    <chr>    <chr> 
1 7230001 English  qualified          68 "High school … High sc… Female   White…
2 7230002 English  qualified          85 "Bachelor\x92… Bachelo… Male     White…
3 7230003 English  qualified          69 "High school … High sc… Male     White…
4 7230004 English  qualified          74 "Bachelor\x92… Bachelo… Female   White…
5 7230005 English  qualified          77 "High school … High sc… Male     White…
6 7230006 English  qualified          70 "Bachelor\x92… Bachelo… Male     White…
# ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
#   ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
#   ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
#   Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
#   ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
Code
colnames(data)
 [1] "id"              "xspanish"        "complete_status" "ppage"          
 [5] "ppeduc5"         "ppeducat"        "ppgender"        "ppethm"         
 [9] "pphhsize"        "ppinc7"          "ppmarit5"        "ppmsacat"       
[13] "ppreg4"          "pprent"          "ppstaten"        "PPWORKA"        
[17] "ppemploy"        "Q1_a"            "Q1_b"            "Q1_c"           
[21] "Q1_d"            "Q1_e"            "Q1_f"            "Q2"             
[25] "Q3"              "Q4"              "Q5"              "QPID"           
[29] "ABCAGE"          "Contact"         "weights_pid"    
Code
dim(data)
[1] 527  31
Code
summary(data)
       id            xspanish         complete_status        ppage      
 Min.   :7230001   Length:527         Length:527         Min.   :18.00  
 1st Qu.:7230132   Class :character   Class :character   1st Qu.:40.00  
 Median :7230264   Mode  :character   Mode  :character   Median :55.00  
 Mean   :7230264                                         Mean   :53.39  
 3rd Qu.:7230396                                         3rd Qu.:67.00  
 Max.   :7230527                                         Max.   :91.00  
   ppeduc5            ppeducat           ppgender            ppethm         
 Length:527         Length:527         Length:527         Length:527        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
   pphhsize            ppinc7            ppmarit5           ppmsacat        
 Length:527         Length:527         Length:527         Length:527        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
    ppreg4             pprent            ppstaten           PPWORKA         
 Length:527         Length:527         Length:527         Length:527        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
   ppemploy             Q1_a               Q1_b               Q1_c          
 Length:527         Length:527         Length:527         Length:527        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
     Q1_d               Q1_e               Q1_f                Q2           
 Length:527         Length:527         Length:527         Length:527        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
      Q3                 Q4                 Q5                QPID          
 Length:527         Length:527         Length:527         Length:527        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
    ABCAGE            Contact           weights_pid    
 Length:527         Length:527         Min.   :0.3240  
 Class :character   Class :character   1st Qu.:0.6332  
 Mode  :character   Mode  :character   Median :0.8451  
                                       Mean   :1.0000  
                                       3rd Qu.:1.1516  
                                       Max.   :6.2553  

Data Description

The ABC Poll dataset is a national survey of 527 respondents, recorded in 31 variables. Eleven of these are survey responses: 10 questions on political opinions and beliefs, plus party identification. The dataset also contains 15 demographic variables, which have been recoded to make analysis easier, and 5 survey administration variables that describe the survey’s methodology and logistics. Together, these variables describe the surveyed population’s political attitudes and demographics.
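The 11 + 15 + 5 split described above can be checked against the column names. The following is a minimal sketch: the grouping rule (by name prefix, with ABCAGE counted as a demographic) is an assumption made for illustration and is not documented in the data itself.

```r
# Column names copied from the colnames(data) output above.
cols <- c("id", "xspanish", "complete_status", "ppage", "ppeduc5", "ppeducat",
          "ppgender", "ppethm", "pphhsize", "ppinc7", "ppmarit5", "ppmsacat",
          "ppreg4", "pprent", "ppstaten", "PPWORKA", "ppemploy",
          "Q1_a", "Q1_b", "Q1_c", "Q1_d", "Q1_e", "Q1_f",
          "Q2", "Q3", "Q4", "Q5", "QPID", "ABCAGE", "Contact", "weights_pid")

# Assumed grouping rule (hypothetical, based only on name prefixes):
#   names starting with "Q"               -> opinion questions and party ID
#   names starting with "pp", or ABCAGE   -> demographics
#   everything else                       -> survey administration
var_group <- ifelse(
  grepl("^Q", cols), "opinion",
  ifelse(grepl("^pp", cols, ignore.case = TRUE) | cols == "ABCAGE",
         "demographic", "administration")
)

# Count how many columns fall into each group.
group_counts <- table(var_group)
print(group_counts)
```

Under this rule the three groups add up to the 31 columns reported by dim(data).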

Tidy and Mutate Data

First, we can check whether the data contains any missing (NA) values.

Code
sum(is.na(data))
[1] 0

From the above output we can see that there are 0 NA entries. On further inspection, however, certain questions contain the value ‘Skipped’. These are effectively missing responses, so we can replace them with NA.

Code
table(data$Q1_a)

   Approve Disapprove    Skipped 
       329        193          5 
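To see how widespread ‘Skipped’ is beyond Q1_a, one option is to count it in every question column at once. The sketch below assumes a small made-up tibble named toy standing in for the real poll data; its values are hypothetical and only the column-selection pattern mirrors the post.

```r
library(dplyr)

# Hypothetical stand-in for the poll data (values invented for illustration).
toy <- tibble(
  id   = 1:4,
  Q1_a = c("Approve", "Skipped", "Disapprove", "Approve"),
  Q2   = c("Skipped", "Skipped", "Not concerned", "Very concerned"),
  QPID = c("A Democrat", "A Republican", "Skipped", "An Independent")
)

# For every column whose name starts with "Q", count the "Skipped" entries.
skipped_counts <- toy %>%
  summarise(across(starts_with("Q"), ~ sum(.x == "Skipped")))

print(skipped_counts)
```

Running the same summarise() call on data would show, per question, how many values are about to become NA.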

Now we can change the ‘Skipped’ values to NA in every column whose name starts with ‘Q’.

Code
data <- data %>%
  mutate(across(starts_with("Q"), ~ ifelse(. == "Skipped", NA, .)))
table(data$Q1_a)

   Approve Disapprove 
       329        193 

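The same replacement also applies to QPID, because its name starts with ‘Q’ as well. Checking its unique values confirms that ‘Skipped’ is now NA.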

Code
unique(data$QPID)
[1] "A Democrat"     "An Independent" "Something else" "A Republican"  
[5] NA              

We can see that the QPID values are prefixed with the articles ‘A’ and ‘An’. These are not necessary, so we can remove them.

Code
data <- data %>%
  mutate(QPID = gsub("^A\\s|^An\\s", "", QPID))
table(data$QPID)

      Democrat    Independent     Republican Something else 
           176            168            152             28 
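The effect of the regular expression can be seen on the five distinct QPID values printed earlier. This sketch applies the same pattern to a plain character vector; the shorter pattern "^An?\\s" is an alternative spelling suggested here, not something used in the original code.

```r
# The distinct QPID values shown by unique(data$QPID) above.
party <- c("A Democrat", "An Independent", "Something else", "A Republican", NA)

# Same pattern as in the post: drop a leading "A " or "An ".
# The ^ anchor means only the start of each string is affected.
cleaned <- gsub("^A\\s|^An\\s", "", party)
print(cleaned)

# Alternative pattern (an assumption, not from the post): "n" is optional.
cleaned_alt <- sub("^An?\\s", "", party)
print(identical(cleaned, cleaned_alt))
```

‘Something else’ is left untouched because it does not begin with an article, and NA stays NA.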
Code
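# Strip the ", Non-Hispanic" suffix from ppethm into a new column, ethnic,
# then drop the original ppethm column
df1 <- data %>%
  mutate(ethnic = str_remove(ppethm, ", Non-Hispanic")) %>%
  select(-ppethm)

# Sanity check: tabulate the recoded categories
table(df1$ethnic)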

2+ Races    Black Hispanic    Other    White 
      21       27       51       24      404
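As a further sanity check, the printed counts can be compared with the 527 rows reported by dim(data). The sketch below simply re-enters the numbers from the two tables above. It relies on the fact that table() omits NA by default, so any shortfall in the QPID total corresponds to NA values, presumably the respondents who skipped that question.

```r
# Counts copied from the table(df1$ethnic) output above.
ethnic_counts <- c(`2+ Races` = 21, Black = 27, Hispanic = 51,
                   Other = 24, White = 404)

# Non-NA counts copied from the table(data$QPID) output above.
party_counts <- c(Democrat = 176, Independent = 168,
                  Republican = 152, `Something else` = 28)

n_rows <- 527  # number of rows, from dim(data)

# Every respondent should fall into exactly one ethnic category.
ethnic_total <- sum(ethnic_counts)
print(ethnic_total == n_rows)

# Rows not counted in the QPID table are the NA values.
party_missing <- n_rows - sum(party_counts)
print(party_missing)
```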