library(tidyverse)
library(lubridate)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Pradhakshya Dhanakumar
April 16, 2023
Read the data from a .csv file
# A tibble: 6 × 31
id xspanish complete_status ppage ppeduc5 ppeducat ppgender ppethm
<dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 7230001 English qualified 68 "High school … High sc… Female White…
2 7230002 English qualified 85 "Bachelor\x92… Bachelo… Male White…
3 7230003 English qualified 69 "High school … High sc… Male White…
4 7230004 English qualified 74 "Bachelor\x92… Bachelo… Female White…
5 7230005 English qualified 77 "High school … High sc… Male White…
6 7230006 English qualified 70 "Bachelor\x92… Bachelo… Male White…
# ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
# ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
# ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
# Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
# ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
[1] "id" "xspanish" "complete_status" "ppage"
[5] "ppeduc5" "ppeducat" "ppgender" "ppethm"
[9] "pphhsize" "ppinc7" "ppmarit5" "ppmsacat"
[13] "ppreg4" "pprent" "ppstaten" "PPWORKA"
[17] "ppemploy" "Q1_a" "Q1_b" "Q1_c"
[21] "Q1_d" "Q1_e" "Q1_f" "Q2"
[25] "Q3" "Q4" "Q5" "QPID"
[29] "ABCAGE" "Contact" "weights_pid"
[1] 527 31
id xspanish complete_status ppage
Min. :7230001 Length:527 Length:527 Min. :18.00
1st Qu.:7230132 Class :character Class :character 1st Qu.:40.00
Median :7230264 Mode :character Mode :character Median :55.00
Mean :7230264 Mean :53.39
3rd Qu.:7230396 3rd Qu.:67.00
Max. :7230527 Max. :91.00
ppeduc5 ppeducat ppgender ppethm
Length:527 Length:527 Length:527 Length:527
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
pphhsize ppinc7 ppmarit5 ppmsacat
Length:527 Length:527 Length:527 Length:527
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
ppreg4 pprent ppstaten PPWORKA
Length:527 Length:527 Length:527 Length:527
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
ppemploy Q1_a Q1_b Q1_c
Length:527 Length:527 Length:527 Length:527
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Q1_d Q1_e Q1_f Q2
Length:527 Length:527 Length:527 Length:527
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Q3 Q4 Q5 QPID
Length:527 Length:527 Length:527 Length:527
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
ABCAGE Contact weights_pid
Length:527 Length:527 Min. :0.3240
Class :character Class :character 1st Qu.:0.6332
Mode :character Mode :character Median :0.8451
Mean :1.0000
3rd Qu.:1.1516
Max. :6.2553
The ABC Poll dataset is a national survey of 527 respondents, recorded in 31 variables. It covers 10 questions on political opinions and beliefs, plus party identification. It also includes 15 demographic variables, which have been recoded to make analysis easier, and 5 survey administration variables that describe the survey’s methodology and logistics. Together, these variables describe the respondents’ political attitudes and demographics.
First, we can check whether the data contains any missing (NA) values.
The output above shows 0 NA entries. On closer inspection, however, some questions contain the value ‘Skipped’, which we can replace with NA.
Now we can change the ‘Skipped’ values to NA.
Approve Disapprove
329 193
Similarly, we can inspect the values of QPID.
The values start with an article (A or An), which is unnecessary, so we can remove it.
Democrat Independent Republican Something else
176 168 152 28
---
title: "Challenge 4"
author: "Pradhakshya Dhanakumar"
description: "Worked with ABC Poll Dataset"
date: "04/16/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - Challenge 4
  - Pradhakshya Dhanakumar
  - ABC Poll
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(lubridate)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Reading Data
Read the data from a .csv file
```{r}
data <- read_csv("_data/abc_poll_2021.csv")
head(data)
```
```{r}
colnames(data)
dim(data)
```
```{r}
summary(data)
```
## Data Description
The ABC Poll dataset is a national survey of 527 respondents, recorded in 31 variables. It covers 10 questions on political opinions and beliefs, plus party identification. It also includes 15 demographic variables, which have been recoded to make analysis easier, and 5 survey administration variables that describe the survey's methodology and logistics. Together, these variables describe the respondents' political attitudes and demographics.
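The variable counts in this description can be checked against the column names. The sketch below shows one possible grouping, not the survey's official codebook. Treating `id`, `xspanish`, `complete_status`, `Contact`, and `weights_pid` as the administration variables is an assumption, and the vector names (`cols`, `question_cols`, `admin_cols`, `demo_cols`) are made up for illustration.

```r
# Column names as printed by colnames(data)
cols <- c("id", "xspanish", "complete_status", "ppage", "ppeduc5", "ppeducat",
          "ppgender", "ppethm", "pphhsize", "ppinc7", "ppmarit5", "ppmsacat",
          "ppreg4", "pprent", "ppstaten", "PPWORKA", "ppemploy",
          "Q1_a", "Q1_b", "Q1_c", "Q1_d", "Q1_e", "Q1_f", "Q2", "Q3", "Q4",
          "Q5", "QPID", "ABCAGE", "Contact", "weights_pid")

# The opinion questions and party identification all start with "Q"
question_cols <- grep("^Q", cols, value = TRUE)

# Assumed administration variables (hypothetical grouping)
admin_cols <- c("id", "xspanish", "complete_status", "Contact", "weights_pid")

# Everything that is left is treated as demographic
demo_cols <- setdiff(cols, c(question_cols, admin_cols))

# Number of variables in each group
lengths(list(questions = question_cols,
             demographics = demo_cols,
             admin = admin_cols))
```

Under this grouping, the question group holds the 10 opinion questions plus `QPID`.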
## Tidy and Mutate Data
First, we can check whether the data contains any missing (NA) values.
```{r}
sum(is.na(data))
```
The output above shows 0 NA entries. On closer inspection, however, some questions contain the value 'Skipped', which we can replace with NA.
```{r}
table(data$Q1_a)
```
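Rather than tabulating one question at a time, the 'Skipped' responses can be counted across all question columns at once. This is a minimal sketch on a toy tibble, not the poll data: the name `toy`, its values, and `skipped_counts` are illustrative assumptions.

```r
library(dplyr)

# Toy data standing in for the poll; the values are illustrative only
toy <- tibble::tibble(
  id   = 1:4,
  Q1_a = c("Approve", "Skipped", "Disapprove", "Approve"),
  Q2   = c("Skipped", "Skipped", "Yes", "No")
)

# Count the "Skipped" entries in every column whose name starts with "Q"
skipped_counts <- toy %>%
  summarise(across(starts_with("Q"), ~ sum(.x == "Skipped")))

skipped_counts
```

Replacing `toy` with `data` should give the same kind of one-row summary for the real question columns.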
Now we can change the 'Skipped' values to NA.
```{r}
data <- data %>%
  mutate(across(starts_with("Q"), ~ ifelse(. == "Skipped", NA, .)))
table(data$Q1_a)
```
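The same replacement can also be written with `dplyr::na_if()`, which turns one specific value into NA and leaves the column type unchanged. The sketch below uses a toy tibble, so `toy`, `toy_clean`, and the values are assumptions for illustration rather than the author's code.

```r
library(dplyr)

# Toy data standing in for the poll; the values are illustrative only
toy <- tibble::tibble(
  Q1_a = c("Approve", "Skipped", "Disapprove"),
  QPID = c("A Democrat", "Skipped", "An Independent")
)

# Replace "Skipped" with NA in every column whose name starts with "Q"
toy_clean <- toy %>%
  mutate(across(starts_with("Q"), ~ na_if(.x, "Skipped")))

# Number of NA values per column after the replacement
colSums(is.na(toy_clean))
```

Because `starts_with("Q")` also matches `QPID`, that column is cleaned in the same step.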
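Similarly, we can inspect the values of QPID. Because its name starts with "Q", the step above has already replaced its 'Skipped' values with NA.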
```{r}
unique(data$QPID)
```
The values start with an article (A or An), which is unnecessary, so we can remove it.
```{r}
data <- data %>%
  mutate(QPID = gsub("^A\\s|^An\\s", "", QPID))
table(data$QPID)
```
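The article removal can be tried out on a small vector before it is applied to the full column. This sketch uses `stringr::str_remove()` with the shorter pattern `^An?\\s`, which matches the same leading "A " or "An " as the pattern above. The vector `party_raw` and its values are assumptions made up for illustration.

```r
library(stringr)

# Illustrative party labels, including one without an article and one NA
party_raw <- c("A Democrat", "An Independent", "A Republican",
               "Something else", NA)

# Remove a leading "A " or "An "; other values and NA are left as they are
party <- str_remove(party_raw, "^An?\\s")

party
```

Anchoring the pattern with `^` means the letters "A" and "An" are only removed at the start of a value, not from the middle of a label.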
```{r}
# Mutate: drop the ", Non-Hispanic" suffix and store the result in a new column
df1 <- data %>%
  mutate(ethnic = str_remove(ppethm, ", Non-Hispanic")) %>%
  select(-ppethm)
# Sanity check
table(df1$ethnic)
```