Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Matt Eckstein
March 22, 2023
Today’s challenge is to:
These data come from a 2021 ABC News poll conducted across the United States, containing an ID number, demographic information, answers to five questions (one of which has six components), and metadata related to the conduct of the survey itself from 527 adult respondents.
These data are relatively tidy; for most purposes, these data are more legible in a wide than a long format.
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
It might be helpful, to facilitate calculations and visualizations, to code certain textual question answers as numeric. Here, I have coded two numeric variations of Q4 (one with a numeric version of the existing scale, and another which combines “Good” and “Excellent as 2 and”Not so good” and “Poor” as 1).
Additionally, though these modifications do not use mutate, I modified the dataset to remove some irrelevant or redundant variables and to drop the unnecessary “A” or “And” in front of the party names in the original dataset.
abcpoll3 <- abcpoll2 %>%
separate(QPID, into = c("delete", "Party ID"),sep = "A |An ")
abcpoll3 <- abcpoll3 %>%
replace_na(list("Party ID" = "Something else"))
abcpoll3 <- abcpoll3 %>%
select(-c(delete))
abcpoll3 <- abcpoll3 %>%
mutate(Q4_num = recode(Q4, "Excellent" = 4, "Good" = 3, "Not so good" = 2, "Poor" = 1))
abcpoll3 <- abcpoll3 %>%
mutate(Q4_combined = case_when(
Q4 == "Excellent" | Q4 == "Good" ~ 2,
Q4 == "Not so good" | Q4 == "Poor" ~ 1)
)
---
title: "Challenge 4"
author: "Matt Eckstein"
description: "More data wrangling: pivoting"
date: "03/22/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- abc_poll
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
```{r}
abcpoll <- read.csv("_data/abc_poll_2021.csv")
```
### Briefly describe the data
These data come from a 2021 ABC News poll conducted across the United States, containing an ID number, demographic information, answers to five questions (one of which has six components), and metadata related to the conduct of the survey itself from 527 adult respondents.
## Tidy Data (as needed)
These data are relatively tidy; for most purposes, these data are more legible in a wide than a long format.
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
It might be helpful, to facilitate calculations and visualizations, to code certain textual question answers as numeric. Here, I have coded two numeric variations of Q4 (one with a numeric version of the existing scale, and another which combines "Good" and "Excellent as 2 and "Not so good" and "Poor" as 1).
Additionally, though these modifications do not use mutate, I modified the dataset to remove some irrelevant or redundant variables and to drop the unnecessary "A" or "And" in front of the party names in the original dataset.
```{r}
abcpoll2 <- abcpoll %>%
select(-c(complete_status, ppemploy, Contact, weights_pid))
```
```{r}
abcpoll3 <- abcpoll2 %>%
separate(QPID, into = c("delete", "Party ID"),sep = "A |An ")
abcpoll3 <- abcpoll3 %>%
replace_na(list("Party ID" = "Something else"))
abcpoll3 <- abcpoll3 %>%
select(-c(delete))
abcpoll3 <- abcpoll3 %>%
mutate(Q4_num = recode(Q4, "Excellent" = 4, "Good" = 3, "Not so good" = 2, "Poor" = 1))
abcpoll3 <- abcpoll3 %>%
mutate(Q4_combined = case_when(
Q4 == "Excellent" | Q4 == "Good" ~ 2,
Q4 == "Not so good" | Q4 == "Poor" ~ 1)
)
```