Challenge 4 Instructions

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
More data wrangling: pivoting
Author

Gabrielle Roman

Published

May 30, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls ⭐⭐
  • FedFundsRate.csv ⭐⭐⭐
  • hotel_bookings.csv ⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
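library(readr)  # already attached by library(tidyverse); kept for clarity

# Read the poll data into a tibble
abc_poll_2021 <- read_csv("_data/abc_poll_2021.csv")

# Open the spreadsheet viewer only in an interactive session, then print the tibble
if (interactive()) View(abc_poll_2021)
abc_poll_2021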
# A tibble: 527 × 31
        id xspanish complete_status ppage ppeduc5       ppeducat ppgender ppethm
     <dbl> <chr>    <chr>           <dbl> <chr>         <chr>    <chr>    <chr> 
 1 7230001 English  qualified          68 "High school… High sc… Female   White…
 2 7230002 English  qualified          85 "Bachelor\x9… Bachelo… Male     White…
 3 7230003 English  qualified          69 "High school… High sc… Male     White…
 4 7230004 English  qualified          74 "Bachelor\x9… Bachelo… Female   White…
 5 7230005 English  qualified          77 "High school… High sc… Male     White…
 6 7230006 English  qualified          70 "Bachelor\x9… Bachelo… Male     White…
 7 7230007 English  qualified          26 "Master\x92s… Bachelo… Male     Other…
 8 7230008 English  qualified          76 "Bachelor\x9… Bachelo… Male     Black…
 9 7230009 English  qualified          78 "Bachelor\x9… Bachelo… Female   White…
10 7230010 English  qualified          47 "Master\x92s… Bachelo… Male     Other…
# ℹ 517 more rows
# ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
#   ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
#   ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
#   Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
#   ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
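
The `\x92` sequences in the printout above suggest that the file is not UTF-8 encoded. A minimal sketch of one possible repair, assuming the file was saved as Windows-1252 (an assumption, not something the data documentation confirms); the sample value is illustrative:

```r
# Assumption: the file was saved as Windows-1252, where byte 0x92 is a
# right single quotation mark. The sample value below is illustrative.
raw_value <- "Bachelor\x92s degree"

# Re-encode from Windows-1252 to UTF-8 so the apostrophe displays correctly.
fixed_value <- iconv(raw_value, from = "windows-1252", to = "UTF-8")
fixed_value

# With readr, the same assumption could be declared at import time:
# read_csv("_data/abc_poll_2021.csv", locale = locale(encoding = "windows-1252"))
```

If the assumption holds, the education columns would then print with a proper apostrophe instead of the escaped byte.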

Briefly describe the data

This data set appears to contain demographic information and poll answers from a group of participants. It has 527 rows, which suggests 527 participants, and 31 columns. Many of the columns classify the participants by variables such as gender, education level, work status, and race, while others record their answers to the poll questions.

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

Code
# political questions
abc_poll_2021 %>%
  select(starts_with("Q")) %>%
  colnames()
 [1] "Q1_a" "Q1_b" "Q1_c" "Q1_d" "Q1_e" "Q1_f" "Q2"   "Q3"   "Q4"   "Q5"  
[11] "QPID"
Code
# demographic variables
# (starts_with() ignores case by default, so PPWORKA is matched too)
abc_poll_2021 %>%
  select(starts_with("pp")) %>%
  colnames()
 [1] "ppage"    "ppeduc5"  "ppeducat" "ppgender" "ppethm"   "pphhsize"
 [7] "ppinc7"   "ppmarit5" "ppmsacat" "ppreg4"   "pprent"   "ppstaten"
[13] "PPWORKA"  "ppemploy"

The code above separates the variables into two groups: political answers and demographic information.

Columns holding political answers have names that begin with “Q”, while demographic columns have names that begin with “pp”.
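
As a sanity check on this grouping, the sketch below counts how many of the 31 columns fall into each group and lists the ones that fall into neither. It uses base R on the column names copied from the printouts above; the variable names (`cols`, `demographic`, `political`, `other`) are introduced here for illustration only.

```r
# Column names copied from the tibble printout above.
cols <- c("id", "xspanish", "complete_status", "ppage", "ppeduc5", "ppeducat",
          "ppgender", "ppethm", "pphhsize", "ppinc7", "ppmarit5", "ppmsacat",
          "ppreg4", "pprent", "ppstaten", "PPWORKA", "ppemploy", "Q1_a", "Q1_b",
          "Q1_c", "Q1_d", "Q1_e", "Q1_f", "Q2", "Q3", "Q4", "Q5", "QPID",
          "ABCAGE", "Contact", "weights_pid")

# ignore.case = TRUE mirrors the default behaviour of starts_with(),
# which is why PPWORKA lands in the demographic group.
demographic <- cols[grepl("^pp", cols, ignore.case = TRUE)]
political   <- cols[grepl("^q", cols, ignore.case = TRUE)]
other       <- setdiff(cols, c(demographic, political))

length(demographic)  # number of "pp" columns
length(political)    # number of "Q" columns
other                # columns in neither group
```

The two prefixes account for 25 of the 31 columns; the remaining six (`id`, `xspanish`, `complete_status`, `ABCAGE`, `Contact`, `weights_pid`) do not follow either naming pattern.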

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Two variables are worth adjusting: QPID, whose values carry a leading “A” or “An”, and ppethm, whose values carry a “Non-Hispanic” designation. Removing both leaves shorter, cleaner category labels.
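
The two clean-ups described above can be sketched on a handful of values before touching the full data. The ppethm values are taken from the table of counts in this post; the raw QPID values are an assumption, since only their cleaned forms are printed here. Base R `sub()` stands in for `str_remove()`.

```r
# ppethm values come from the table of counts in this post; the raw QPID
# values are assumed (only their cleaned forms appear in the post).
ppethm_values <- c("2+ Races, Non-Hispanic", "Black, Non-Hispanic", "Hispanic",
                   "Other, Non-Hispanic", "White, Non-Hispanic")
qpid_values <- c("A Democrat", "An Independent", "A Republican",
                 "Skipped", "Something else")

# fixed = TRUE treats the pattern as a literal string, not a regular expression.
ethnicity <- sub(", Non-Hispanic", "", ppethm_values, fixed = TRUE)

# "^An? " matches only a leading "A " or "An ", so text later in the
# value is never touched.
party_affiliation <- sub("^An? ", "", qpid_values)

ethnicity
party_affiliation
```

Anchoring the pattern with `^` is a small safeguard: values such as “Skipped” or “Something else” pass through unchanged because they do not begin with an article.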

Code
table(abc_poll_2021$ppethm)

2+ Races, Non-Hispanic    Black, Non-Hispanic               Hispanic 
                    21                     27                     51 
   Other, Non-Hispanic    White, Non-Hispanic 
                    24                    404 
Code
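# Drop the ", Non-Hispanic" suffix and store the result as ethnicity
abc_poll <- abc_poll_2021 %>%
  mutate(ethnicity = str_remove(ppethm, ", Non-Hispanic")) %>%
  select(-ppethm)

# Drop the leading article ("A " or "An ") and store the result as party_affiliation
abc_poll_complete <- abc_poll %>%
  mutate(party_affiliation = str_remove(QPID, "^An? ")) %>%
  select(-QPID)

# sanity check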
table(abc_poll_complete$ethnicity)

2+ Races    Black Hispanic    Other    White 
      21       27       51       24      404 
Code
table(abc_poll_complete$party_affiliation)

      Democrat    Independent     Republican        Skipped Something else 
           176            168            152              3             28 
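
The challenge prompt also asks whether any variables should become factors. A minimal sketch for ethnicity, rebuilt from the counts in the sanity check above so that it runs on its own; ordering the levels by frequency is an illustrative choice, not something this analysis specifies.

```r
# Rebuild the ethnicity column from the counts printed in the sanity check above.
ethnicity <- rep(c("2+ Races", "Black", "Hispanic", "Other", "White"),
                 times = c(21, 27, 51, 24, 404))

# Order the levels from most to least frequent (an illustrative choice).
level_order <- names(sort(table(ethnicity), decreasing = TRUE))
ethnicity_fct <- factor(ethnicity, levels = level_order)

levels(ethnicity_fct)
length(ethnicity_fct)  # should equal the 527 rows of the data set
```

With explicit levels, tables and plots list the categories in this order rather than alphabetically.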
Code
view(abc_poll_complete)

Any additional comments?