Challenge 4 Instructions

More data wrangling: pivoting
Author

Meredith Rolfe

Published

August 18, 2022

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set and describe it using both words and supporting information (e.g., tables)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls ⭐⭐
  • FedFundsRate.csv ⭐⭐⭐
  • hotel_bookings.csv ⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
df <- read.csv("_data/abc_poll_2021.csv", fileEncoding = "UTF-8")
head(df)
       id xspanish complete_status ppage
1 7230001  English       qualified    68
2 7230002  English       qualified    85
                                                           ppeduc5    ppeducat
1 High school graduate (high school diploma or the equivalent GED) High school
2                                                         Bachelor            
  ppgender              ppethm pphhsize             ppinc7    ppmarit5
1   Female White, Non-Hispanic        2 $25,000 to $49,999 Now Married
2                                    NA                               
    ppmsacat ppreg4                                                    pprent
1 Metro area  South Owned or being bought by you or someone in your household
2                                                                            
  ppstaten PPWORKA    ppemploy    Q1_a    Q1_b       Q1_c    Q1_d    Q1_e
1  Florida Retired Not working Approve Approve Disapprove Approve Approve
2                                                                        
        Q1_f               Q2  Q3   Q4         Q5       QPID ABCAGE
1 Disapprove Not so concerned Yes Good Optimistic A Democrat    65+
2                                                                  
                                 Contact weights_pid
1 No, I am not willing to be interviewed      0.6382
2                                                 NA
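The head() output above shows only two rows, and the second is blank after ppeduc5, so it is worth checking whether the whole file was read. The following is a minimal sketch of such a check on a small temporary CSV; the temporary file, the third row, and the names path, df_check and n_expected are hypothetical stand-ins, not part of the original analysis.

```r
# Hypothetical stand-in for the real file: a tiny CSV written to a temp path
path <- tempfile(fileext = ".csv")
writeLines(c("id,ppage,ppgender",
             "7230001,68,Female",
             "7230002,85,",
             "7230003,40,Male"), path)

df_check <- read.csv(path)

# Number of data lines in the raw file (total lines minus the header).
# This assumes no quoted field contains an embedded line break.
n_expected <- length(readLines(path)) - 1

# The import is complete only if the two counts agree
nrow(df_check) == n_expected
```

If the counts differ for the real file, one thing to try is a different fileEncoding value in read.csv (for example "latin1"); whether that applies to this file is an assumption that has not been verified here.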

Briefly describe the data

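This is respondent-level survey data; the file name suggests an ABC poll from 2021. Each row is one respondent, and the 31 columns cover an identifier (id), what looks like the interview language (xspanish), completion status (complete_status), demographics (the pp* variables, such as ppage, ppeduc5, ppgender, ppinc7 and ppstaten), opinion items (Q1_a through Q1_f and Q2 through Q5), party identification (QPID), an age bracket (ABCAGE), willingness to be interviewed (Contact) and a weight (weights_pid). Note that only two rows were read, and the second is blank after ppeduc5, where the value stops at “Bachelor”. This suggests the import may have been cut short, so the row count should be checked against the source file before any analysis.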

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

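The data is already in a “tidy” format for respondent-level analysis: each row is one respondent and each column is one variable. The six Q1 items (Q1_a through Q1_f) appear to share the same Approve/Disapprove scale, so they would need pivoting only if the analysis compared responses across items.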
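If a later analysis did compare responses across the Q1 items, those columns could be pivoted into a longer format. The following is a minimal sketch on a toy data frame (a stand-in for df that reuses values from the output above, with blank responses entered as NA); the object names toy and long and the column names question and response are assumptions chosen for illustration.

```r
library(tidyr)

# Toy stand-in for df with three of the six Q1 columns
toy <- data.frame(
  id   = c(7230001, 7230002),
  Q1_a = c("Approve", NA),
  Q1_b = c("Approve", NA),
  Q1_c = c("Disapprove", NA)
)

# One row per respondent per Q1 item
long <- pivot_longer(toy,
                     cols = starts_with("Q1_"),
                     names_to = "question",
                     values_to = "response")

# Sanity check: rows after pivoting = respondents x number of pivoted columns
nrow(long) == nrow(toy) * 3
```

Working out the expected row count before pivoting, as in the last line, is the kind of sanity check the prompt above asks for.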

# summarize the data
summary_table <- summarytools::dfSummary(df, 
                           plain.ascii = FALSE, 
                           style = "grid",
                           valid.col = FALSE)

# view the summary table
summary_table
### Data Frame Summary  
#### df  
**Dimensions:** 2 x 31  
**Duplicates:** 0  

+----+------------------+-------------------------------+----------------------+----------------------+---------+
| No | Variable         | Stats / Values                | Freqs (% of Valid)   | Graph                | Missing |
+====+==================+===============================+======================+======================+=========+
| 1  | id\              | Min  : 7230001\               | 7230001 : 1 (50.0%)\ | IIIIIIIIII \         | 0\      |
|    | [integer]        | Mean : 7230001.5\             | 7230002 : 1 (50.0%)  | IIIIIIIIII           | (0.0%)  |
|    |                  | Max  : 7230002                |                      |                      |         |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 2  | xspanish\        | 1\. English                   | 2 (100.0%)           | IIIIIIIIIIIIIIIIIIII | 0\      |
|    | [character]      |                               |                      |                      | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 3  | complete_status\ | 1\. qualified                 | 2 (100.0%)           | IIIIIIIIIIIIIIIIIIII | 0\      |
|    | [character]      |                               |                      |                      | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 4  | ppage\           | Min  : 68\                    | 68 : 1 (50.0%)\      | IIIIIIIIII \         | 0\      |
|    | [integer]        | Mean : 76.5\                  | 85 : 1 (50.0%)       | IIIIIIIIII           | (0.0%)  |
|    |                  | Max  : 85                     |                      |                      |         |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 5  | ppeduc5\         | 1\. Bachelor\                 | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. High school graduate (hig | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 6  | ppeducat\        | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. High school               | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 7  | ppgender\        | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Female                    | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 8  | ppethm\          | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. White, Non-Hispanic       | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 9  | pphhsize\        | 1 distinct value              | 2 : 1 (100.0%)       | IIIIIIIIIIIIIIIIIIII | 1\      |
|    | [integer]        |                               |                      |                      | (50.0%) |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 10 | ppinc7\          | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. $25,000 to $49,999        | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 11 | ppmarit5\        | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Now Married               | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 12 | ppmsacat\        | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Metro area                | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 13 | ppreg4\          | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. South                     | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 14 | pprent\          | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Owned or being bought by  | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 15 | ppstaten\        | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Florida                   | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 16 | PPWORKA\         | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Retired                   | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 17 | ppemploy\        | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Not working               | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 18 | Q1_a\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Approve                   | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 19 | Q1_b\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Approve                   | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 20 | Q1_c\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Disapprove                | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 21 | Q1_d\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Approve                   | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 22 | Q1_e\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Approve                   | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 23 | Q1_f\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Disapprove                | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 24 | Q2\              | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Not so concerned          | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 25 | Q3\              | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Yes                       | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 26 | Q4\              | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Good                      | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 27 | Q5\              | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. Optimistic                | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 28 | QPID\            | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. A Democrat                | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 29 | ABCAGE\          | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. 65+                       | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 30 | Contact\         | 1\. (Empty string)\           | 1 (50.0%)\           | IIIIIIIIII \         | 0\      |
|    | [character]      | 2\. No, I am not willing to b | 1 (50.0%)            | IIIIIIIIII           | (0.0%)  |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
| 31 | weights_pid\     | 1 distinct value              | 1 distinct values    | IIIIIIIIIIIIIIIIIIII | 1\      |
|    | [numeric]        |                               |                      |                      | (50.0%) |
+----+------------------+-------------------------------+----------------------+----------------------+---------+
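In the summary above, blank answers in the character columns are listed as “(Empty string)” with 0 missing, while pphhsize and weights_pid report 1 missing. One way to make the character columns consistent is to recode empty strings to NA. This is a minimal sketch on a toy data frame that reuses values from the output; the object names toy and cleaned are assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for df: one character column with a blank, one integer column with NA
toy <- data.frame(
  id       = c(7230001, 7230002),
  ppgender = c("Female", ""),
  pphhsize = c(2L, NA),
  stringsAsFactors = FALSE
)

# Replace "" with NA in every character column
cleaned <- toy %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

# Missing values per column after recoding
colSums(is.na(cleaned))
```

The same result could be reached at import time by passing na.strings = c("", "NA") to read.csv.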

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

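Document your work here.

The variable “ppeduc5” is stored as character. Education level has a natural order, so it should be converted to an ordered factor before it is used in statistical modeling or visualization. The character columns also contain empty strings that the summary above counts as valid values; these should be recoded to NA.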
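A minimal sketch of the factor conversion, with a sanity check, is given below. It uses only the two ppeduc5 values visible in the head() output, exactly as they appear there; the ordering in educ_levels is an assumption, and the full data will likely need more levels added to it.

```r
# Values exactly as they appear in the head() output
educ <- c("High school graduate (high school diploma or the equivalent GED)",
          "Bachelor")

# Assumed ordering, lowest to highest; extend with any other values in the full data
educ_levels <- c("High school graduate (high school diploma or the equivalent GED)",
                 "Bachelor")

educ_f <- factor(educ, levels = educ_levels, ordered = TRUE)

# Sanity check: factor() turns any value not listed in levels into NA,
# so the number of NAs must be the same before and after
sum(is.na(educ_f)) == sum(is.na(educ))

# Cross-tabulate old and new values to confirm the mapping
table(educ, educ_f, useNA = "ifany")
```

Comparing NA counts catches misspelled or missing levels, which is the most common way this mutation goes wrong.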

Any additional comments?