Challenge 6

challenge_6

Visualizing Time and Relationships

Author

Connor Landreth

Published

April 3, 2023

library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
tidy data (as needed, including sanity checks)
mutate variables as needed (including sanity checks)
create at least one graph including time (evolution)

try to make them “publication” ready (optional)
Explain why you chose the specific graph type

Create at least one graph depicting part-whole or flow relationships

try to make them “publication” ready (optional)
Explain why you chose the specific graph type

R Graph Gallery is a good starting point for thinking about what information is conveyed in standard graph types, and includes example R code.

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

debt ⭐
fed_rate ⭐⭐
abc_poll ⭐⭐⭐
usa_hh ⭐⭐⭐
hotel_bookings ⭐⭐⭐⭐
AB_NYC ⭐⭐⭐⭐⭐

setwd("C:/Github Projects/601_Spring_2023/posts/_data")
poll <- read.csv("abc_poll_2021.csv")

head(poll)


View(poll)

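Because `setwd()` ties the code to one machine's folder layout, one alternative is to build the full path once and read by that path. The sketch below is an assumption-laden stand-in: `data_dir`, `toy_poll.csv`, and the three invented rows are hypothetical, written to a temporary folder so the example runs anywhere. It ends with the kind of shape check the challenge asks for.

```r
# Hypothetical stand-in for the data folder; point this at the real _data path
data_dir <- tempdir()
csv_path <- file.path(data_dir, "toy_poll.csv")

# Write a tiny invented file so the sketch runs anywhere
write.csv(
  data.frame(id = 1:3, QPID = c("A Democrat", "A Republican", "Skipped")),
  csv_path, row.names = FALSE
)

# Read by full path instead of changing the working directory
poll_toy <- read.csv(csv_path, stringsAsFactors = FALSE)

# Sanity-check the shape (the post reports 527 rows x 31 columns for the real file)
stopifnot(nrow(poll_toy) == 3, ncol(poll_toy) == 2)
str(poll_toy)
```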

Briefly describe the data

Tidy Data (as needed)

First, I will rename the columns so they are actually usable. Some of them, such as ppmsacat, aren't impossible to decode, but a layperson would have no idea what they mean. At first glance, the data has 31 variables and 527 observations. I suspect those behind the survey want to identify trends across political parties (e.g., Do Republicans have less education? Do Democrats have higher incomes? Do Independents live in the suburbs with big families, or alone in cities?).

Poll <- poll %>% 
  rename(ID = 'id', Degree = 'ppeduc5', Primary_Language  = 'xspanish', Page = 'ppage', Education_Level = 'ppeducat', Gender = 'ppgender', Household_Size = 'pphhsize', Ethnicity = 'ppethm', Income = 'ppinc7', Marital_Status = 'ppmarit5', Metro_Stat_Area ='ppmsacat', Region = 'ppreg4', Rental_Status = 'pprent', State = 'ppstaten', Retired ='PPWORKA', Employment_Status = 'ppemploy', Complete_Status = 'complete_status', Political_Affiliation = 'QPID', Age = 'ABCAGE', Interview = 'Contact')

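A renaming step like the one above can also be driven by a single lookup vector, which makes it easy to sanity-check that every coded name exists before renaming. This is a minimal base-R sketch on an invented three-column frame; `poll_toy` and `rename_map` are hypothetical names, not part of the original analysis.

```r
# Toy stand-in for the poll with three of its coded column names (values invented)
poll_toy <- data.frame(
  id      = 1:3,
  ppeduc5 = c("High school", "Bachelor's degree", "Less than high school"),
  QPID    = c("A Democrat", "A Republican", "An Independent"),
  stringsAsFactors = FALSE
)

# Lookup in the same direction as dplyr::rename(): new_name = "old_name"
rename_map <- c(ID = "id", Degree = "ppeduc5", Political_Affiliation = "QPID")

# Sanity check: every coded name we plan to rename is actually present
missing_cols <- setdiff(rename_map, names(poll_toy))
stopifnot(length(missing_cols) == 0)

# Replace each old name with its readable counterpart, matched by position
names(poll_toy)[match(rename_map, names(poll_toy))] <- names(rename_map)
names(poll_toy)
```

In the tidyverse, the same lookup vector can be passed as `rename(Poll, all_of(rename_map))`.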

View(Poll)


Next, I will remove some columns that won't be helpful for this analysis.

Poll$Page <- NULL


Poll$ID <- NULL


Poll$Complete_Status <- NULL


Poll$weights_pid <- NULL

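The four `<- NULL` assignments above can be collapsed into one step that also checks its own result. The sketch below uses an invented two-row frame (`poll_toy`, with made-up values) purely for illustration; only the dropped column names come from the post.

```r
# Toy frame standing in for the renamed poll (values invented)
poll_toy <- data.frame(
  Page = c(45, 62), ID = 1:2, Complete_Status = c("qualified", "qualified"),
  weights_pid = c(0.8, 1.1), Income = c("$25,000 to $49,999", "$150,000 or more"),
  stringsAsFactors = FALSE
)

drop_cols <- c("Page", "ID", "Complete_Status", "weights_pid")

# Keep every column not listed; drop = FALSE keeps a data frame even if one column survives
poll_trim <- poll_toy[, setdiff(names(poll_toy), drop_cols), drop = FALSE]

# Sanity checks: the listed columns are gone and no rows were lost
stopifnot(!any(drop_cols %in% names(poll_trim)), nrow(poll_trim) == nrow(poll_toy))
names(poll_trim)
```

The dplyr equivalent is `Poll %>% select(-c(Page, ID, Complete_Status, weights_pid))`.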

# My analysis will look at the working class, so I will filter out any retired individuals

Poll_re <- Poll %>% 
  select(Retired, Employment_Status, Education_Level, Income, Household_Size, Ethnicity, Metro_Stat_Area, Region, Political_Affiliation, Age) %>%
  filter(Retired != 'Retired')

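One sanity check worth running on `filter(Retired != 'Retired')`: `!=` returns `NA` for a missing answer, and `dplyr::filter()` drops rows whose condition is `NA`, so non-responses disappear along with retirees. Whether the Retired column actually contains missing values in this file is an assumption; the sketch below uses an invented vector to show the difference.

```r
# Invented employment answers, including one missing value
retired <- c("Retired", "Working", NA, "Not working")

# `!=` gives NA for the missing answer; filter() and subset() treat NA as "drop"
keep_ne <- retired != "Retired"
sum(keep_ne, na.rm = TRUE)

# `%in%` never returns NA, so this version keeps the missing answer
keep_in <- !(retired %in% "Retired")
sum(keep_in)
```

Comparing `nrow()` before and after the filter against the number of "Retired" answers is a quick way to see which behavior the real data triggers.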

View(Poll_re)


I want to look at the role education and employment status might play in political affiliation. My hypothesis is that non-retired, unemployed individuals are less educated; below I explore this and see where their political affiliation falls (Democrat vs. Republican only).
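A direct way to examine this hypothesis is a cross-tabulation of employment status against education, converted to shares within each employment group. This is a sketch on invented rows; the labels "Working" and "Not working" are hypothetical and may not match the survey's actual categories.

```r
# Invented toy rows; the real analysis would use Poll_re$Employment_Status
# and Poll_re$Education_Level
employment <- c("Working", "Not working", "Working", "Not working", "Working")
education  <- c("College", "High school", "High school", "Less than high school", "College")

tab <- table(Employment = employment, Education = education)

# margin = 1: proportions within each employment group, so each row sums to 1
shares <- prop.table(tab, margin = 1)
round(shares, 2)
```

Reading across a row shows how education is distributed within that employment group, which is the comparison the hypothesis needs.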

Democrat_Income <- Poll_re[order(Poll_re$Political_Affiliation), ] %>% 

  select(Income, Political_Affiliation) %>% 
  filter(Political_Affiliation %in% c('A Democrat',"A Republican"))


View(Democrat_Income)


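library(ggthemes)  # theme_economist() comes from ggthemes, not ggplot2

# Income is categorical, so count respondents in each bracket, split by party
Democrat_Income %>% 
  ggplot(aes(x = Income, fill = Political_Affiliation)) +
  geom_bar(position = "dodge") +
  theme_economist() +
  labs(title = "Income by Political Affiliation",
       x = "Income", y = "Respondents", fill = "Political Affiliation")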

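Because income is recorded in brackets rather than dollars (an assumption about the income column here), an "average" is not directly defined. One option is the median bracket per party, computed on the bracket order. The bracket labels and responses below are invented for illustration.

```r
# Invented brackets and responses; the real income labels will differ
brackets <- c("Under $25k", "$25k-$49k", "$50k-$74k", "$75k+")
income <- factor(
  c("Under $25k", "$50k-$74k", "$75k+", "$25k-$49k", "$25k-$49k", "$75k+"),
  levels = brackets, ordered = TRUE
)
party <- c("A Democrat", "A Democrat", "A Democrat",
           "A Republican", "A Republican", "A Republican")

# Take the median of each party's bracket positions and map it back to a label;
# quantile type = 1 always returns an observed position, never a fraction
median_bracket <- tapply(as.integer(income), party, function(codes) {
  brackets[quantile(codes, 0.5, type = 1)]
})
median_bracket
```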

n = nrow(poll)


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'Less than high school') %>% 
  count()/n


29 individuals (5.5%) have less than a high school diploma.

Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'High school') %>% 
  count()/n


133 respondents (25.2%) have a high school diploma. I'm interested in the voting preferences of those with less traditional education, so next I'll add political affiliation to the counts above.
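The two percentages can be checked from the reported counts, and a single `table()` call can produce every education level's share at once instead of one filter per level. The education values in the second half are invented for illustration.

```r
# Reproduce the reported shares from the counts quoted above (n = 527)
n_resp <- 527
reported <- round(100 * c(less_than_hs = 29, high_school = 133) / n_resp, 1)
reported

# With the full column, one table() call gives every level's share;
# these education values are invented
education <- c("High school", "College", "Less than high school", "College")
all_shares <- round(100 * prop.table(table(education)), 1)
all_shares
```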

Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'Less than high school') %>% 
  filter(Political_Affiliation == 'An Independent') %>% 

  count()


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'Less than high school') %>% 
  filter(Political_Affiliation == 'Skipped') %>% 

  count()


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'Less than high school') %>% 
  filter(Political_Affiliation == 'A Democrat') %>% 
  count()


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'Less than high school') %>% 
  filter(Political_Affiliation == 'A Republican') %>% 

  count()


It surprises me that, for such a small sample, the numbers are distributed about as I expected. I'll do the same for people with only a high school diploma and see whether their affiliation lines up with what I might expect.

Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'High school') %>% 
  filter(Political_Affiliation == 'An Independent') %>% 

  count()


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'High school') %>% 
  filter(Political_Affiliation == 'Skipped') %>% 

  count()


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'High school') %>% 
  filter(Political_Affiliation == 'A Republican') %>% 

  count()


Poll %>% 
  select(Education_Level, Political_Affiliation) %>% 
  filter(Education_Level == 'High school') %>% 
  filter(Political_Affiliation == 'A Democrat') %>% 

  count()


Again, a surprisingly even split: roughly 40 each for Independents, Republicans, and Democrats.
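The eight separate filter-and-count chunks above can be replaced by one cross-tabulation that counts every education-by-affiliation pair, zero cells included. This sketch uses invented toy rows in place of `Poll$Education_Level` and `Poll$Political_Affiliation`.

```r
# Invented toy rows standing in for the two poll columns
education   <- c("High school", "High school", "Less than high school",
                 "High school", "Less than high school", "College")
affiliation <- c("A Democrat", "A Republican", "An Independent",
                 "An Independent", "A Democrat", "Skipped")

# One call counts every education x affiliation pair, including zero cells
counts <- table(Education = education, Affiliation = affiliation)
counts

# Restrict to the two education groups discussed above
counts[c("Less than high school", "High school"), ]
```

In dplyr, `Poll %>% count(Education_Level, Political_Affiliation)` returns the same counts in long format (without the zero cells).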


mycols <- c("lightpink", "lightblue","lightyellow", "lightgreen","orange")

# barplot() needs numeric heights, so count respondents per affiliation first
affiliation_counts <- table(Poll$Political_Affiliation)
barplot(affiliation_counts, names.arg = names(affiliation_counts), col = mycols, ylab = "Respondents")
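For the part-whole requirement, one option is to label each bar with its share of all respondents. This sketch uses invented affiliation answers (the real call would use `Poll$Political_Affiliation`) and draws to a null PDF device so it runs without a screen; the five colors match the five categories in the toy data.

```r
# Invented affiliation answers
affiliation <- c("A Democrat", "A Democrat", "A Republican", "An Independent",
                 "An Independent", "Something else", "Skipped", "A Democrat")
mycols <- c("lightpink", "lightblue", "lightyellow", "lightgreen", "orange")

counts <- table(affiliation)
pct <- round(100 * prop.table(counts), 1)

pdf(NULL)  # null device: nothing is written to disk
# barplot() returns the x positions of the bar midpoints
mids <- barplot(counts, col = mycols, ylab = "Respondents",
                ylim = c(0, max(counts) * 1.2))
# Label each bar with its share of the whole, just above the bar
text(mids, as.vector(counts), labels = paste0(pct, "%"), pos = 3)
invisible(dev.off())
```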