HW6

DACSS-601

Katie Popiela
4/28/2022

INTRODUCTION

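library(poliscidata)  # provides the gss data set
library(dplyr)        # select(), filter(), %>%
library(ggplot2)      # plotting
gss_refined2 <- gss %>%
  select(educ, age.f, polviews)
summary(gss_refined2)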
         educ         age.f           polviews  
 12th grade:540   30     :  47   Moderate :713  
 4 years   :307   32     :  47   Conserv  :292  
 2 years   :261   55     :  47   SlghtCons:268  
 1 yr coll :163   42     :  43   Liberal  :244  
 11th grade:101   49     :  43   SlghtLib :208  
 (Other)   :600   (Other):1742   (Other)  :149  
 NA's      :  2   NA's   :   5   NA's     :100  

The table above is a summary of the data I’ll be working with in this project.

For my final project, I will be using the ‘gss’ data set from the ‘poliscidata’ package in R. In Homework 5 I used ‘polviews’, ‘degree’, and ‘sex’ to examine the impact sex and education have on a person’s political views. I will put the best visualization I was able to create for those variables below, but I will also present some visualizations in which I swap ‘sex’ and ‘degree’ for ‘age.f’ and ‘educ’ (highest education level).

Here is the best visualization I was able to construct for ‘degree’, ‘sex’, and ‘polviews’.

gss_refined <- gss %>%
  select(sex, polviews, degree)
ggplot(gss_refined) +
  geom_jitter(aes(x = degree, y = polviews, color = sex)) +
  labs(x = "Highest Degree Awarded", y = "Political Views")

There is no obvious relationship between degree and political views here, but it is noteworthy that, judging by eye alone, respondents’ political views do not appear to differ by sex.

Now I’m going to present some visualizations with the swapped variables (‘age.f’ and ‘educ’ rather than ‘sex’ and ‘degree’).

gss_refined2 <-gss %>%
  select(polviews,age.f,educ)
summary(gss_refined2)
      polviews       age.f              educ    
 Moderate :713   30     :  47   12th grade:540  
 Conserv  :292   32     :  47   4 years   :307  
 SlghtCons:268   55     :  47   2 years   :261  
 Liberal  :244   42     :  43   1 yr coll :163  
 SlghtLib :208   49     :  43   11th grade:101  
 (Other)  :149   (Other):1742   (Other)   :600  
 NA's     :100   NA's   :   5   NA's      :  2  

The first visualization below is, in my opinion, the most precise. Each categorical variable is shown (in a large enough space to be seen clearly!!) in relation to respondent age, ‘age.f’, which is stored as a factor rather than a number (which is why summary() reports counts for it). There is a noticeably higher number of respondents from a range of education levels who identify as politically moderate. However, based on the colors corresponding to ‘Highest Year of School’, most of these individuals have between an 11th grade education and 2 years of college.

I would also like to emphasize the respondents’ education in different political view categories. Many dots in the “Liberal”, “SlghtLib”, “SlghtCons”, and “Conserv” categories, interestingly, correspond to the highest levels of education (4 years of college plus graduate education).

ggplot(gss_refined2) +
  geom_jitter(aes(x = age.f, y = polviews, color = educ), size = 1.5) +
  labs(x = "Respondent Age", y = "Political Views", color = "Highest Year of School") +
  coord_flip()

ggplot(gss_refined2) + geom_col(aes(x=polviews,y=age.f,fill=educ))+coord_flip()

There’s a LOT of info in each of these variables, so I’m going to filter them down to make the graphs more readable. I am also simply not a fan of how the bar chart above looks (it’s messy and not precise enough to read any values from). I am going to filter ‘age.f’ so that the range is restricted to 26-45 years old (millennials).

gss_refined.age <- gss_refined2 %>%
  filter(age.f %in% as.character(26:45))  # %in% tests every row; == would recycle 26:45
ggplot(gss_refined.age) +
  geom_jitter(aes(x = polviews, y = educ, color = age.f)) +
  coord_flip() +
  labs(x = "Political Views", y = "Highest Year of School", color = "Age")

I also wanted to filter ‘educ’ down to the range from 12th grade through 4 years of college, but for some reason that process only leaves one point on the graph. So for explanatory purposes I will be leaving the graph as it appears above.

An interesting point about this visualization, though, is that the dots representing respondents aged 43-45 do not appear on the graph until the 12th grade marker. Additionally, most of the dots representing individuals younger than 35 are concentrated on the left side of the graph (less than a high school education).
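
One possible explanation for the ‘educ’ filter leaving a single point (an assumption, since that filter code is not shown here) is comparing the column to a vector with == instead of %in%. With ==, R recycles the vector position by position, so a row is kept only when its value happens to line up with the recycled value at the same position. The sketch below uses made-up toy values and hypothetical names (educ_toy, wanted), not the real gss data.

```r
# Toy stand-in for the educ column; the values are invented for illustration.
educ_toy <- factor(c("12th grade", "4 years", "11th grade",
                     "2 years", "12th grade", "1 yr coll"))
wanted <- c("12th grade", "1 yr coll", "2 years")

# == recycles `wanted` down the column:
# "12th grade", "1 yr coll", "2 years", "12th grade", "1 yr coll", "2 years"
keep_eq <- as.character(educ_toy) == wanted

# %in% asks, for every row, whether its value appears anywhere in `wanted`.
keep_in <- as.character(educ_toy) %in% wanted

sum(keep_eq)  # 1 row survives the == comparison
sum(keep_in)  # 4 rows survive the %in% comparison
```

If this is what happened, swapping == for %in% inside filter() should keep every respondent whose education level is in the chosen set.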

HW Questions
1. I don’t think anything is missing per se, but I still think I can either filter things down a bit more OR create a couple more visualizations that represent different aspects of my topic (e.g. the average age of a respondent who identifies as ‘Moderate’).
2. I hope to be able to finish my analysis and get everything into place by submission time.
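
For the average-age idea in question 1, here is a minimal sketch of one way to do it, assuming the levels of ‘age.f’ are plain numbers stored as a factor. Any non-numeric level label would become NA under this conversion. The data frame toy and the column age_num are hypothetical names with made-up values; the real columns would come from gss_refined2.

```r
# Hypothetical toy data standing in for gss_refined2.
toy <- data.frame(
  polviews = factor(c("Moderate", "Moderate", "Liberal", "Conserv", "Moderate")),
  age.f    = factor(c(30, 42, 55, 49, 33))
)

# as.numeric() applied directly to a factor returns the internal level codes,
# not the ages, so convert to character first.
toy$age_num <- as.numeric(as.character(toy$age.f))

# Mean age within each political-view category.
mean_age <- tapply(toy$age_num, toy$polviews, mean, na.rm = TRUE)
mean_age["Moderate"]  # 35 for this toy data: (30 + 42 + 33) / 3
```

The same conversion would also let age go on a continuous axis in ggplot instead of being treated as one category per year.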

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Popiela (2022, May 4). Data Analytics and Computational Social Science: HW6. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomkpopiela895757/

BibTeX citation

@misc{popiela2022hw6,
  author = {Popiela, Katie},
  title = {Data Analytics and Computational Social Science: HW6},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomkpopiela895757/},
  year = {2022}
}