DACSS-601
1. Read in your dataset and compute descriptive statistics for each of your variables using dplyr. This should include mean, median, and SD for numeric variables and frequencies for categorical variables. Use group_by() and summarise() to compute mean, median, and SD for any relevant groupings.
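library(dplyr)
library(tidyverse)    # also attaches ggplot2, used for the plots below
library(poliscidata)  # provides the gss data set
# Preview the three variables of interest
gss %>%
  select(polviews, sex, degree) %>%
  head(25) %>%
  tibble()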
# A tibble: 25 x 3
   polviews  sex    degree
   <fct>     <fct>  <fct>
 1 Moderate  Male   Bachelor deg
 2 SlghtCons Male   HS
 3 SlghtCons Male   HS
 4 SlghtCons Female HS
 5 Liberal   Female Bachelor deg
 6 Moderate  Female Bachelor deg
 7 Moderate  Female Junior Coll
 8 Moderate  Female <HS
 9 Conserv   Female <HS
10 Liberal   Female Bachelor deg
# ... with 15 more rows
data(gss)
# Keep only the variables used in the first two visualizations
gss_refined <- gss %>%
  select(polviews, sex, degree)
summary(gss_refined)
     polviews        sex              degree
 Moderate :713   Male  : 886   <HS         :288
 Conserv  :292   Female:1088   HS          :976
 SlghtCons:268                 Junior Coll :151
 Liberal  :244                 Bachelor deg:354
 SlghtLib :208                 Graduate deg:205
 (Other)  :149
 NA's     :100
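All three variables in gss_refined are categorical, so frequencies are the relevant descriptive statistic here. As a minimal sketch of the group_by() and summarise() step the assignment asks for, the following computes the share of each political view within each degree level. The object name polviews_by_degree and the column name share are my own choices for illustration, not part of the original analysis.

```r
library(poliscidata)  # provides the gss data frame
library(dplyr)        # attached last so its verbs take precedence

data(gss)

# Count respondents in each degree x polviews cell, then convert the
# counts to within-degree shares (each degree level sums to 1).
polviews_by_degree <- gss %>%
  filter(!is.na(polviews)) %>%                     # drop missing polviews
  group_by(degree, polviews) %>%
  summarise(n = n(), .groups = "drop_last") %>%    # still grouped by degree
  mutate(share = n / sum(n)) %>%
  ungroup()

polviews_by_degree
```

Shares rather than raw counts make the degree groups comparable even though they differ a lot in size (for example, 976 HS respondents versus 151 Junior Coll).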
2. Create at least 2 visualizations using your final project dataset.
3. Explain each visualization.
4. Identify limitations of said visualizations.
Visualization 1
ggplot(data = gss_refined, aes(x = polviews, fill = degree)) +
  geom_bar() +
  labs(x = "Political Views", fill = "Highest Degree Awarded")
This visualization plots a single variable on the x-axis (polviews), with each bar broken down by the highest degree awarded to the survey's respondents. The bar graph is a good way to view the relationship between education and political views, but it leaves out sex, and I am ultimately looking to visualize the impact that both sex and education have on respondents' political views.
Note: I am still debating whether to use age instead of sex, so the last two graphs use age.
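Because the bars above have very different heights, the education mix within each political view can be hard to compare by eye. One possible variation, sketched here as a suggestion rather than as part of the original homework, rescales every bar to 100% with position = "fill". The object name p_share is hypothetical.

```r
library(poliscidata)  # provides the gss data frame
library(dplyr)
library(ggplot2)

data(gss)

# Same variables as Visualization 1, but each bar is rescaled to sum to 1
# so the degree composition can be compared across political views.
p_share <- gss %>%
  filter(!is.na(polviews)) %>%
  ggplot(aes(x = polviews, fill = degree)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Political Views",
       y = "Share of respondents",
       fill = "Highest Degree Awarded")

p_share
```

The trade-off is that the proportional version hides how many respondents fall into each political view, so it complements the count-based chart rather than replacing it.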
Visualization 2
data(gss)
ggplot(gss_refined, aes(x = degree, y = polviews, color = sex)) +
  geom_jitter(width = 0.2) +
  labs(x = "Highest Degree Awarded", y = "Political Views")
This visualization, unlike the first, shows all three variables I want to focus on: education, political views, and sex. However, both axes are categorical, so there are no numerical markers on either axis and the plot does not lend itself to exact (or even semi-exact) measurements the way the bar graph above does. Next, I will try to incorporate numerical values, ideally without losing information, by switching sex for age.
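The code that builds the age-based data frame is not shown in this post. A likely reconstruction, assuming it follows the same pattern as gss_refined with age in place of sex (the name gss_age_ed is the one used in the plotting code for Visualizations 3 and 4), would be:

```r
library(poliscidata)  # provides the gss data frame
library(dplyr)

data(gss)

# Assumed reconstruction: same selection as gss_refined, with age replacing sex
gss_age_ed <- gss %>%
  select(polviews, age, degree)

# Numeric summary for age, frequency counts for the two factors
summary(gss_age_ed)
```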
     polviews         age               degree
 Moderate :713   Min.   :18.00   <HS         :288
 Conserv  :292   1st Qu.:33.00   HS          :976
 SlghtCons:268   Median :47.00   Junior Coll :151
 Liberal  :244   Mean   :48.19   Bachelor deg:354
 SlghtLib :208   3rd Qu.:61.00   Graduate deg:205
 (Other)  :149   Max.   :89.00
 NA's     :100   NA's   :5
Visualization 3
data(gss)
# fill (rather than color) so the bar shading matches the legend title set in labs()
ggplot(gss_age_ed, aes(x = age, fill = polviews)) +
  geom_bar() +
  labs(x = "Age", fill = "Political Views")
This visualization is more exact measurement-wise, since age and the respondent counts are both on numerical axes, but its main limitation is that it does not account for the relationship between education and political views.
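One way to bring education back into this view, offered as a sketch rather than as part of the original homework, is to draw one panel per degree level with facet_wrap(). The 5-year bin width and the object name p_facet are arbitrary choices for illustration, and gss_age_ed is rebuilt here under the assumption that it holds polviews, age, and degree.

```r
library(poliscidata)  # provides the gss data frame
library(dplyr)
library(ggplot2)

data(gss)

# Assumed definition of the data frame used in Visualizations 3 and 4
gss_age_ed <- gss %>%
  select(polviews, age, degree)

# Age distribution shaded by political view, one panel per degree level
p_facet <- gss_age_ed %>%
  filter(!is.na(age), !is.na(polviews)) %>%
  ggplot(aes(x = age, fill = polviews)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(~ degree) +
  labs(x = "Age", y = "Respondents", fill = "Political Views")

p_facet
```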
Visualization 4
data(gss)
ggplot(gss_age_ed, aes(x = age, y = degree, color = polviews)) +
  geom_jitter(width = 0.2) +
  labs(x = "Age", y = "Highest Degree Awarded")
This scatterplot is, in my opinion, one of the best visualizations for what I'm looking for (the impact of age and education on political views). As can be seen above, the largest group of respondents has a high school diploma as their highest degree (976 of 1,974), and based on the point colors, a substantial number of them have political views ranging from "SlghtLib" to "SlghtCons." However, the main limitation of this graph is that it does not give exact measurements: the mean, median, SD, etc. cannot be read off visually.
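The numbers that cannot be read off the scatterplot can be computed directly. The following is a minimal sketch, assuming gss_age_ed contains polviews, age, and degree as in the summary above; the name age_stats and the output column names are hypothetical.

```r
library(poliscidata)  # provides the gss data frame
library(dplyr)

data(gss)

# Assumed definition of the data frame used in Visualizations 3 and 4
gss_age_ed <- gss %>%
  select(polviews, age, degree)

# Mean, median, and SD of age for every degree x polviews combination.
# Rows with missing age or polviews are dropped first so the statistics
# are not NA; sd_age is still NA for any cell with a single respondent.
age_stats <- gss_age_ed %>%
  filter(!is.na(age), !is.na(polviews)) %>%
  group_by(degree, polviews) %>%
  summarise(
    n          = n(),
    mean_age   = mean(age),
    median_age = median(age),
    sd_age     = sd(age),
    .groups    = "drop"
  )

age_stats
```

Reading this table alongside the scatterplot gives both the overall pattern and the exact group-level values.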
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Popiela (2022, April 27). Data Analytics and Computational Social Science: HW #4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomkpopiela892159/
BibTeX citation
@misc{popiela2022hw,
  author = {Popiela, Katie},
  title = {Data Analytics and Computational Social Science: HW #4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomkpopiela892159/},
  year = {2022}
}