Homework 3

DACSS-601

Katie Popiela
3/26/2022

1. Identify the dataset you will be using for the final project. The dataset may come from the course datasets, or it may be a dataset you find on your own. Identify the variables in the dataset, including what type of data each variable is.

2. Read in and clean the dataset using packages like readr/readxl/dplyr/tidyr.

I will be working with the gss (General Social Survey) dataset that’s built into the poliscidata package. It has 1,974 observations and 221 variables, so I will not list all of them. Instead, I will focus on 4 specific variables and identify and explain each one.

library(poliscidata)
dim(gss)
[1] 1974  221
library(dplyr)
data(gss)
# Keep the four variables of interest and preview the first 25 rows
gss %>%
  select(polviews, age, sex, degree) %>%
  head(25) %>%
  as_tibble()
# A tibble: 25 x 4
   polviews    age sex    degree      
   <fct>     <dbl> <fct>  <fct>       
 1 Moderate     22 Male   Bachelor deg
 2 SlghtCons    21 Male   HS          
 3 SlghtCons    42 Male   HS          
 4 SlghtCons    49 Female HS          
 5 Liberal      70 Female Bachelor deg
 6 Moderate     50 Female Bachelor deg
 7 Moderate     35 Female Junior Coll 
 8 Moderate     24 Female <HS         
 9 Conserv      28 Female <HS         
10 Liberal      28 Female Bachelor deg
# ... with 15 more rows

Above is a first look at the four variables I chose: polviews, age, sex, and degree. The type of each variable appears in the header of the printout (<fct> for a factor, <dbl> for a numeric variable).
1. polviews (factor) - respondents’ political views, ranging from liberal to conservative, with in-between categories such as “Moderate” and “SlghtCons” (slightly conservative) and extreme categories (for example, extremely conservative).
2. age (numeric) - respondents’ ages in years.
3. sex (factor) - respondents’ sex; this dataset records only the binary categories Male and Female.
4. degree (factor) - the highest level of education each respondent has received (less than a high school degree, high school degree, junior college, bachelor’s, graduate, etc.).
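
As a rough sketch of how the variable types and missing values could be checked before any analysis, the code below uses a toy data frame (the hypothetical name gss_toy) that holds only the ten rows printed above, not the full gss dataset. It uses base R so that it runs without extra packages; the same steps could be written with dplyr. The object names var_types, na_counts, and gss_clean are illustrative only.

```r
# Toy stand-in for gss: the ten rows shown in the printout (assumption: not the full data)
gss_toy <- data.frame(
  polviews = factor(c("Moderate", "SlghtCons", "SlghtCons", "SlghtCons", "Liberal",
                      "Moderate", "Moderate", "Moderate", "Conserv", "Liberal")),
  age = c(22, 21, 42, 49, 70, 50, 35, 24, 28, 28),
  sex = factor(c("Male", "Male", "Male", "Female", "Female",
                 "Female", "Female", "Female", "Female", "Female")),
  degree = factor(c("Bachelor deg", "HS", "HS", "HS", "Bachelor deg",
                    "Bachelor deg", "Junior Coll", "<HS", "<HS", "Bachelor deg"))
)

# Storage type of each variable (factor vs. numeric)
var_types <- sapply(gss_toy, class)
print(var_types)

# Categories present in each factor; note that factor() orders levels
# alphabetically here, not from liberal to conservative
print(levels(gss_toy$polviews))
print(levels(gss_toy$degree))

# Count missing values per variable, then keep only complete rows
na_counts <- colSums(is.na(gss_toy))
print(na_counts)
gss_clean <- gss_toy[complete.cases(gss_toy), ]
```

With the real data, the same calls would be applied to the four selected columns of gss, where some rows may contain missing values.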

3. Identify potential research questions that your dataset can help answer.

I’m not fully decided on a research question yet, but here are a few that this dataset can answer:
1. Do age and degree level have an impact on respondents’ political views? (i.e., are older individuals [over 50] more likely to hold conservative or liberal political values?)
2. Do individuals under age 45 have more liberal-leaning political values? If so, what is the typical degree level among such respondents (that is, the degree level most of them have)?
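
One possible way to start on the second question is sketched below. It again uses a ten-row toy data frame (hypothetical name gss_toy) copied from the printout earlier in this post, so the numbers it produces are for illustration only and say nothing about the full dataset. The age cutoff of 45 comes from the question; the names age_group, view_by_age, under45, and degree_counts are my own placeholders.

```r
# Toy stand-in for gss: the ten rows shown in the earlier printout (assumption)
gss_toy <- data.frame(
  polviews = factor(c("Moderate", "SlghtCons", "SlghtCons", "SlghtCons", "Liberal",
                      "Moderate", "Moderate", "Moderate", "Conserv", "Liberal")),
  age = c(22, 21, 42, 49, 70, 50, 35, 24, 28, 28),
  degree = factor(c("Bachelor deg", "HS", "HS", "HS", "Bachelor deg",
                    "Bachelor deg", "Junior Coll", "<HS", "<HS", "Bachelor deg"))
)

# Split respondents at the age cutoff used in question 2
gss_toy$age_group <- ifelse(gss_toy$age < 45, "Under 45", "45 and over")

# Row proportions: share of each political view within each age group
view_by_age <- prop.table(table(gss_toy$age_group, gss_toy$polviews), margin = 1)
print(round(view_by_age, 2))

# Degree levels among under-45 respondents, most common first
# (ties are possible, so the whole table is shown rather than a single mode)
under45 <- gss_toy[gss_toy$age < 45, ]
degree_counts <- sort(table(under45$degree), decreasing = TRUE)
print(degree_counts)
```

Because degree is a categorical variable, the most common level is used here instead of an arithmetic average.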

Note: This dataset also includes variables relating to abortion and court decisions (whether the judicial system is too harsh or not harsh enough), so I might add one of those and build a research question about whether age, degree, and political views are correlated with respondents’ answers on those topics.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Popiela (2022, March 27). Data Analytics and Computational Social Science: Homework 3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomkpopiela882646/

BibTeX citation

@misc{popiela2022homework,
  author = {Popiela, Katie},
  title = {Data Analytics and Computational Social Science: Homework 3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomkpopiela882646/},
  year = {2022}
}