Homework 2

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.8     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(ggplot2)
library(poliscidata)
Error in library(poliscidata): there is no package called 'poliscidata'
#For this assignment I'm going to be reading in and tidying the 'gss' data set from the R package 'poliscidata'. I am aiming to see if there is a correlation between two variables in this data set. I picked 'tolerance4' first, then chose a second variable, 'reliten' (a record of the strength of respondents' religious affiliation), to see if there's a relationship between the two.

data("gss")
Warning in data("gss"): data set 'gss' not found
tolerance4religion <- gss %>%
  select(tolerance4, reliten) %>%
  as_tibble()
Error in select(., tolerance4, reliten): object 'gss' not found
#There's obviously a lot of information here (the tibble has 100 pages), but it's all relevant to what I'm trying to accomplish so I'm going to focus on visuals. Before I do that, however, I must present my hypothesis, or at the very least, what I wouldn't be surprised to find in the data.

#*NOTE: there are a lot of NA entries so I will be filtering those out*

#I wouldn't be surprised if there was a correlation between the strength of one's religious affiliation and their level of tolerance, but I also wouldn't be surprised if there wasn't. I know that religious people have the (very public) reputation of being intolerant of minorities/LGBTQ+ people/pretty much anyone they see as "different." But I also know quite a few people who actively practice religion who are some of the most tolerant people I've ever met.
ggplot(tolerance4religion, mapping = aes(x = tolerance4, y = reliten)) +
  geom_jitter() +
  labs(x = "Social Tolerance", y = "Strength of Religious Affiliation")
Error in ggplot(tolerance4religion, mapping = aes(x = tolerance4, y = reliten)): object 'tolerance4religion' not found
#Off the bat, you can tell that the "NA" category for Social Tolerance is the most densely packed with points. Since neither "NA" category (row or column) is really relevant to what I'm trying to find, I'm going to ignore them from this point forward. I'm also going to attempt to remove that row and column altogether.
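
#Before dropping the NA entries, it can help to count them. The sketch below is only an illustration: it uses a small made-up data frame called 'toy' (not the real 'gss' data), and the category labels in it are assumptions about how the two variables are coded.

```r
# Toy stand-in for the two gss columns; all values here are made up for illustration.
toy <- data.frame(
  tolerance4 = factor(c("Low", NA, "High", "Med-Low", NA, "Med-High")),
  reliten    = factor(c("Strong", "No religion", NA, "Not very strong",
                        "Strong", "Somewhat strong"))
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows with no missing values, i.e. the rows na.omit() would keep
complete_rows <- sum(complete.cases(toy))
print(complete_rows)
```

#Comparing 'complete_rows' to the total number of rows shows how much data the NA filter removes.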

#As I stated earlier, I was expecting some kind of relationship between the strength of one's religious affiliation and the extent of their social tolerance. Given the very public nature of many of the "religious person is intolerant" incidents, the expected direction would be strong religious affiliation going with low social tolerance. But based on the religious people I know, I also wouldn't be surprised if the relationship was actually the opposite: strong religious affiliation going with high social tolerance.
tolerance4religion <- tolerance4religion %>%
  na.omit()
Error in na.omit(.): object 'tolerance4religion' not found
#The NA data has been removed, so I will code the graph again. (It worked!!)

ggplot(tolerance4religion, mapping = aes(x = tolerance4, y = reliten)) +
  geom_jitter() +
  labs(x = "Social Tolerance", y = "Strength of Religious Affiliation")
Error in ggplot(tolerance4religion, mapping = aes(x = tolerance4, y = reliten)): object 'tolerance4religion' not found
#So looking at the (relevant) data visualized in this way, you can see that respondents with "not very strong" and "strong" religious affiliations are spread pretty evenly across the low through med-high categories of Social Tolerance. The "somewhat strong" religious affiliation category also doesn't appear to have any kind of relationship with Social Tolerance; the first 3 categories on the x-axis have about the same number of responses. The "no religion" category, interestingly enough, does appear to have a relationship with having a higher level of social tolerance.
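
#Reading density off a jitter plot is approximate, so a table of counts is one way to check these impressions. The sketch below uses base R's table() and prop.table() on a small made-up data frame called 'toy' (not the real 'gss' data; the values and category labels are assumptions for illustration only).

```r
# Toy stand-in for the NA-free data; all values here are made up for illustration.
toy <- data.frame(
  tolerance4 = c("Low", "Low", "High", "High", "High", "Low", "High", "Low"),
  reliten    = c("Strong", "Strong", "No religion", "No religion",
                 "Strong", "No religion", "No religion", "Strong")
)

# Cross-tabulate: rows are religious affiliation strength, columns are tolerance level
counts <- table(toy$reliten, toy$tolerance4)
print(counts)

# Convert counts to proportions within each row, so that affiliation groups
# of different sizes can be compared directly
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))
```

#If one affiliation group has a much larger share in the high tolerance column than the others, that supports what the plot seems to show.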

#I'd like to explore this topic further. I might use a different variable for the y-axis or I might make a second comparison and analyze both.