Text as Data Final Project

Blog Post 2
Author

Quinn He

Published

October 19, 2022

Code
library(tidyverse)
library(RedditExtractoR)
library(syuzhet)
library(rvest)
library(quanteda)
library(quanteda.textplots)

knitr::opts_chunk$set(echo = TRUE)

Research Question

Compare how two subreddits (/r/republicans and /r/democrats) discuss particular political issues. This comparison may be difficult because the democrats subreddit has a much larger user base.

I may have to scrape Reddit directly for the comments, because I have not found a command that retrieves the comments on a post. I can get post titles and comment counts, but I also want to analyze the discourse in the comments.
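Before writing a scraper, it may be worth checking `get_thread_content()` in RedditExtractoR, which takes thread URLs (such as the `url` column returned by `find_thread_urls()`) and returns a list that includes a `comments` data frame. The sketch below is a minimal wrapper under that assumption. `fetch_comments()` and its `getter` argument are hypothetical names introduced here, and the column names in the result should be checked against the installed package version.

```r
# Hypothetical helper: fetch comments for the first `n` threads.
# `getter` defaults to RedditExtractoR::get_thread_content, which is assumed
# to return a list with `threads` and `comments` data frames.
fetch_comments <- function(urls, n = 5,
                           getter = RedditExtractoR::get_thread_content) {
  urls <- head(urls, n)      # keep the request count small to limit rate-limit errors
  content <- getter(urls)
  content$comments           # one row per comment
}

# Example call (requires network access and a data frame from find_thread_urls()):
# blue_comments <- fetch_comments(blue$url, n = 5)
```

Keeping `n` small while testing seems sensible, since each thread is a separate request to Reddit.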

Should I do something with /r/conspiracy?

If I choose to look into rhetoric on the Russia-Ukraine war, I would have to pick different subreddits, because few to no post titles in these two subreddits mention it.

Which words does each subreddit tend to use more than the other?
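One way to make this question concrete is to score each word by how much more likely it is to appear in one subreddit's titles than in the other's. The sketch below uses a smoothed log-odds ratio in base R. It is an illustration, not the method used in this post: `tokenize()`, `log_odds()`, and the `smoothing` value are hypothetical, and the example titles are invented, not real Reddit data.

```r
# Split text into lowercase word tokens (very rough tokenizer, for illustration only).
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z']+"))
  words[nchar(words) > 0]
}

# Smoothed log-odds ratio per word: positive values lean toward group A,
# negative values lean toward group B.
log_odds <- function(titles_a, titles_b, smoothing = 0.5) {
  tok_a <- tokenize(titles_a)
  tok_b <- tokenize(titles_b)
  vocab <- union(tok_a, tok_b)
  # Add a small constant so words absent from one group do not give log(0).
  count_a <- as.numeric(table(factor(tok_a, levels = vocab))) + smoothing
  count_b <- as.numeric(table(factor(tok_b, levels = vocab))) + smoothing
  odds_a <- count_a / (sum(count_a) - count_a)
  odds_b <- count_b / (sum(count_b) - count_b)
  out <- data.frame(word = vocab, log_odds = log(odds_a / odds_b),
                    stringsAsFactors = FALSE)
  out[order(-out$log_odds), ]
}

# Invented example titles, not real data.
toy_a <- c("cut taxes now", "taxes vote")
toy_b <- c("protect healthcare now", "healthcare vote")
print(log_odds(toy_a, toy_b))
```

Words near the top of the result are relatively more frequent in the first group, words near the bottom in the second, and words used equally often in both sit near zero.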

Below I pull data from the two subreddits. For now, these two subreddits are the focus of my analysis.

Code
red <- find_thread_urls(subreddit = "republicans", sort_by = "top", period = "month")
parsing URLs on page 1...
Warning in file(con, "r"): cannot open URL 'https://www.reddit.com/r/
republicans/top.json?t=month&limit=100': HTTP status was '429 Unknown Error'
Error in value[[3L]](cond): Cannot read from Reddit, check your inputs or internet connection
Code
blue <- find_thread_urls(subreddit = "democrats", sort_by = "top", period = "month")
parsing URLs on page 1...
Warning in file(con, "r"): cannot open URL 'https://www.reddit.com/r/democrats/
top.json?t=month&limit=100': HTTP status was '429 Unknown Error'
Error in value[[3L]](cond): Cannot read from Reddit, check your inputs or internet connection
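
HTTP status 429 is the standard "Too Many Requests" response, which suggests the requests above were rate limited rather than malformed. One possible workaround is to retry with an increasing pause between attempts. The sketch below is a generic wrapper and not part of RedditExtractoR. `with_retry()`, `max_tries`, and `base_wait` are hypothetical names, and the wait times are guesses, not documented Reddit limits.

```r
# Hypothetical helper: call `fetch()` until it succeeds or `max_tries` is reached,
# doubling the pause after each failure (exponential backoff).
with_retry <- function(fetch, max_tries = 4, base_wait = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == max_tries) {
      stop(result)                           # give up and re-raise the last error
    }
    Sys.sleep(base_wait * 2^(attempt - 1))   # 2, 4, 8, ... seconds by default
  }
}

# Example call (requires network access):
# red <- with_retry(function() {
#   RedditExtractoR::find_thread_urls(subreddit = "republicans",
#                                     sort_by = "top", period = "month")
# })
```

Saving a successful pull to disk, for example with `saveRDS()`, would also let the post render without contacting Reddit each time.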

Word cloud for /r/democrats titles

Code
blue_corpus <- corpus(blue$title)
Error in corpus(blue$title): object 'blue' not found
Code
blue_tokens <- tokens(blue_corpus,
                      remove_punct = T,
                      remove_numbers = T)
Error in tokens(blue_corpus, remove_punct = T, remove_numbers = T): object 'blue_corpus' not found
Code
blue_tokens <- tokens_select(blue_tokens,
                             pattern = stopwords("en"),
                             selection = "remove")
Error in tokens_select(blue_tokens, pattern = stopwords("en"), selection = "remove"): object 'blue_tokens' not found
Code
blue_dfm <- dfm(blue_tokens) %>%
  dfm_trim(min_termfreq = 3)
Error in dfm(blue_tokens): object 'blue_tokens' not found
Code
textplot_wordcloud(blue_dfm, max_words = 100, color = "blue")
Error in textplot_wordcloud(blue_dfm, max_words = 100, color = "blue"): object 'blue_dfm' not found

Word cloud for /r/republicans titles

Code
red_corpus <- corpus(red$title)
Error in corpus(red$title): object 'red' not found
Code
red_tokens <- tokens(red_corpus,
                     remove_punct = T,
                     remove_numbers = T)
Error in tokens(red_corpus, remove_punct = T, remove_numbers = T): object 'red_corpus' not found
Code
red_tokens <- tokens_select(red_tokens,
                            pattern = stopwords("en"),
                            selection = "remove")
Error in tokens_select(red_tokens, pattern = stopwords("en"), selection = "remove"): object 'red_tokens' not found
Code
red_dfm <- dfm(red_tokens) %>% 
  dfm_trim(min_termfreq = 3)
Error in dfm(red_tokens): object 'red_tokens' not found
Code
textplot_wordcloud(red_dfm, max_words = 100, color = "red")
Error in textplot_wordcloud(red_dfm, max_words = 100, color = "red"): object 'red_dfm' not found

The data pull only retrieves the titles of the past month's top posts in each subreddit, but for now I will run sentiment analysis on the titles alone.

Code
red_title_sent <- get_nrc_sentiment(red$title)
Error in get_nrc_sentiment(red$title): object 'red' not found
Code
blue_title_sent <- get_nrc_sentiment(blue$title)
Error in get_nrc_sentiment(blue$title): object 'blue' not found
Code
red_title_sent <- cbind(red_title_sent, red)
Error in cbind(red_title_sent, red): object 'red_title_sent' not found
Code
blue_title_sent <- cbind(blue_title_sent, blue)
Error in cbind(blue_title_sent, blue): object 'blue_title_sent' not found
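
`get_nrc_sentiment()` returns one row per title, with counts for eight emotions plus `negative` and `positive`. A simple way to compare the two subreddits is to average each column per title, which keeps the comparison fair if the two pulls return different numbers of posts. This is a minimal sketch, assuming the ten NRC column names listed below. `compare_sentiment()` is a hypothetical helper.

```r
# Column names assumed to match the output of syuzhet::get_nrc_sentiment().
nrc_cols <- c("anger", "anticipation", "disgust", "fear", "joy",
              "sadness", "surprise", "trust", "negative", "positive")

# Hypothetical helper: mean score per title for each NRC category,
# with one column per subreddit.
compare_sentiment <- function(sent_a, sent_b, labels = c("a", "b")) {
  means <- rbind(colMeans(sent_a[, nrc_cols]),
                 colMeans(sent_b[, nrc_cols]))
  rownames(means) <- labels
  round(t(means), 3)
}

# Example call, once the data pull succeeds:
# compare_sentiment(red_title_sent, blue_title_sent,
#                   labels = c("republicans", "democrats"))
```

The extra columns added by `cbind()` do not interfere, because the helper selects only the NRC columns by name.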

Notes from other research

Reddit is not representative of the general public. It is a niche community, but it can host more in-depth discussion than Twitter. Reddit users also tend to be passionate about particular ideas and subjects, so many of them discuss their views freely.

Previous Research

A Tale of Two Subreddits: https://ojs.aaai.org/index.php/ICWSM/article/view/19347/19119

No echo in the chambers of political interactions on Reddit: https://www.nature.com/articles/s41598-021-81531-x

Determining Presidential Approval Rating Using Reddit Sentiment Analysis: https://towardsdatascience.com/determining-presidential-approval-rating-using-reddit-sentiment-analysis-7912fdb5fcc7

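Populist Supporters on Reddit: A Comparison of Content and Behavioral Patterns Within Publics of Supporters of Donald Trump and Hillary Clinton: https://www.researchgate.net/publication/349794705_Populist_Supporters_on_Reddit_A_Comparison_of_Content_and_Behavioral_Patterns_Within_Publics_of_Supporters_of_Donald_Trump_and_Hillary_Clinton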