Above I attempted to get additional information on the Reddit comments (user, date, the post being responded to, upvotes, downvotes). Below I read the comments in as .rds files, as Saaradhaa has done, because my other approach to collecting comments with RedditExtractoR has not worked.
saveRDS(url_content_info, "url_content_info.rds")
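A minimal sketch of reading the saved comment data back in; the .rds file names below are hypothetical placeholders for whatever files were actually saved earlier:

```r
# Hypothetical file names - substitute the .rds files saved in the collection step
blue_comments <- readRDS("blue_comments.rds")  # /r/democrats comments
red_comments <- readRDS("red_comments.rds")    # /r/republicans comments
```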
Next I want to remove any comments that are [deleted] or [removed], since a user’s comment could have been deleted by the OP or removed by a moderator. I also still need to remove the AutoModerator messages from both subreddits, since almost every post will have an automod comment.
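A sketch of that cleaning step with dplyr, assuming the comment text lives in a `comment` column and the author name in a `user` column (the column names are assumptions):

```r
library(dplyr)

red_comments <- red_comments %>%
  # drop comments deleted by the OP or removed by a moderator
  filter(!(comment %in% c("[removed]", "[deleted]"))) %>%
  # drop AutoModerator messages
  filter(user != "AutoModerator")
```

The same two filters would be applied to the blue comments as well.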
Yay, finally I have my data! I now have all the comments I wanted. Next comes preprocessing: below I turn the blue comments into a corpus, then tokenize it and strip out the excess junk.
Preprocessing /r/democrats
blue_corpus <- corpus(blue_comments$comment)
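The tokenization code itself did not render; it presumably looked something like this quanteda pipeline (a sketch, not the exact original):

```r
library(quanteda)

blue_tokens <- tokens(blue_corpus,
                      remove_punct = TRUE,    # strip punctuation
                      remove_numbers = TRUE,  # strip digits
                      remove_symbols = TRUE)  # strip symbols/emoji
# drop English stopwords
blue_tokens <- tokens_select(blue_tokens, selection = "remove",
                             pattern = stopwords("en"))
```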
# I remove words that don't have any meaning to me that were in the network cloud.
blue_tokens <- tokens_remove(blue_tokens, c("back", "really", "less", "saying", "look", "like", "get", "every", "said", "anything", "s", "right", "now", "see"))
There are still some words I want to get rid of based on the network plot. Some words on the outside of the network I would expect to be closer to the center, but that could just be because those words are not actually relevant.
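The small fcm behind that network plot was not shown in the rendered code; one common way to build it, assuming a `blue_dfm` has already been created from the tokens, is to keep only the top features:

```r
library(quanteda)
library(quanteda.textplots)

blue_fcm <- fcm(blue_dfm)
# keep only the 30 most frequent features so the network stays readable
top_feats <- names(topfeatures(blue_fcm, 30))
small_fcm_blue <- fcm_select(blue_fcm, pattern = top_feats, selection = "keep")
textplot_network(small_fcm_blue, min_freq = 0.5, omit_isolated = TRUE)
```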
Preprocessing the /r/republicans corpus
Below I repeat the steps from the blue comments on the red comments.
red_corpus <- corpus(red_comments$comment)
# I remove words that don't have any meaning to me that were in the network cloud.
red_tokens <- tokens_remove(red_tokens, c("back", "really", "less", "saying", "look", "like", "get", "every", "said", "anything", "s", "right", "now", "see", "anyone", "one", "say", "take", "much", "last", "never", "changed", "just", "questions", "r", "please", "note"))
Republicans DFM
red_dfm <- red_tokens %>% tokens_tolower() %>% dfm()
# assign the result so the trimmed dfm is actually kept
red_dfm <- dfm_trim(red_dfm, min_termfreq = 3)
red_fcm <- fcm(red_dfm)
This is just a simple wordcloud to visually get a gist of some of the most popular words in the subreddit.
textplot_wordcloud(red_dfm, min_count = 10, max_words = 100, color = "red")
Again, let’s see the top terms in the republican subreddit DFM. I’m unsure what “t” is; stemming may take care of it, or it could have some significant meaning within the subreddit (an inside joke, perhaps).
topfeatures(red_dfm, 20)
This network plot seems closer to what I am looking for with the /r/democrats network plot. In both network plots, “people” is at the center of the network. The only problem is that I don’t know how the word is being used, or in reference to what. I can investigate this with the kwic function, using “people” as a keyword.
Dictionary Methods
I want to use word graphs in the next blog post or the final project.
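The NRC scoring step did not render; it presumably used `liwcalike()` from quanteda.dictionaries with the NRC word-emotion lexicon, roughly:

```r
library(quanteda.dictionaries)

# score each comment against the NRC word-emotion lexicon;
# the result has one column per NRC category (positive, negative, anger, ...)
blue_nrc_sentiment <- liwcalike(blue_corpus, dictionary = data_dictionary_NRC)
```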
ggplot(blue_nrc_sentiment) + geom_histogram(aes(positive), fill = "blue")
The graphs are similar in structure, but the democrats subreddit has far more positive posts than the republicans subreddit.
It appears these two dictionaries in particular are quite similar; I’ll have to check this against a third dictionary. The only difference is that NRC polarity has a higher count.
Moral Foundations Dictionary
I’ll have to find a way to measure or graph these to compare the subreddits holistically. Otherwise I could join the data frames together, but I don’t think that would benefit me.
liwcalike(blue_corpus, data_dictionary_MFD)
So I feel like I did something wrong because the graph is completely symmetrical.
Creating my own dictionary
# dictionary() needs a named list of terms (or a file); called with no arguments it errors.
# The categories and terms below are hypothetical placeholders.
my_dict <- dictionary(list(economy = c("inflation", "jobs", "taxes"),
                           election = c("vote", "midterm", "ballot")))
Keywords in Context
Here I fill in some of the top words, because I want to see exactly how they are used with the kwic function. I like this function because I can pick specific words and look at them in the context of a larger sentence. At a glance, in blue_corpus, people use “they” when talking about the President. For example, “they think that biden…” or “they knew biden…”. Both subreddits will show negative sentiment towards the President because people want to express their grievances, but do republicans tend to talk more negatively about him? I’ll check a few other keywords as well, to look at the discourse at a glance for terms like “ukraine”, “midterm”, and “Walker”.
kwic_blue_biden <- kwic(blue_corpus, "biden")
kwic_red_biden <- kwic(red_corpus, "biden")
kwic_blue_ukraine <- kwic(blue_corpus, "Ukraine")
kwic_red_ukraine <- kwic(red_corpus, "Ukraine")
kwic_blue_midterm <- kwic(blue_corpus, "midterm")
kwic_red_midterm <- kwic(red_corpus, "midterm")
kwic_blue_walker <- kwic(blue_corpus, "Walker")
kwic_red_walker <- kwic(red_corpus, "Walker")
LDA Models for /r/democrats and /r/republicans
# install.packages("seededlda") first if the package is not yet installed
library(seededlda)
dem_comments_lda <- textmodel_lda(blue_dfm, k = 10)
dem_terms <- terms(dem_comments_lda, 10)
dem_terms
gop_comments_lda <- textmodel_lda(red_dfm, k = 10)
gop_terms <- terms(gop_comments_lda, 10)
gop_terms
I think I’ll want to do LDA modelling based on what we learned in Tutorial 10 in my final project or future blog posts. That tutorial seemed more comprehensive, and I noticed the grouped words were a bit more similar when lambda was set to various values between 0.2 and 0.4.
textplot_keyness(textstat_keyness(blue_dfm))
Writing the data to CSV
write_csv(blue_comments, "blue_comments1.csv")
write_csv(red_comments, "red_comments1.csv")