CaitlinRowley - Blog Post 3

In this blog post, I am continuing to run analyses on my selected data set. I will focus specifically on pre-processing and data visualization.

Author

Caitlin Rowley

Running Code

# install packages (a CRAN mirror must be named explicitly, because rendering runs R non-interactively and install.packages() cannot prompt for one)

install.packages(
  c("cleanNLP", "tidytext", "tidyverse", "quanteda", "pdftools",
    "wordcloud", "RColorBrewer", "tm", "slam", "NLP"),
  repos = "https://cloud.r-project.org")
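When a document is rendered, R cannot ask which CRAN mirror to use, so install.packages() needs the mirror passed in, and reinstalling every package on each render is usually unnecessary. A minimal sketch of one way to handle both points follows. The helper names `missing_packages` and `install_if_missing` and the choice of the https://cloud.r-project.org mirror are my own assumptions for illustration, not part of the original workflow.

```r
# Hypothetical helper: which of `pkgs` are not installed in any library path?
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install only the missing packages from an explicit mirror.
# The default mirror URL is an assumption; any CRAN mirror would do.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  # Return (invisibly) the names that needed installing
  invisible(to_install)
}
```

For this post, the call would be `install_if_missing(c("cleanNLP", "tidytext", "tidyverse", "quanteda", "pdftools", "wordcloud", "RColorBrewer", "tm", "slam", "NLP"))`, placed before the library() calls.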

# load libraries

library(cleanNLP)
library(tidytext)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──

✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

library(quanteda)

Warning in .recacheSubclasses(def@className, def, env): undefined subclass
"unpackedMatrix" of class "mMatrix"; definition not updated

Warning in .recacheSubclasses(def@className, def, env): undefined subclass
"unpackedMatrix" of class "replValueSp"; definition not updated

Package version: 3.2.3
Unicode version: 13.0
ICU version: 69.1
Parallel computing: 4 of 4 threads used.
See https://quanteda.io for tutorials and examples.

library(pdftools)

Using poppler version 22.04.0

library(stringr)
library(quanteda.textplots)
library(XML)
library(wordcloud)

Loading required package: RColorBrewer

library(tm)

Loading required package: NLP

Attaching package: 'NLP'

The following objects are masked from 'package:quanteda':

    meta, meta<-

The following object is masked from 'package:ggplot2':

    annotate


Attaching package: 'tm'

The following object is masked from 'package:quanteda':

    stopwords

library(slam)
library(NLP)

# initialize the NLP backend

cnlp_init_udpipe(
  model_name = NULL, 
  model_path = NULL, 
  tokenizer = "tokenizer",
  tagger = "default",
  parser = "default")

# I am seeking some guidance regarding my character vectors, but in the interim, I will do some pre-processing, generate co-occurrence matrices, and try some data visualization. First, I will transform all letters to lowercase, remove punctuation, and remove stop words. I do not want to remove numbers, because I anticipate that years will be an important component of this text.

# `Dobbs` is not created anywhere in this post, so it must already exist in the session; without it, every call below fails with "object 'Dobbs' not found".

# tokenize into words; punctuation is removed here, numbers are kept
Dobbs_token <- tokens(Dobbs, what = "word", remove_punct = TRUE, remove_symbols = FALSE, remove_numbers = FALSE, remove_url = FALSE, remove_separators = TRUE, split_hyphens = FALSE, include_docvars = TRUE, padding = FALSE, verbose = quanteda_options("verbose"))

# convert to lowercase; the result must be assigned, because the input is not modified in place
Dobbs_token <- tokens_tolower(Dobbs_token)

# remove English stop words; quanteda:: is spelled out because tm masks stopwords()
Dobbs_token <- tokens_select(Dobbs_token, pattern = quanteda::stopwords("en"), selection = "remove")
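The pre-processing steps described above (lowercasing, removing punctuation and stop words, keeping numbers) and the co-occurrence matrix can be sketched on a toy sentence. This is an illustration under my own assumptions rather than the analysis itself: `toy_text` is an invented stand-in for the real document, and the 5-token window is an arbitrary choice.

```r
library(quanteda)

# Invented stand-in for the real text (assumption for illustration only)
toy_text <- "In 1973 the Court decided the case, and in 2022 the Court revisited it."

# Tokenize: punctuation is dropped via an argument of tokens(); numbers are kept
toy_tokens <- tokens(toy_text, what = "word",
                     remove_punct = TRUE, remove_numbers = FALSE)

# Each step returns a new object, so the result is assigned back
toy_tokens <- tokens_tolower(toy_tokens)
toy_tokens <- tokens_remove(toy_tokens, pattern = quanteda::stopwords("en"))

# Inspect the remaining tokens of the single document as a character vector
toy_words <- as.list(toy_tokens)[[1]]
print(toy_words)

# Feature co-occurrence matrix: counts of terms appearing within
# 5 tokens of each other (window size is an assumed value)
toy_fcm <- fcm(toy_tokens, context = "window", window = 5)
print(toy_fcm)
```

The years survive because `remove_numbers` is FALSE, which matches the goal of keeping dates in the text.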

# I am not yet convinced that I will need to do any stemming or lemmatization, as I will be focusing primarily on nouns, but I will reconsider this as I move forward. I will next try some data visualization:

# wordcloud() from the wordcloud package takes min.freq, max.words, and random.order; when given raw text, it builds a tm corpus internally
wordcloud(Dobbs, min.freq = 5, max.words = 50, random.order = FALSE)

# I keep receiving the following messages: ‘loading required package: RColorBrewer’ and ‘loading required namespace: tm’. These are informational messages rather than errors. It is my understanding that textplot_wordcloud only works on a dfm, so I will need to do some additional research on other forms of data visualization for text if ‘wordcloud’ continues not to work.
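Since textplot_wordcloud expects a dfm, one possible route is to build the dfm from the tokens first and plot that. The sketch below uses two invented toy documents (`toy_text` is an assumption, not the real data) and lowers `min_count` to 1 only so that such a small example draws anything at all.

```r
library(quanteda)
library(quanteda.textplots)

# Invented toy documents standing in for the real text (assumption)
toy_text <- c(
  doc1 = "the court issued the opinion and the court explained the opinion",
  doc2 = "the opinion cites earlier cases and the court discusses those cases"
)

# Same pre-processing as before: tokenize, lowercase, drop stop words
toy_tokens <- tokens(toy_text, remove_punct = TRUE)
toy_tokens <- tokens_tolower(toy_tokens)
toy_tokens <- tokens_remove(toy_tokens, pattern = quanteda::stopwords("en"))

# Build the document-feature matrix that textplot_wordcloud() requires
toy_dfm <- dfm(toy_tokens)

# Total frequency of each term across both documents
term_counts <- colSums(toy_dfm)
print(sort(term_counts, decreasing = TRUE))

# Draw the word cloud from the dfm
textplot_wordcloud(toy_dfm, min_count = 1, max_words = 50, random_order = FALSE)
```

With real data, `min_count = 5` would play the same role as in the earlier call, keeping only terms that appear at least five times.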