
final project

Emma Rasmussen


August 26, 2022


knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Reading In the Data

#creating a vector of new column names
mass_names<- c("incident_id", "incident_date", "state", "city_or_county", "address", "number_killed", "number_injured", "delete")

#creating a function to read in each sheet with the new column names, skipping the first (header) row, dropping the "delete" column (the "operation" column in the original data source, which holds links to news articles), and adding a "Year" column for ease of analysis
                                                skip=1) %>%
    #the year is the sheet name with the "MassShootings" prefix removed
    mutate(Year=sub("MassShootings", "", sheet_name, fixed=TRUE)) %>% 
  select(-delete)

#using purrr/map_dfr to join data sheets for 2014 through 2022, applying the function read_shootings for consistent formatting
mass_shootings_all <- map_dfr(paste0("MassShootings", 2014:2022), read_shootings)
Error in `gargle_abort_request_failed()`:
! Client error: (401) UNAUTHENTICATED
• Request not authenticated due to missing, invalid, or expired OAuth token.
• Request had invalid authentication credentials. Expected OAuth 2 access
  token, login cookie or other valid authentication credential. See
#sanity check
Error in eval(expr, envir, enclos): object 'mass_shootings_all' not found

The number of rows in the data frame equals the total number of rows across the original Google Sheets, minus 9 for the header rows (one per sheet).
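A minimal sketch of that row-count check, using made-up toy sheets rather than the real Google Sheets data. The names `sheets`, `combined`, and `expected_rows` are illustrative and do not appear in the analysis above, and the toy sheets have no header row to subtract.

```r
# Toy stand-ins for two of the yearly sheets (hypothetical data, not the real source)
sheets <- list(
  MassShootings2014 = data.frame(incident_id = 1:3),
  MassShootings2015 = data.frame(incident_id = 4:8)
)

# Stack the sheets, tagging every row with the year taken from the sheet name
combined <- do.call(rbind, lapply(names(sheets), function(nm) {
  cbind(sheets[[nm]], Year = sub("MassShootings", "", nm, fixed = TRUE))
}))

# The combined row count should equal the sum of the per-sheet row counts
expected_rows <- sum(vapply(sheets, nrow, integer(1)))
rows_match <- nrow(combined) == expected_rows
```

With the real data, the same comparison would use `nrow(mass_shootings_all)` on one side and the per-sheet counts on the other.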

#Can now use the "Year" column to easily analyze data by year
filter(mass_shootings_all, Year=="2014")
Error in filter(mass_shootings_all, Year == "2014"): object 'mass_shootings_all' not found
#Counting number of shootings per year and generating a new table
mass_shootings_all_hist<-mass_shootings_all %>%
    group_by(Year) %>%
    summarise(Count = n())
Error in group_by(., Year): object 'mass_shootings_all' not found
#creating plot of shootings/year
ggplot(mass_shootings_all_hist, aes(x=Year, y=Count))+
  geom_bar(stat="identity")+
  labs(title="Mass Shootings 2014-2022*", caption="*2022 data goes up to August 27, 2022")
Error in ggplot(mass_shootings_all_hist, aes(x = Year, y = Count)): object 'mass_shootings_all_hist' not found
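#converting incident_date from POSIXct (S3) to Date format
mass_shootings_all$incident_date_new<-as.Date(mass_shootings_all$incident_date)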
Error in as.Date(mass_shootings_all$incident_date): object 'mass_shootings_all' not found
Error in eval(expr, envir, enclos): object 'mass_shootings_all' not found
#creating a month column and converting to factors
mass_shootings_all<-mass_shootings_all %>% 
  mutate(month=as.factor(month(incident_date_new)))
Error in mutate(., month = as.factor(month(incident_date_new))): object 'mass_shootings_all' not found
#creating a new table with month data
mass_shootings_all_months<-mass_shootings_all %>%
    group_by(month) %>%
    summarise(Count = n())
Error in group_by(., month): object 'mass_shootings_all' not found
Error in eval(expr, envir, enclos): object 'mass_shootings_all_months' not found
#creating plot by month
ggplot(mass_shootings_all_months, aes(x=month, y=Count))+geom_bar(stat="identity")+labs(title="Mass Shootings 2014-2022 By Month")
Error in ggplot(mass_shootings_all_months, aes(x = month, y = Count)): object 'mass_shootings_all_months' not found

In addition to mass shootings increasing over time, it appears that shootings could be correlated with temperature or season: when the data set is grouped by month, counts are highest in the summer months and lowest in the winter months.

I am curious whether a state with less seasonal variation would show the same distribution. Below I create the same plot for Florida and Massachusetts.

#Doing above for 1 State
#creating a new table with month data for FL
mass_shootings_all_florida<-filter(mass_shootings_all, state=="Florida")
Error in filter(mass_shootings_all, state == "Florida"): object 'mass_shootings_all' not found
Error in eval(expr, envir, enclos): object 'mass_shootings_all_florida' not found
mass_shootings_all_months_FL<-mass_shootings_all_florida %>%
    group_by(month, .drop=FALSE) %>%
    summarise(Count = n())
Error in group_by(., month, .drop = FALSE): object 'mass_shootings_all_florida' not found
Error in eval(expr, envir, enclos): object 'mass_shootings_all_months_FL' not found
ggplot(mass_shootings_all_months_FL, aes(x=month, y=Count))+geom_bar(stat="identity")+labs(title="Mass Shootings 2014-2022 By Month in Florida")
Error in ggplot(mass_shootings_all_months_FL, aes(x = month, y = Count)): object 'mass_shootings_all_months_FL' not found
#Doing same as FL for MA
mass_shootings_all_mass<-filter(mass_shootings_all, state=="Massachusetts")
Error in filter(mass_shootings_all, state == "Massachusetts"): object 'mass_shootings_all' not found
Error in eval(expr, envir, enclos): object 'mass_shootings_all_mass' not found
mass_shootings_all_months_MA<-mass_shootings_all_mass %>%
    group_by(month, .drop=FALSE) %>%
    summarise(Count = n())
Error in group_by(., month, .drop = FALSE): object 'mass_shootings_all_mass' not found
Error in eval(expr, envir, enclos): object 'mass_shootings_all_months_MA' not found
ggplot(mass_shootings_all_months_MA, aes(x=month, y=Count))+geom_bar(stat="identity")+labs(title="Mass Shootings 2014-2022 By Month in Massachusetts")
Error in ggplot(mass_shootings_all_months_MA, aes(x = month, y = Count)): object 'mass_shootings_all_months_MA' not found

Going forward, I think I will try to create these plots for more states to see whether this trend holds across states. I am also curious whether I can find a data set with typical temperature ranges by state and check for a correlation between temperature variation and mass shootings. I would also like to figure out what kind of distribution best describes the plot for all states.
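A rough sketch of how the by-state monthly counts and a temperature comparison might look. Everything here is an assumption for illustration: the `toy` incidents and the `temps` vector are made-up placeholder values, not measurements or real records, and Spearman rank correlation is just one possible choice.

```r
# Hypothetical toy incidents; the real analysis would use mass_shootings_all
toy <- data.frame(
  state = c("Florida", "Florida", "Florida",
            "Massachusetts", "Massachusetts", "Massachusetts", "Massachusetts"),
  month = factor(c(1, 7, 7, 6, 7, 7, 12), levels = 1:12)
)

# Cross-tabulate state by month; the factor levels keep months with zero incidents
counts <- as.data.frame(table(state = toy$state, month = toy$month),
                        responseName = "Count")

# Monthly counts for one state, in month order 1..12
ma_counts <- counts$Count[counts$state == "Massachusetts"]

# Placeholder monthly temperatures (made up, NOT real climate data)
temps <- c(-2, -1, 3, 9, 15, 20, 23, 22, 18, 12, 6, 0)

# Rank correlation between monthly counts and the placeholder temperatures
rho <- cor(ma_counts, temps, method = "spearman")
```

Repeating the last three steps per state would give one correlation per state, which could then be compared against how much each state's temperature varies over the year.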

What's going on??? Rant: a number of confounding factors could explain the apparent correlation with season/temperature: people are more or less likely to leave the house depending on the weather, and there are more public gatherings during warmer seasons… I also wonder whether COVID affects this. I could create the same graph by state and year (there probably aren't enough events to see a correlation, though maybe there are for a state with a high population). I also wonder whether the average number killed increases with higher temperatures, since there may be more opportunities/gatherings of people.
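One way the average-number-killed question could be checked, sketched on made-up rows. The `toy` data frame is hypothetical; only the column names `month` and `number_killed` come from the data set above.

```r
# Hypothetical toy incidents with a month factor and a number_killed column
toy <- data.frame(
  month = factor(c(1, 1, 7, 7, 7), levels = 1:12),
  number_killed = c(0, 2, 1, 3, 5)
)

# Mean number killed per month (months with no incidents are dropped)
avg_killed <- aggregate(number_killed ~ month, data = toy, FUN = mean)
```

On the real data, the equivalent would be grouping `mass_shootings_all` by `month` and summarising the mean of `number_killed`.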

#counting the number of distinct years in the combined data
n_distinct(mass_shootings_all$Year)
Error in list2(...): object 'mass_shootings_all' not found