Final Project

final-project

snapchat-political-ads

ggplot

dplyr

stringr

lubridate

janitor

Analyzing Snapchat Political Ads in the US in 2020

Author

Ananya Pujary

Published

September 4, 2022

Loading the Packages

Code

library(tidyverse)
library(googlesheets4)
library(skimr)
library(dplyr)
library(stringr)
library(lubridate)
library(purrr)

# install any of these CRAN packages that are missing, then load all of them
# (gganimate supplies transition_time(), animate() and ease_aes(), used later;
# summarytools supplies dfSummary())
pkgs <- c("corrr", "janitor", "usmap", "viridis", "gganimate", "transformr",
          "patchwork", "kableExtra", "summarytools")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0)
  install.packages(missing_pkgs, repos = "https://cran.us.r-project.org")
invisible(lapply(pkgs, library, character.only = TRUE))

knitr::opts_chunk$set(echo = TRUE)

Introduction

Every election season, millions of dollars are spent on political advertisements that help candidates reach a wider audience of potential voters and influence the voting process (Nott, 2020). Political advertisements can be defined as those that “describe a political leader, organization, or party, a public office candidate, or an election/referendum” (Tomasi, 2021). These advertisements can also be created by entities other than the candidates themselves.

Now, with social media present in almost every aspect of our lives, social media platforms are also playing their part in influencing the political process. Unlike traditional media such as newspapers and television, social media platforms are not liable for what is displayed on them and can set their own content regulations (Nott, 2020). Political advertisements on social media are becoming popular because they allow for ‘micro-targeting’ of demographics, which helps candidates understand and reach the masses better and, in turn, increases voter engagement (Nott, 2020). Micro-targeting refers to a marketing strategy that uses consumer demographics and data to generate audience segments (“What is Micro-Targeting & How Does it Affect Advertising”, n.d.).

While Facebook and Google have long been the dominant players in digital political advertising, Snapchat is becoming increasingly popular. In 2020, Snapchat had around 249 million active users on its platform, most of them aged 13-29 (Rodriguez, 2020; “Snapchat statistics 2020”, 2020). Snapchat has made data about the political ads shown on its app public, so this project uses its data for the year 2020 (Snap Inc., n.d.). In particular, I’ll be looking at advertisements shown in the United States for the candidates Joe Biden and Donald Trump. I chose these candidates because the presidential election was held that year (November 3rd, 2020) and the race between them was closely contested. Using this dataset, I plan to look at relative advertisement expenditure, impressions, and location micro-targeting, and to explore the following broad questions:

Project Questions

Is there a relationship between political ad expenditure and impressions?
Which candidate’s advertisements received more impressions?
How much was spent on these advertisements and which candidate spent more on average?
Which states were targeted by these advertisements, and which states did each candidate target more?

Reading in the Data

Code

# assuming the sheet is shared publicly by link, no Google credentials are needed;
# gs4_deauth() stops googlesheets4 from trying to authenticate during a non-interactive render
gs4_deauth()
polads_orig <- read_sheet('https://docs.google.com/spreadsheets/d/1S7jF0D2o8aC3gGndORVrksuSsvMMZwqVdKLmu4SYqUc/edit?usp=sharing')

Looking at the dataset’s various characteristics:

Code

skim(polads_orig)


Code

print(summarytools::dfSummary(polads_orig, varnumbers = FALSE, plain.ascii = FALSE, graph.magnif = 0.50, style = "grid", valid.col = FALSE), 
      method = 'render', table.classes = 'table-condensed')


This file contains information on political ads that have been displayed on Snapchat’s platform, such as the amount spent on them, the organizations and advertisers behind them, the candidates or causes the ads support, demographic and location-based ad targeting, and so on. It has 12705 rows and 38 columns: 28 character-type, 1 list-type, 7 logical-type, and 2 numeric-type columns.

Tidying the Data

Removing rows and columns that contain only missing values:

Code

polads <- polads_orig %>% 
  remove_empty(which = c("rows", "cols"))

The tidyverse style guide recommends snake case for object and column names. However, the column names in this dataset are written in either title case (e.g. Currency Code) or camel case (e.g. OrganizationName). Some of them also contain special characters such as brackets, which can interfere with R functions (such names have to be wrapped in backticks to be used).

Note

Snake case is the writing style in which spaces between words are replaced with an underscore (_) and all letters are lowercase. Title case, on the other hand, is the style in which the first letter of each word is capitalized and words are separated by spaces. A third style is camel case, in which phrases are written without punctuation or spaces and words are distinguished by capitalizing the first letter of each new word.
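To make the conversion concrete, here is a minimal base-R sketch of the idea. The helper `to_snake_case()` is hypothetical and only approximates what `clean_names()` does; janitor handles more edge cases (for example, duplicate names), so the analysis itself should keep using `clean_names()`.

```r
# Hypothetical helper: a rough approximation of snake-case conversion, for illustration only
to_snake_case <- function(x) {
  x <- gsub("([[:lower:][:digit:]])([[:upper:]])", "\\1_\\2", x)  # split camelCase: "nN" -> "n_N"
  x <- gsub("[^[:alnum:]]+", "_", x)                              # spaces, brackets, etc. -> "_"
  x <- gsub("^_+|_+$", "", x)                                     # trim leading/trailing "_"
  tolower(x)                                                      # lowercase everything
}

to_snake_case(c("Currency Code", "OrganizationName", "Spend (USD)"))
# returns c("currency_code", "organization_name", "spend_usd")
```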

Hence, I’m using the clean_names() function from the ‘janitor’ package to convert all the column names to snake case.

Code

polads <- clean_names(polads)


Code

colnames(polads)


Narrowing down the data

Let’s look at the distribution of countries that received political advertisements on Snapchat in 2020.

Code

table(polads$country_code) %>%
  knitr::kable(caption = "Countries Receiving Snapchat Political Ads (2020)",col.names = c("Country","Frequency")) %>%
  kable_minimal()


Most of the political advertisements (11124) were delivered to the United States. Keeping only the rows that describe ads targeting the United States:

Code

polads <- filter(polads,country_code =="united states")


Code

polads


Verifying whether all ads targeted for the United States were paid for in USD:

Code

table(select(polads,currency_code)) %>%
  knitr::kable(caption = "Currency Code Frequency (2020)",col.names = c("Currency Code","Frequency")) %>%
  kable_minimal()


Three ads were paid for in Canadian Dollars (CAD). Finding out more about these three rows:

Code

polads %>%
  filter(currency_code=="CAD")


Two ads were paid for by the University of British Columbia and were targeted at people aged 16-25 in the following areas: San Francisco, Oakland, and San Jose. The third was paid for by Point Digital Creative Studio. None of them provided information about the candidate associated with the ad.

Since candidate information is required to analyze relative ad spending, targeting, and impressions, I’m removing the rows with missing values in the candidate_ballot_information column:

Code

sum(is.na(polads$candidate_ballot_information))


Code

# removing rows with missing values
polads <- drop_na(polads,candidate_ballot_information)


Code

polads


There were 5312 rows missing candidate information. Next, I’m keeping only the rows that explicitly state a candidate’s name (i.e. that contain the word “Biden” or “Trump”).

Code

polads <- filter(polads, str_detect(candidate_ballot_information, 'Biden|Trump'))


Code

# sanity check
table(select(polads,candidate_ballot_information)) %>%
  knitr::kable(caption = "Frequency of Candidates", col.names = c("Candidate","Frequency"))


Three entries contain the string “Trump” but are in fact campaigning against him (“Against Trump”, “Operation Dump Trump”, “Titere de Trump”). There’s also an entry called “Biden vs Trump”, which doesn’t clearly indicate which candidate the ad supports. Removing these rows so that they don’t skew the results:
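The two filtering steps (match on either name, then drop the entries that don’t support the named candidate) can be sketched in base R as below. Apart from the four excluded strings, the ballot values are made up for illustration.

```r
# Hypothetical ballot strings; only the four excluded values come from the text above
ballot <- c("Joe Biden for President", "Donald J. Trump for President",
            "Against Trump", "Biden vs Trump", "Yes on Proposition 22")

# Step 1: keep strings mentioning either candidate ("|" means "or" in a regular expression)
mentions <- grepl("Biden|Trump", ballot)

# Step 2: drop entries that mention a name but don't clearly support that candidate
excluded <- c("Against Trump", "Operation Dump Trump", "Biden vs Trump", "Titere de Trump")
kept <- ballot[mentions & !(ballot %in% excluded)]

kept
# returns c("Joe Biden for President", "Donald J. Trump for President")
```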

Code

polads <- polads %>%
  filter(!candidate_ballot_information %in% c("Against Trump", "Operation Dump Trump",
                                               "Biden vs Trump", "Titere de Trump"))


Since a missing value in certain columns indicates that either all or none of that column’s categories were targeted, I’m replacing these missing values accordingly to simplify the analysis.
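A minimal base-R sketch of the same idea as `replace_na()`: each column named in a list gets its own default for missing entries. The helper `fill_defaults()` and the toy data are hypothetical.

```r
# Hypothetical helper mirroring the idea of tidyr::replace_na(df, list(col = value, ...))
fill_defaults <- function(df, defaults) {
  for (col in names(defaults)) {
    is_missing <- is.na(df[[col]])              # rows with no value in this column
    df[[col]][is_missing] <- defaults[[col]]    # substitute this column's default
  }
  df
}

ads <- data.frame(gender  = c("FEMALE", NA, NA),
                  os_type = c(NA, "iOS", NA),
                  stringsAsFactors = FALSE)

filled <- fill_defaults(ads, list(gender = "ALL", os_type = "ALL"))
filled$gender   # "FEMALE" "ALL" "ALL"
filled$os_type  # "ALL" "iOS" "ALL"
```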

Code

polads <- polads %>%
  replace_na(list(gender = "ALL",os_type = "ALL",language = "none",advanced_demographics = "None",targeting_connection_type = "None",targeting_carrier_isp = "ALL"))


Code

# sanity check
table(select(polads,gender))


Code

table(select(polads,os_type))


Code

table(select(polads,language))


Code

table(select(polads,advanced_demographics))


Code

table(select(polads,targeting_connection_type))


Code

table(select(polads,targeting_carrier_isp))


The case of `age_bracket` and `advanced_demographics`

The age_bracket column’s values are as follows:

Code

table(select(polads,age_bracket)) %>%
    knitr::kable(caption = "Age Targeting by Snapchat Political Ads (2020)",col.names = c("Ages","Frequency")) %>%
  kable_minimal()


Clearly, the column’s values overlap and tend to refer to similar age groups, for instance, 18-20, 18-24, and 18+.

As for advanced_demographics:

Code

table(select(polads,advanced_demographics))%>%
  knitr::kable(caption = "Advanced Demographics Targeting by Snapchat Political Ads (2020)",col.names = c("Advanced Demographics","Frequency")) %>%
  kable_minimal()


Clearly, very few ads provided additional demographic information for ad targeting, and the values aren’t uniform (they mix details on household income, occupation, languages spoken, education level, number of children, etc.), so I wouldn’t be able to analyze this column effectively in relation to the others. Though I was looking forward to analyzing these columns, the data they contained were too sparse to work with.

Wrangling with the date columns

The “Z” at the end of each date-time stamp indicates that the time zone is UTC. I’m arranging the rows by each advertisement’s start date and converting the data types of the date columns (start_date and end_date) from character to date-time.
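As a base-R sketch of the parsing step, assuming (hypothetically) that the raw timestamps look like `2020/11/03 05:00:00Z`; the actual layout should be checked against the column before relying on this format string. lubridate’s `ymd_hms()` only needs the order of the components, which is why the code below doesn’t spell out a format.

```r
# Hypothetical raw values in an assumed "YYYY/MM/DD HH:MM:SSZ" layout
raw <- c("2020/11/03 05:00:00Z", "2020/06/15 12:30:00Z", "2020/09/01 00:00:00Z")

# The literal "Z" in the format string consumes the trailing Z; tz = "UTC" records what it means
parsed <- as.POSIXct(raw, format = "%Y/%m/%d %H:%M:%SZ", tz = "UTC")

# Sorting the parsed date-times gives chronological order, as arrange() does on the column
sorted <- sort(parsed)
format(sorted, "%Y-%m-%d")
# returns c("2020-06-15", "2020-09-01", "2020-11-03")
```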

Code

polads <- polads %>%
  arrange(ymd_hms(start_date))


Code

#sanity check

head(polads)


Code

# converting data types of date columns from character to datetime

polads <- polads %>%
  mutate(start_date = ymd_hms(start_date)) %>%
  mutate(end_date = ymd_hms(end_date))


Code

# rechecking class of these columns
class(polads$start_date)


Code

class(polads$end_date)


Code

head(polads)


Next, I want to create a new column that gives the duration for which the ad was run on Snapchat. I chose to display this information in hours.
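`difftime()` with `units = "hours"` returns the gap between two date-times in hours, and a missing end date propagates as `NA`. A small base-R sketch with made-up start and end times:

```r
# Hypothetical start and end times for two ads
start <- as.POSIXct(c("2020-11-02 00:00:00", "2020-06-01 08:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2020-11-03 05:00:00", "2020-06-01 08:00:00"), tz = "UTC")
end[2] <- NA  # the second ad has no end date

ad_duration <- difftime(end, start, units = "hours")
as.numeric(ad_duration)
# returns c(29, NA)
```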

Code

polads <- polads %>%
  mutate(ad_duration = difftime(end_date, start_date, units = "hours"))


Code

unique(polads$ad_duration)


Missing values show up for those rows without an end date for the advertisement. Plotting the distribution of political advertisement duration:

Code

ggplot(polads, aes(x = as.numeric(ad_duration))) +
  geom_histogram(binwidth = 15) +
  labs(title = "Distribution of Snapchat Political Ad Duration (2020)",
       x = "Duration in Hours", y = "Frequency",
       caption = "Note: This plot does not include ads that did not specify an end date") +
  theme_minimal()


A large proportion of the ads ran for less than 250 hours.

Lastly, I’m changing the entries in candidate_ballot_information to either “Biden” or “Trump” to make them uniform and easier to analyze. For instance, “Joe Biden for President” becomes “Biden”.
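The same collapsing rule can be sketched in base R with nested `ifelse()` calls. The order of the checks matters, just as with `case_when()`: a string containing both names would be labelled “Biden” because that condition is tested first. The ballot strings and the helper `collapse_candidate()` are hypothetical.

```r
# Hypothetical ballot strings after the earlier filtering steps
ballot <- c("Joe Biden for President", "Donald J. Trump for President", "Biden Victory Fund")

# Test "Biden" first, then "Trump", otherwise keep the original value,
# mirroring the order of the case_when() conditions
collapse_candidate <- function(x) {
  ifelse(grepl("Biden", x, fixed = TRUE), "Biden",
         ifelse(grepl("Trump", x, fixed = TRUE), "Trump", x))
}

collapse_candidate(ballot)
# returns c("Biden", "Trump", "Biden")
```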

Code

# changing `candidate_ballot_information` to either "Biden" or "Trump"
polads <- polads %>%
  mutate(candidate_ballot_information = case_when(
    str_detect(candidate_ballot_information, "Biden") ~ "Biden",
    str_detect(candidate_ballot_information, "Trump")  ~ "Trump",
    TRUE ~ candidate_ballot_information))


Code

# sanity check
polads %>%
  filter(str_detect(candidate_ballot_information, "Trump")) %>%
  tally() %>%
  knitr::kable(col.names = "Number of Trump ads")


Code

polads %>%
  filter(str_detect(candidate_ballot_information, "Biden")) %>%
  tally() %>%
  knitr::kable(col.names = "Number of Biden ads")


There are 1251 political advertisements supporting Biden’s campaign and 483 political advertisements for Trump’s campaign.

Analyzing and Visualizing the Data

Ad Expenditure and Impression Analysis

I want to determine whether there’s a correlation between two variables I’m interested in: spend and impressions.
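By default, `corrr::correlate()` computes Pearson’s correlation coefficient. The sketch below, on made-up numbers, writes Pearson’s r out by hand and checks it against base R’s `cor()`. Because spend and impressions turn out to be right-skewed, a rank-based Spearman correlation (`method = "spearman"`) may also be worth comparing; that alternative is a suggestion, not part of the original analysis.

```r
# Hypothetical spend (USD) and impressions for five ads
spend       <- c(120, 450, 900, 2300, 5100)
impressions <- c(15000, 40000, 120000, 210000, 380000)

# Pearson's r written out: co-variation divided by the product of the spreads
manual_r <- sum((spend - mean(spend)) * (impressions - mean(impressions))) /
  sqrt(sum((spend - mean(spend))^2) * sum((impressions - mean(impressions))^2))

builtin_r <- cor(spend, impressions)                       # Pearson is the default method
rank_r    <- cor(spend, impressions, method = "spearman")  # rank-based alternative

c(manual = manual_r, pearson = builtin_r, spearman = rank_r)
```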

Code

polads_cor <- polads %>% 
  select(spend,impressions) %>% 
  correlate()


Code

polads_cor_plot <- rplot(polads_cor) + labs(title = "Correlation between Ad Expenditure and Impressions Received\n")


Code

polads_cor_plot


This plot shows us that these two variables are moderately correlated with each other.

Code

# total amount spent by both candidates' ads
polads %>%
  select(candidate_ballot_information, spend)%>%
  group_by(candidate_ballot_information)%>%
  summarize(spend_sum = sum(spend)) %>%
  knitr::kable(caption = "Total Snapchat Political Ad Expenditure (2020)", col.names = c("Candidate","Total Amount in USD")) %>%
  kable_minimal()


More funds were allocated to political ads supporting Biden on Snapchat ($4,367,549) than Trump ($613,733).
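Project question 3 also asks which candidate spent more on average. A base-R sketch, on hypothetical data, of computing both the total and the mean spend per candidate with `tapply()`, the counterpart of `group_by()` + `summarize()`:

```r
# Hypothetical ads, one row per ad
ads <- data.frame(
  candidate = c("Biden", "Biden", "Trump", "Biden", "Trump"),
  spend     = c(1000, 250, 400, 750, 200),
  stringsAsFactors = FALSE
)

# Apply sum() and mean() to the spend values within each candidate group
totals <- tapply(ads$spend, ads$candidate, sum)
means  <- tapply(ads$spend, ads$candidate, mean)

totals  # Biden = 2000, Trump = 600
means   # Biden = 666.67 (rounded), Trump = 300
```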

Code

# total impressions received by both candidates' ads
polads %>%
  select(candidate_ballot_information, impressions)%>%
  group_by(candidate_ballot_information)%>%
  summarize(impressions_sum = sum(impressions)) %>%
  knitr::kable(caption = "Total Snapchat Political Ad Impressions (2020)", col.names = c("Candidate","Total Impressions")) %>%
  kable_minimal()


Ads supporting Biden received more impressions (804,943,566) than those supporting Trump (378,452,979).

Code

polads %>%
  select(candidate_ballot_information,spend,organization_name,paying_advertiser_name,impressions,start_date,end_date) %>%
  group_by(candidate_ballot_information) %>% 
  slice(which.max(spend)) %>%
  knitr::kable(caption = "Highest Singular Ad Expenditure by Candidate", col.names = c("Candidate","Expenditure","Organization Name","Paying Advertiser Name","Impressions","Start Date","End Date")) %>%
  kable_minimal()


The most expensive single advertisement supporting Biden (and overall) cost $151,724, while the most expensive advertisement for Trump’s campaign was the $33,349 spent by Albbiom Marketing LLC. As its start and end dates show, the Biden advertisement was displayed for almost all of election day (11/03/2020). Even though more was spent on the Biden advertisement, Trump’s advertisement received more impressions (30,383,613).

Code

polads %>%
  select(candidate_ballot_information,spend,organization_name,paying_advertiser_name,impressions,start_date,end_date) %>%
  group_by(candidate_ballot_information) %>% 
  slice(which.max(impressions)) %>%
  knitr::kable(caption = "Highest Singular Ad Impressions by Candidate (2020)", col.names = c("Candidate","Spend","Organization Name","Paying Advertiser Name","Impressions","Start Date","End Date")) %>%
  kable_minimal()


The highest number of impressions received by a single ad was 17,927,667 for Biden and 31,848,256 for Trump. The higher number for Trump’s ad could be attributed to it not having a set end date.
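Selecting the top ad per candidate with `group_by()` + `slice(which.max(...))` amounts to splitting the data by candidate and taking the row at `which.max()` in each piece. A base-R sketch on hypothetical numbers; note that `which.max()` returns only the first row when there is a tie.

```r
# Hypothetical ads, one row per ad
ads <- data.frame(
  candidate   = c("Biden", "Biden", "Trump", "Trump"),
  spend       = c(500, 9000, 3000, 1200),
  impressions = c(80000, 600000, 900000, 50000),
  stringsAsFactors = FALSE
)

# Split by candidate, keep the highest-spend row in each group, then stack the rows
top_by_spend <- do.call(rbind, lapply(split(ads, ads$candidate), function(g) {
  g[which.max(g$spend), ]
}))

top_by_spend$candidate  # "Biden" "Trump"
top_by_spend$spend      # 9000 3000
```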

I’m interested in knowing the relative expenditure and impressions for advertisements by candidate as well. First, I want to extract the month from the start_date and end_date columns and use it to determine spending over the months.
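`month(start_date, label = TRUE)` returns an ordered factor of month abbreviations. A roughly equivalent base-R sketch on made-up dates, using the built-in English `month.abb` so that the labels don’t depend on the system locale:

```r
# Hypothetical ad start times
starts <- as.POSIXct(c("2020-01-15 10:00:00", "2020-07-04 18:00:00", "2020-11-03 05:00:00"),
                     tz = "UTC")

month_num   <- as.integer(format(starts, "%m"))  # 1, 7, 11
start_month <- factor(month.abb[month_num], levels = month.abb, ordered = TRUE)

as.character(start_month)  # "Jan" "Jul" "Nov"
start_month < "Aug"        # TRUE TRUE FALSE: ordered factors compare in calendar order
```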

Code

polads <- polads %>%
  mutate(start_month = month(start_date,label = TRUE),end_month = month(end_date, label = TRUE))


Code

# sanity check
str(polads$start_month)


Code

str(polads$end_month)


I’ll need to take a log transformation because the values in the spend column are skewed. I’m using a smooth plot to track expenditure and impressions over the months.
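One caveat with `log(spend)`: if any ad recorded zero spend (the text doesn’t say whether this happens), `log(0)` is `-Inf`, and ggplot2 drops such non-finite values with a warning before smoothing. `log1p()`, i.e. log(1 + x), is a common alternative that keeps zeros at 0. A quick base-R check on hypothetical values:

```r
spend <- c(0, 9, 99, 999)  # hypothetical spend values in USD, one of them zero

log(spend)    # first element is -Inf, because log(0) is undefined
log1p(spend)  # log(1 + x): the zero maps to 0, the rest to log(10), log(100), log(1000)

# number of values ggplot2 would have to drop as non-finite
sum(!is.finite(log(spend)))    # 1
sum(!is.finite(log1p(spend)))  # 0
```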

Code

exp_by_month_plot <- polads %>%
  ggplot(aes(x = start_date, y = log(spend),
             group = candidate_ballot_information,
             color = candidate_ballot_information)) +
  geom_smooth() +
  labs(title = "Snapchat Political Ad Expenditure per Month by Candidate (2020)",
       x = "Month", y = "log(Expenditure in USD)", colour = "Candidate") +
  scale_color_brewer(palette = "Set2") +
  theme_minimal()


Code

exp_by_month_plot


More funds were spent on political ads supporting Biden’s campaign in the months leading up to election day, i.e. July to November. Ads for Trump’s campaign received more funds in the first half of the year. It would be worthwhile to compare the impressions of advertisements for both candidates too:

Code

imp_by_month_plot <- polads %>%
  ggplot(aes(x = start_date, y = log(impressions),
             group = candidate_ballot_information,
             color = candidate_ballot_information)) +
  geom_smooth() +
  labs(title = "Political Ad Impressions per Month by Candidate (2020)",
       x = "Month", y = "log(Impressions)", colour = "Candidate") +
  scale_color_brewer(palette = "Set2") +
  theme_minimal()


Code

imp_by_month_plot


Advertisements supporting Trump’s campaign appear to have received more impressions than Biden’s advertisements in the first half of the year. However, in line with the expenditure trend, Biden’s advertisements received more impressions in the later months of the year.

I want to know which ads had the longest and shortest duration by candidate, to see whether impressions vary greatly:

Code

polads %>%
  select(candidate_ballot_information,organization_name,paying_advertiser_name,start_date,end_date,ad_duration,impressions)%>%
  group_by(candidate_ballot_information)%>%
  slice(which.max(ad_duration),which.min(ad_duration)) %>%
  knitr::kable(caption = "Longest and Shortest Snapchat Political Ads by Candidate (2020)", col.names = c("Candidate","Organization Name","Paying Advertiser Name","Start Date","End Date","Ad Duration","Impressions"))%>%
  kable_minimal()


The longest ad supporting Biden ran for more than 835 hours, until election day. Interestingly, the shortest ad supporting this candidate (29 hours) received far more impressions than the longest one, possibly because it ran on election day. On the other hand, the longest of Trump’s ads ran for more than 1860 hours, also until the end of election day. The shortest ad for this candidate (6.6 hours) was displayed in June and also received fewer impressions. The paying advertisers’ names indicate that these ads were probably issued directly by the respective candidates’ campaigns rather than by an outside entity (except for the shortest ad supporting Trump).

Location Targeting Analysis

Wrangling with the location columns

The following columns indicate different types of information about the locations targeted by the advertisements: regions_included, regions_excluded, electoral_districts_included, radius_targeting_included, radius_targeting_excluded, metros_included, metros_excluded, postal_codes_included, postal_codes_excluded. Most of these columns do not have enough values to be effectively analyzed, and due to a lack of time, the list column postal_codes_included could not be included in my analysis.

I’ll be using the regions_included and regions_excluded columns. They have multiple states in each row which need to be separated into different rows:
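In base R, `separate_rows()` corresponds roughly to splitting each string with `strsplit()` and repeating the other columns once per piece. A sketch on hypothetical data; `trimws()` is included in case the states are separated by “, ” rather than “,” (the text doesn’t show which), since a leading space would stop state names from matching the map data later.

```r
# Hypothetical ads with comma-separated target states (one ad targets no state)
ads <- data.frame(
  ad_id = c(1, 2, 3),
  regions_included = c("Texas, Ohio, Florida", "Pennsylvania", NA),
  stringsAsFactors = FALSE
)

pieces <- strsplit(ads$regions_included, ",", fixed = TRUE)  # NA stays a single NA piece
long <- data.frame(
  ad_id = rep(ads$ad_id, lengths(pieces)),       # repeat each id once per state
  regions_included = trimws(unlist(pieces)),     # strip any spaces around the names
  stringsAsFactors = FALSE
)

long
```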

Code

# regions_included
polads <- polads %>%
  separate_rows(regions_included, sep = ",")


Code

# sanity check
unique(polads$regions_included)


Code

# `regions_excluded`
polads <- polads %>%
  separate_rows(regions_excluded, sep = ",")


Code

# sanity check
unique(polads$regions_excluded)


Alaska, Hawaii, and California were excluded from some of the candidates’ political advertisements. This could be due either to these states’ stringent laws on reporting campaign contributions and expenditures or to their historical voting patterns (“Campaign Disclosure, Filer Resources, Alaska Public Offices Commission, Department of Administration, State of Alaska”, n.d.; “Contribution Limits”, n.d.; Electronic Media Advertisements, 2020).

Checking whether information on organization_name and paying_advertiser_name is available for those advertisements excluding these states:

Code

polads %>%
  select(organization_name,paying_advertiser_name,spend,candidate_ballot_information,regions_excluded) %>%
  filter(str_detect(regions_excluded, 'California|Hawaii|Alaska')) %>%
  distinct()


All of the ads that excluded these states supported Donald Trump, were run by an organization called ‘Marud Khan’, and were paid for by Albbiom Marketing LLC. According to Markay (2020), Albbiom Marketing LLC is a marketing company without a proper address that offers “free” Trump merchandise and has scammed people in the past. Markay also found no evidence that ‘Marud Khan’ was a real person.

Creating a data subset for location analysis

Checking the distribution of values in the spend and impressions columns:

Code

#| label: distribution of spend and impressions

ggplot(polads, aes(x=spend)) + geom_histogram() + theme_minimal() + labs(title = "Expenditure Distribution", x = "Expenditure", y = "Frequency")


Code

ggplot(polads, aes(x=impressions)) + geom_histogram() + theme_minimal() + labs(title = "Impressions Distribution", x = "Impressions", y = "Frequency")


Clearly, both distributions are skewed to the right and are not symmetric. Hence, I’m taking the median of these columns for analysis. Creating a subset of the data for further analysis:
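A small illustration, on made-up numbers, of why the median is the safer summary here: a single very large ad pulls the mean far above the typical value, while the median barely moves. The second part sketches the per-state, per-candidate median with base R’s `aggregate()`, the counterpart of `group_by()` + `summarize(median(...))`.

```r
# Part 1: one large hypothetical ad skews the mean but not the median
spend <- c(100, 120, 150, 200, 50000)
mean(spend)    # 10114
median(spend)  # 150

# Part 2: median spend per state and candidate (hypothetical rows)
ads <- data.frame(
  state     = c("Texas", "Texas", "Texas", "Ohio", "Ohio"),
  candidate = c("Trump", "Trump", "Biden", "Biden", "Biden"),
  spend     = c(100, 300, 250, 40, 60),
  stringsAsFactors = FALSE
)
state_medians <- aggregate(spend ~ state + candidate, data = ads, FUN = median)
state_medians
```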

Code

polads_loc1 <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information) %>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)


Code

polads_loc1


Ad expenditure across states

Code

loc_spend_plot <- plot_usmap(data = polads_loc1, values = "spend_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Expenditure") +
labs(title = "Snapchat Targeted Political Ad Expenditure Across the States", caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")


Code

loc_spend_plot


From this plot, we can observe that ads targeting Pennsylvania and Nebraska had relatively higher median expenditure. Now, let’s look at median ad expenditure across states by the candidate they supported.

Code

# Biden ads
polads_loc1_biden <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information) %>%
  filter(candidate_ballot_information=="Biden")%>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)


Code

polads_loc1_biden


Code

# plotting expenditure for Biden ads
loc_spend_biden_plot <- plot_usmap(data = polads_loc1_biden, values = "spend_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Expenditure") +
labs(title = "Snapchat Targeted Political Ad Expenditure for Biden Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")


Code

# Trump ads
polads_loc1_trump <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information) %>%
  filter(candidate_ballot_information=="Trump")%>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)


Code

polads_loc1_trump


Code

# plotting expenditure for Trump ads
loc_spend_trump_plot <- plot_usmap(data = polads_loc1_trump, values = "spend_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Expenditure") +
labs(title = "Snapchat Targeted Political Ad Expenditure for Trump Across the States (2020)", caption = "Note: This plot excludes ads targeting no states in particular") + theme(legend.position = "right")


Code

# comparing plots
loc_spend_biden_plot


Code

loc_spend_trump_plot


For Biden’s ads, median expenditure was highest in Pennsylvania and Nebraska; for Trump’s, it was highest in Texas, Mississippi, and South Carolina. While Trump’s ads explicitly targeted all states, Biden’s ads were limited to particular states.

Next, I’m looking at how ad expenditure across targeted states changes over the months:

Code

polads_loc_month1 <- polads %>%
  select(regions_included,spend,impressions,candidate_ballot_information,start_month) %>%
  drop_na(regions_included) %>%
  group_by(regions_included,candidate_ballot_information,start_month)%>%
  summarize(spend_median = median(spend),impressions_median = median(impressions)) %>%
  rename(state=regions_included)

polads_loc_month1

# plotting expenditure
loc_spend_month_plot <- plot_usmap(data = polads_loc_month1, values = "spend_median", labels = FALSE,label_color = "black") + scale_fill_viridis_c(name = "Ad Expenditure Amount by Month") +
labs(title = "Snapchat Targeted Political Ad Expenditure Across the States") + theme(legend.position = "right")

loc_spend_month_plot

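# animating change in median expenditure by month
# (transition_time(), ease_aes() and animate() come from gganimate;
# ease_aes() has to be added to the plot before animate() is called)
library(gganimate)
loc_spend_month_transition <- loc_spend_month_plot +
  labs(title = "Median Political Ad Expenditure in Month {round(frame_time)}") +
  transition_time(as.numeric(start_month)) +
  ease_aes("linear")

loc_spend_anim <- animate(loc_spend_month_transition, fps = 10)
loc_spend_anim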

Note

I couldn’t get the above block of code to display any output on the rendered page, even though it ran fine in RStudio.

Ad impressions across states

Code

loc_imp_plot <- plot_usmap(data = polads_loc1, values = "impressions_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Impressions") +
labs(title = "Snapchat Targeted Political Ad Impressions Across the States (2020)", caption = "Note: This plot excludes ads that did not target specific states") + theme(legend.position = "right")

Code

loc_imp_plot

Overall, Mississippi and South Carolina had the highest median ad impressions.

Next, I compare median ad impressions across states by the candidate each ad supported.

Code

# plotting impressions for Biden ads
loc_imp_biden_plot <- plot_usmap(data = polads_loc1_biden, values = "impressions_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Impressions") +
labs(title = "Snapchat Targeted Political Ad Impressions for Biden Across the States (2020)", caption = "Note: This plot excludes ads that did not target specific states") + theme(legend.position = "right")

Code

# plotting impressions for Trump ads
loc_imp_trump_plot <- plot_usmap(data = polads_loc1_trump, values = "impressions_median", labels = FALSE) + scale_fill_viridis_c(name = "Median Ad Impressions") +
labs(title = "Snapchat Targeted Political Ad Impressions for Trump Across the States (2020)", caption = "Note: This plot excludes ads that did not target specific states") + theme(legend.position = "right")

Code

# comparing plots
loc_imp_biden_plot

Code

loc_imp_trump_plot

Mirroring the expenditure results, Biden’s ads received their highest median impressions in Nebraska and Pennsylvania, while Trump’s ads received theirs in Texas, Mississippi, and South Carolina.

As a sanity check, I list the three states with the highest median expenditure and the highest median impressions for each candidate:

Code

# checking highest median expenditure by candidate
polads_loc1%>%
  select(state,candidate_ballot_information,spend_median) %>%
  group_by(candidate_ballot_information) %>%
  arrange(desc(spend_median)) %>%
  slice(1:3) %>%
  knitr::kable(caption = "Highest Median Ad Expenditure by State and Candidate (2020)",col.names = c("State","Candidate","Median Expenditure"))%>%
    kable_minimal()

Code

# checking highest median impressions by candidate
polads_loc1%>%
  select(state,candidate_ballot_information,impressions_median) %>%
  group_by(candidate_ballot_information) %>%
  arrange(desc(impressions_median)) %>%
  slice(1:3) %>%
  knitr::kable(caption = "Highest Median Ad Impressions by State and Candidate (2020)", col.names = c("State", "Candidate", "Median Impressions")) %>%
    kable_minimal()

Reflections

In 2020, political advertisers in the United States made far more use of Snapchat than advertisers in other countries. The above analysis examined the reach and funding of advertisements supporting Joe Biden and Donald Trump before and during the 2020 presidential election season. The variables I focused on (ad expenditure, ad impressions, location micro-targeting, and even ad duration) all contribute to an effective political advertising strategy. Note that the dataset contained more ads supporting Biden than Trump, which may have skewed the results summarized below.

The data revealed a close correlation between ad expenditure and the impressions ads received. Ads supporting Biden had more funds allocated to them and also received more impressions, which may have played a part in his election victory. Biden’s ads were more frequent in the second half of 2020, while Trump’s ads were more frequent in the first half. Timing also mattered: a shorter ad supporting Biden displayed on election day received more impressions than the one that ran for more than 800 hours starting in September 2020. From the location visualizations, it seems that candidates targeted predominantly Democratic or Republican states, either to win them over or to maintain their party’s dominance.
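The correlation between expenditure and impressions can be quantified with a coefficient computed separately for each candidate. The sketch below assumes `polads` has numeric `spend` and `impressions` columns. It uses Spearman's rank correlation, which is an assumption and not necessarily what was computed above. Spearman's rank correlation is less sensitive than Pearson's to a handful of very large campaigns. The helper name `spend_impression_cor()` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: per-candidate ad count and Spearman correlation between
# spend and impressions; the correlation skips rows where either value is NA.
spend_impression_cor <- function(df) {
  df %>%
    group_by(candidate_ballot_information) %>%
    summarize(
      n_ads = n(),
      spearman_rho = cor(spend, impressions, method = "spearman", use = "complete.obs"),
      .groups = "drop"
    )
}
```

A value close to 1 for both candidates would support the "close correlation" reading.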

In terms of the data used, I wish I had checked how sparse columns like advanced_demographics and radius_targeting_included were before beginning, because I had been looking forward to using them in my analysis. Another caveat was that a single entry in the regions_included column could list multiple regions, which made it hard to determine the ad expenditure and impressions for each individual state.
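One way around the multi-region problem is to split each `regions_included` entry into one row per state. The sketch below assumes regions are separated by commas; the actual delimiter in the Snap data may differ. As a further simplifying assumption, it divides each ad's spend and impressions evenly across the states it targeted. The dataset does not report per-state figures, so these shares are rough approximations. The helper name `split_regions()` is hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical helper: one row per (ad, state). Spend and impressions are
# split evenly across an ad's targeted states; the even split is an assumption,
# not something the dataset reports.
split_regions <- function(df, sep = ",") {
  df %>%
    mutate(ad_row = row_number()) %>%
    separate_rows(regions_included, sep = sep) %>%
    mutate(regions_included = str_trim(regions_included)) %>%
    group_by(ad_row) %>%
    mutate(
      n_regions = n(),
      spend_share = spend / n_regions,
      impressions_share = impressions / n_regions
    ) %>%
    ungroup()
}
```

The resulting `spend_share` and `impressions_share` columns could then be summed or summarized by state.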

Nevertheless, I enjoyed the process of completing this project. Though I can still improve, I learnt a lot about coding in R - from writing tidy code to creating publication-worthy plots. At the same time, I think I got a bit overwhelmed with everything that can be done in R since I kept going down an online rabbit hole of endless packages and techniques. Also, I learnt not to underestimate the importance of the data cleaning process; I spent a lot more time on that than actually analyzing the data.

Future Directions

Further analysis can be done with this dataset. One could determine the type of entity paying for each candidate’s ads (whether an ad was funded by the candidate’s own campaign or by an outside organization) and identify the highest-paying advertisers. I wanted to do more with the ad_duration column I’d created, but working with the difftime object was more difficult than I expected. Plots showing how expenditure and impressions changed by state over time could also be generated. The postal code data in this dataset could be used to map the more specific regions that ads targeted. Lastly, future projects could join data on individual state populations and analyze expenditure and impressions relative to population.
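Much of the friction with a `difftime` column disappears once it is stored as a plain number in a fixed unit. The sketch below assumes `start_date` and `end_date` columns already parsed as date-times; those column names are assumptions. It stores ad duration as numeric hours so it can be summarized, plotted, or binned like any other variable. The helper name `add_duration_hours()` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: adds ad duration as a plain numeric column in hours.
# difftime(..., units = "hours") fixes the unit explicitly (instead of letting
# R pick secs/mins/hours/days), and as.numeric() drops the difftime class.
add_duration_hours <- function(df) {
  df %>%
    mutate(ad_duration_hours = as.numeric(difftime(end_date, start_date, units = "hours")))
}
```

With a numeric column, a filter such as `ad_duration_hours > 800` or a call to `median()` behaves as expected.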

References

California Fair Political Practices Commission. (2020). Electronic Media Advertisements [PDF]. Retrieved 3 September 2022, from https://www.fppc.ca.gov/content/dam/fppc/NS-Documents/AgendaDocuments/Task-Force/dttf-2020/march-2020/Legal.pdf.

Campaign Disclosure, Filer Resources, Alaska Public Offices Commission, Department of Administration, State of Alaska. Alaska Department of Administration. Retrieved 3 September 2022, from https://doa.alaska.gov/apoc/FilerResources/campaignDisclosure.html.

Contribution Limits. Campaign Spending Commission. Retrieved 3 September 2022, from https://ags.hawaii.gov/campaign/contribution-limits/.

Grolemund, G., & Wickham, H. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media.

Lookalike Audiences. Business Help Center - Snapchat. Retrieved 3 September 2022, from https://businesshelp.snapchat.com/s/article/create-lookalike-audience?language=en_US.

Markay, L. (2020). The Trump-Scam-Industrial-Complex Now Extends to Snapchat. Daily Beast. Retrieved 2 September 2022, from https://www.thedailybeast.com/the-trump-scam-industrial-complex-now-extends-to-snapchat?ref=scroll.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org.

Rodriguez, S. (2020). Snap stock rockets up after surprise earnings beat. CNBC. Retrieved 3 September 2022, from https://www.cnbc.com/2020/10/20/snap-earnings-q3-2020.html.

RStudio Team. (2019). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. https://www.rstudio.com.

Snap Audience Match Terms. Snap Inc. Retrieved 3 September 2022, from https://snap.com/en-US/terms/snap-audience-match.

Snapchat statistics 2020. (2020). Smart Insights. Retrieved 4 September 2022, from https://www.smartinsights.com/social-media-marketing/social-media-strategy/snapchat-statistics/.

Snap Inc. (n.d.). PoliticalAds [Data set]. https://snap.com/en-US/political-ads.

Tomasi, R. (2021). Quick guide on social media advertising for campaigns and public institutions. The European Campaign Playbook. Retrieved 2 September 2022, from https://www.campaignplaybook.eu/blog_quick_guide_on_social_media_advertising.

What is Micro-Targeting & How Does it Affect Advertising. (n.d.). MNI Targeted Media. Retrieved 4 September 2022, from https://www.mni.com/blog/advertmarket/what-is-micro-targeting-how-does-it-affect-advertising/.

Appendix

Dataframe variable names and descriptions

Variable Name	Description
`ADID`	A unique value for each political advertisement.
`CreativeURL`	A URL to the advertisement’s creative content.
`Currency Code`	The currency used by the account creating the advertisement.
`Spend`	The amount spent by the advertiser on the ad campaign, expressed in local currency.
`Impressions`	The number of times the advertisement has been viewed by Snapchat users.
`StartDate`	The time at which the advertisement was set to start running on the platform.
`EndDate`	The time at which the advertisement was set to stop running on the platform.
`OrganizationName`	The organization that is responsible for creating the advertisement.
`BillingAddress`	The address of the organization that is responsible for creating the advertisement.
`CandidateBallotInformation`	Information on the candidate (for California elections, also the office they are contesting) or ballot initiative that the advertisement is associated with.
`PayingAdvertiserName`	The entity that is providing funds for the advertisement.
`CommitteeName`	The name of the committee paying for the advertisement.
`CommitteeIdentificationNumber`	The identification number of the committee paying for the advertisement.
`DisclosureNameOfCommittee`	The disclosure name of the committee paying for the advertisement, as stipulated by California law.
`AdvertisingJurisdiction`	The jurisdiction that the advertisement refers to.
`Gender`	The genders targeted by the advertisement. If this field is empty, all genders were targeted.
`AgeBracket`	The ages targeted by the advertisement. If this field is empty, all ages were targeted.
`CountryCode`	The country that the advertisement is targeting.
`Regions (Included)`	The region(s) included in the advertisement’s targeting criteria (states or provinces).
`Regions (Excluded)`	The region(s) excluded in the advertisement’s targeting criteria (states or provinces).
`Electoral Districts (Included)`	The electoral district(s) included in the advertisement’s targeting criteria.
`Electoral Districts (Excluded)`	The electoral district(s) excluded in the advertisement’s targeting criteria.
`Radius Targeting (Included)`	The point-radius circles included in the advertisement’s targeting criteria.
`Radius Targeting (Excluded)`	The point-radius circles excluded in the advertisement’s targeting criteria.
`Metros (Included)`	The metro(s) included in the advertisement’s targeting criteria.
`Metros (Excluded)`	The metro(s) excluded in the advertisement’s targeting criteria.
`Postal Code (Included)`	The postal code(s) included in the advertisement’s targeting criteria.
`Postal Code (Excluded)`	The postal code(s) excluded in the advertisement’s targeting criteria.
`Location Categories (Included)`	The location categories included in the advertisement’s targeting criteria.
`Location Categories (Excluded)`	The location categories excluded in the advertisement’s targeting criteria.
`Interests`	The interest audience(s) included in the advertisement’s targeting criteria. If this field is empty, then no interest targeting was used.
`OsType`	The operating systems included in the advertisement’s targeting criteria. If this field is empty, then all operating systems were targeted.
`Segments`	The segments included in the advertisement’s targeting criteria. This is advertiser-specific data, such as Snap Audience Match¹ or Lookalike audiences².
`Language`	The languages targeted by the advertisement. If this field is empty, then no language-based targeting was used.
`AdvancedDemographics`	The third-party data segments targeted by the advertisement. If this field is empty, then no third-party data segments were used.
`Targeting Connection Type`	The internet connection type targeted by the advertisement. If this field is empty, then no targeting based on internet connection type was used.
`Targeting Carrier (ISP)`	The carrier type targeted by the advertisement. If this field is empty, all carrier types are targeted.
`CreativeProperties`	The URL specified in the advertisement’s call to action.

Footnotes

Snap Audience Match, or Customer List Audience, is a Snapchat feature that allows advertisers to upload their customer data to the platform and its affiliates to form custom audiences (“Snap Audience Match Terms”, n.d.).
A Lookalike audience reaches Snapchat users who have similar characteristics to an organization account’s existing customers. There are three options: Similarity (a small audience that closely resembles the seed audience), Balance (a medium-size audience that balances similarity and reach), and Reach (a large audience that broadly resembles the seed audience) (“Lookalike Audiences”, n.d.).
