---
title: "Final Project Proposal"
author: "Karen Detter"
description: "What predicts support for government regulation of 'Big Tech'?"
date: "10/11/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - finalpart1
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(haven)
library(labelled)
knitr::opts_chunk$set(echo = TRUE)
```
# Background / Research Question
What predicts support for government regulation of 'Big Tech'?
In 2001, with profits sinking as the "dot-com bubble" burst, Google piloted a program that collected the data generated by users' search queries and used it to sell precisely targeted advertising. The company's ad revenue grew so quickly that it expanded its data-collection tools with tracking "cookies" and predictive algorithms. Other technology firms took notice of Google's soaring profits, and selling passively collected data about people's online activities soon became the predominant business model of the internet economy (Zuboff, 2015).
As the data-collection practices of 'Big Tech' firms, including Google, Amazon, Facebook (Meta), Apple, and Microsoft, have gradually been exposed, the public has become aware that the 'free' platforms essential to daily life are in fact harvesting personal information as payment. Although consumers have essentially been coerced into accepting this arrangement, regulation of 'surveillance capitalism' has remained limited.
Over the two decades since passive data collection began commercializing the internet, survey research has documented the American public's growing concern about the dominance Big Tech has been allowed to exert. A 2019 Pew Research Center study found that 81% of Democrats and 70% of Republicans think there should be more government regulation of corporate data-use practices (Pew Research Center, 2019). Majorities of both parties rarely agree on a policy position: party affiliation is one of the strongest predictors of political attitudes, especially in the current polarized climate. This raises the question: what factors other than party predict support for increased regulation of data-collection practices?
# Hypotheses
Although few studies have directly examined the mechanisms behind public support for regulating passive data collection, a substantial body of research has examined the factors that influence individuals' adoption of privacy-protection measures (Barth et al., 2019; Boerman et al., 2021; Turow et al., 2015). It seems reasonable to expect that the same factors also shape support for additional data privacy regulation, which leads to the following hypotheses:
1) A higher level of awareness of data collection issues predicts support for increased 'Big Tech' regulation.
2) Greater understanding of how companies use passively collected data predicts support for increased regulation.
3) The feeling of having no personal control over online tracking ('digital resignation') predicts support for increased regulation.
4) Certain demographic traits (age group, education level, and political ideology) are associated with attitudes toward 'Big Tech' regulation.
Since dozens of data privacy bills are currently pending in Congress, pinpointing the forces that drive support for this type of legislation could help both in shaping the needed regulatory framework and in building broader support among voters.
# Descriptive Statistics
The Pew Research Center's American Trends Panel (Wave 49) data set can provide insight into which of these factors predict support for greater regulation of technology companies' data practices. The online survey was fielded in June 2019 and covered a wide range of topics. It collected 144 variables from 4,272 adults aged 18 and over. The reported margin of error at the 95% confidence level is ±1.87 percentage points.
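As a rough check on that margin, the textbook formula for a proportion, `z * sqrt(p * (1 - p) / n)` with the worst case `p = 0.5`, can be compared with the published figure. This is a sketch built on one assumption: that the published margin also includes a design effect from weighting, which Pew's methodology statements typically note. The implied design effect is therefore a back-of-the-envelope inference, not a number Pew reports.

```r
# Back-of-the-envelope check of the reported margin of error (a sketch)
n <- 4272                  # Wave 49 sample size
z <- qnorm(0.975)          # critical value for 95% confidence (~1.96)
p <- 0.5                   # p = 0.5 gives the widest possible margin

# Margin of error if the sample were a simple random sample
moe_srs <- z * sqrt(p * (1 - p) / n)

# Assumption: published margin = moe_srs * sqrt(design effect)
moe_published <- 0.0187    # +/- 1.87 percentage points
deff_implied <- (moe_published / moe_srs)^2

round(100 * moe_srs, 2)    # simple-random-sample margin, in percentage points
round(deff_implied, 2)     # implied design effect from weighting
```

The simple-random-sample margin works out to about 1.5 points, so the published ±1.87 implies a design effect of roughly 1.56. Questions asked of only part of the sample have a smaller `n` and therefore a wider margin.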
The data set is distributed as an SPSS (.sav) file, and all of the variables relevant here are categorical.
```{r}
#read in data from SPSS file
wav49 <- read_sav("_data/ATPW49.sav")
wav49
```
Since there are so many variables in the data set, selecting the variables of interest into a new data frame will make it easier to manage:
```{r}
sel_vars <- c('PRIVACYNEWS1_W49', 'TRACKCO1a_W49', 'CONTROLCO_W49', 'UNDERSTANDCO_W49', 'ANONYMOUS1CO_W49', 'PP4_W49', 'PRIVACYREG_W49', 'GOVREGV1_W49', 'PROFILE4_W49', 'F_AGECAT', 'F_EDUCCAT', 'F_PARTYSUM_FINAL', 'F_IDEO')
wav49_selected <- wav49[sel_vars]
wav49_selected
```
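It can help to record explicitly which hypothesis each selected variable is meant to measure. The grouping below is one plausible reading of hypotheses 1-4, an assumption rather than the author's stated design. `hyp_vars` is a hypothetical name. Party is listed separately as a control because the background section singles it out.

```r
# Variables of interest, as selected in the chunk above
sel_vars <- c('PRIVACYNEWS1_W49', 'TRACKCO1a_W49', 'CONTROLCO_W49',
              'UNDERSTANDCO_W49', 'ANONYMOUS1CO_W49', 'PP4_W49',
              'PRIVACYREG_W49', 'GOVREGV1_W49', 'PROFILE4_W49',
              'F_AGECAT', 'F_EDUCCAT', 'F_PARTYSUM_FINAL', 'F_IDEO')

# Hypothetical grouping of the variables by hypothesis (one possible reading)
hyp_vars <- list(
  outcome          = "GOVREGV1_W49",
  h1_awareness     = c("PRIVACYNEWS1_W49", "TRACKCO1a_W49", "PRIVACYREG_W49"),
  h2_understanding = c("UNDERSTANDCO_W49", "PROFILE4_W49", "PP4_W49"),
  h3_resignation   = c("CONTROLCO_W49", "ANONYMOUS1CO_W49"),
  h4_demographics  = c("F_AGECAT", "F_EDUCCAT", "F_IDEO"),
  control_party    = "F_PARTYSUM_FINAL"
)

# Sanity check: every selected variable is assigned to exactly one group
stopifnot(setequal(unlist(hyp_vars), sel_vars),
          !anyDuplicated(unlist(hyp_vars)))
```

With `dplyr` loaded, `select(wav49, all_of(hyp_vars$h3_resignation))` would then pull out one group of variables at a time.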
The variable labels contain the survey questions asked:
```{r}
# list each variable name with its variable label (the survey question)
var_label(wav49_selected)
```
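Variable labels hold the question wording. Value labels map each numeric code to a response option. The minimal sketch below uses a made-up labelled vector to show how to inspect both with the labelled package. The codes, including 99 for a refusal, are illustrative assumptions; on the real file, `val_labels(wav49_selected)` shows the actual codes.

```r
library(labelled)   # var_label(), val_labels()

# Toy stand-in for one Wave 49 column; codes and the 99 = "Refused"
# convention are illustrative assumptions
q_text <- "GOVREGV1. How much government regulation of what companies can do with their customers' personal information do you think there should be?"

toy <- tibble::tibble(
  GOVREGV1_W49 = haven::labelled(
    c(1, 1, 3, 2, 99),
    labels = c("More regulation" = 1, "Less regulation" = 2,
               "About the same amount" = 3, "Refused" = 99),
    label = q_text
  )
)

var_label(toy)                 # variable label: the question wording
val_labels(toy$GOVREGV1_W49)   # value labels: response option -> numeric code
```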
Because the variables are stored as numeric codes with value labels (haven's labelled class), they must be converted to factors before computing any statistics:
```{r}
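# convert user-defined missing values to regular NA first; read_sav() already
# does this by default (user_na = FALSE), so this step is a safeguard.
# Then convert each labelled column to a factor whose levels are the value labels.
wav49_factored <- wav49_selected %>%
  zap_missing() %>%
  mutate(across(everything(), as_factor))
wav49_factored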
```
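Judging by the frequency summary that `summary()` prints, refusals carry their own value label rather than being declared SPSS user-missing values. They therefore survive the conversion as an ordinary factor level (e.g. "Refused" or "DK/REF"), and `zap_missing()` leaves them alone. The sketch below shows one way to recode such levels to `NA` before analysis. The helper `drop_nonresponse()` is hypothetical, and its default list of nonresponse labels is an assumption based on the level names in this file.

```r
library(haven)   # labelled(), as_factor()

# Hypothetical helper: turn nonresponse levels into NA and remove those levels
drop_nonresponse <- function(f,
                             codes = c("Refused", "DK/REF",
                                       "Don't know/Refused",
                                       "DK/Refused/No lean")) {
  # Values whose level is not in the reduced level set become NA
  factor(f, levels = setdiff(levels(f), codes))
}

# Toy labelled column standing in for F_AGECAT (codes are illustrative)
age <- haven::labelled(
  c(1, 2, 2, 4, 99),
  labels = c("18-29" = 1, "30-49" = 2, "50-64" = 3, "65+" = 4, "DK/REF" = 99)
)

age_f <- as_factor(age)               # every labelled value becomes a level
age_clean <- drop_nonresponse(age_f)  # "DK/REF" responses become NA

levels(age_clean)   # "18-29" "30-49" "50-64" "65+"
table(age_clean, useNA = "ifany")
```

Applied to the real data, `wav49_factored %>% mutate(across(everything(), drop_nonresponse))` would recode every column at once. Whether "No lean" respondents in `F_PARTYSUM_FINAL` should count as missing is a substantive choice, so the default `codes` may need trimming.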
Once the variables are converted to factors with meaningful levels, `summary()` produces (unweighted) response frequencies:
```{r}
summary(wav49_factored)
```
*A high NA count indicates that the question was not presented to all respondents.
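The counts from `summary()` are unweighted. Pew ATP files usually ship with a survey-weight column, so population estimates would normally be weighted. The sketch below assumes such a column; in the real file its name (perhaps `WEIGHT_W49`) is an assumption to check against the codebook, and it would need to be added to `sel_vars`. It uses toy data and restricts the calculation to respondents who were actually asked the question.

```r
library(dplyr)

# Toy data: one response factor plus a hypothetical survey-weight column
toy <- tibble(
  GOVREGV1_W49 = factor(
    c("More regulation", "More regulation", "Less regulation",
      "About the same amount", NA),
    levels = c("More regulation", "Less regulation", "About the same amount")
  ),
  weight = c(1.2, 0.8, 1.0, 2.0, 1.5)   # assumed weight values
)

weighted <- toy %>%
  filter(!is.na(GOVREGV1_W49)) %>%       # keep respondents who were asked
  count(GOVREGV1_W49, wt = weight) %>%   # sum of weights per response
  mutate(share = n / sum(n))             # weighted proportion

weighted
```

In this toy example, the weighted shares are 0.4, 0.2, and 0.4. The unweighted shares of the four answered rows would be 0.5, 0.25, and 0.25.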
The data set is now ready for examining associations between variables and testing the hypotheses.
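Because all of these variables are categorical, one reasonable first step is a chi-squared test of independence, with Cramér's V as an effect size. This is one option among several, not necessarily the analysis planned here. The sketch uses made-up counts purely for illustration. With the real data, the table would come from something like `table(wav49_factored$CONTROLCO_W49, wav49_factored$GOVREGV1_W49)`.

```r
# Made-up counts for illustration only (not Wave 49 results):
# rows = perceived control over data, columns = preferred regulation
tab <- matrix(c(60, 20, 20,
                40, 30, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(
                control    = c("Little or none", "Some or a great deal"),
                regulation = c("More", "Less", "About the same")))

chi <- chisq.test(tab)    # Pearson chi-squared test of independence

# Cramer's V: chi-squared scaled to [0, 1] by sample size and table shape
n <- sum(tab)
k <- min(dim(tab)) - 1
cramers_v <- sqrt(unname(chi$statistic) / (n * k))

chi$statistic   # X-squared = 8 on 2 df
chi$p.value     # about 0.018
cramers_v       # 0.2
```

Because the response options are ordered, an ordinal association measure, or an ordinal or logistic regression that also adjusts for party and demographics, may be more informative once the hypotheses are tested jointly.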
# References
Barth, S., de Jong, M. D. T., Junger, M., Hartel, P. H., & Roppelt, J. C. (2019). Putting the privacy paradox to the test: Online privacy and security behaviors among users with technical knowledge, privacy awareness, and financial resources. Telematics and Informatics, 41, 55–69. https://doi.org/10.1016/j.tele.2019.03.003
Boerman, S. C., Kruikemeier, S., & Zuiderveen Borgesius, F. J. (2021). Exploring motivations for online privacy protection behavior: Insights from panel data. Communication Research, 48(7), 953–977. https://doi.org/10.1177/0093650218800915
Pew Research Center. (2019). Americans and privacy: Concerned, confused and feeling lack of control over their personal information. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/
Pew Research Center. (2020). Wave 49 American Trends Panel [Data set]. https://www.pewresearch.org/internet/dataset/american-trends-panel-wave-49/
Turow, J., Hennessy, M., & Draper, N. (2015). The tradeoff fallacy: How marketers are misrepresenting American consumers and opening them up to exploitation. Annenberg School for Communication.
Zuboff, S. (2015). Big other: Surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 30(1), 75–89. https://doi.org/10.1057/jit.2015.5