Final Project Proposal

Author

Karen Detter

Published

October 11, 2022

Code
library(tidyverse)
library(haven)
library(labelled)

knitr::opts_chunk$set(echo = TRUE)

Background / Research Question

What predicts support for government regulation of ‘Big Tech’?

In 2001, with profits sinking as the “dot-com bubble” burst, Google piloted a program that collected data generated from users’ search queries and used it to sell precisely targeted advertising. The company’s ad revenues grew so quickly that it expanded its data collection tools with tracking “cookies” and predictive algorithms. Other technology firms took notice of Google’s soaring profits, and the sale of passively collected data from people’s online activities soon became the predominant business model of the internet economy (Zuboff, 2015).

As the data-collection practices of ‘Big Tech’ firms, including Google, Amazon, Facebook (Meta), Apple, and Microsoft, have gradually been exposed, the public has become aware that the ‘free’ platforms now essential to daily life are actually harvesting personal information as payment. Despite consumers being essentially extorted into accepting this arrangement, regulatory intervention in ‘surveillance capitalism’ has remained limited.

Over the two decades since passive data collection began commercializing the internet, survey research has shown the American public’s growing concern about the dominance Big Tech has been allowed to exert. A 2019 Pew Research Center study found that 81% of Democrats and 70% of Republicans think there should be more government regulation of corporate data-use practices (Pew Research Center, 2019). Majorities of both parties rarely agree on a policy position, since party affiliation is a strong predictor of political attitudes, especially in the current polarized climate. The natural question, then, is what other factors predict support for increased regulation of data-collection practices.

Hypothesis

Although few studies have directly examined the mechanisms behind public support for regulation of passive data collection, a substantial body of research addresses the factors that influence individual adoption of privacy protection measures (Barth et al., 2019; Boerman et al., 2021; Turow et al., 2015). It seems reasonable to extrapolate that these factors would similarly influence support for additional data privacy regulation, leading to these hypotheses:

  1. A higher level of awareness of data collection issues predicts support for increased ‘Big Tech’ regulation.

  2. Greater understanding of how companies use passively collected data predicts support for increased regulation.

  3. The feeling of having no personal control over online tracking (‘digital resignation’) predicts support for increased regulation.

  4. Certain demographic traits (age group, education level, and political ideology) are associated with attitudes toward ‘Big Tech’ regulation.

Since there are currently dozens of data privacy bills pending in Congress, pinpointing the forces driving support for this type of legislation can help with both shaping the regulatory framework needed and appealing for broader support from voters.

Descriptive Statistics

Pew Research Center’s American Trends Panel (Wave 49) data set (Pew Research Center, 2020) can provide insight into which of these factors predict support for greater regulation of technology companies’ data practices. The online survey, conducted in June 2019, covered a wide variety of topics and yielded 4,272 observations on 144 variables from adults aged 18 and over. The margin of error (at the 95% confidence level) is given as +/- 1.87 percentage points.
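To put the reported margin of error in context, the sketch below compares it with the margin a simple random sample of the same size would give. It assumes the worst-case proportion of 0.5 and the unweighted sample size of 4,272. The "implied design effect" is an inference from those two numbers, not a figure taken from the Pew documentation.

```r
# Unweighted number of respondents in ATP Wave 49
n <- 4272

# Margin of error for a simple random sample at 95% confidence,
# using the worst-case proportion p = 0.5 (an assumption)
z <- qnorm(0.975)
moe_srs <- z * sqrt(0.5 * 0.5 / n) * 100   # in percentage points

# Margin of error reported for the survey, in percentage points
moe_reported <- 1.87

# Implied design effect: how much the reported variance exceeds
# the simple-random-sample variance (inferred, not published here)
deff_implied <- (moe_reported / moe_srs)^2

round(c(moe_srs = moe_srs, deff_implied = deff_implied), 2)
```

Under these assumptions the simple-random-sample margin comes out near 1.5 points, so the reported 1.87 is consistent with some loss of precision from survey weighting.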

The data set was compiled in SPSS and all pertinent variables are categorical.

Code
#read in data from SPSS file
wav49 <- read_sav("_data/ATPW49.sav")
wav49
# A tibble: 4,272 × 144
     QKEY DEVICE_TYPE_…¹ LANG_…² FORM_…³ SOCME…⁴ SOCME…⁵ SOCME…⁶ SOCME…⁷ SNSUS…⁸
    <dbl> <dbl+lbl>      <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l> <dbl+l>
 1 100260 2 [Tablet]     9 [Eng… 2 [For… 2 [No,… 2 [No,… 2 [No,… 2 [No,… 0 [Doe…
 2 100588 1 [Mobile pho… 9 [Eng… 1 [For… 1 [Yes… 1 [Yes… 1 [Yes… 2 [No,… 1 [Soc…
 3 100637 3 [Desktop]    9 [Eng… 1 [For… 1 [Yes… 2 [No,… 2 [No,… 2 [No,… 1 [Soc…
 4 101224 1 [Mobile pho… 9 [Eng… 2 [For… 1 [Yes… 2 [No,… 2 [No,… 2 [No,… 1 [Soc…
 5 101322 1 [Mobile pho… 9 [Eng… 1 [For… 1 [Yes… 2 [No,… 2 [No,… 2 [No,… 1 [Soc…
 6 101437 3 [Desktop]    9 [Eng… 2 [For… 1 [Yes… 2 [No,… 2 [No,… 2 [No,… 1 [Soc…
 7 101472 1 [Mobile pho… 9 [Eng… 1 [For… 1 [Yes… 2 [No,… 1 [Yes… 2 [No,… 1 [Soc…
 8 101493 3 [Desktop]    9 [Eng… 1 [For… 1 [Yes… 2 [No,… 2 [No,… 1 [Yes… 1 [Soc…
 9 102198 1 [Mobile pho… 9 [Eng… 1 [For… 1 [Yes… 1 [Yes… 2 [No,… 1 [Yes… 1 [Soc…
10 103094 1 [Mobile pho… 9 [Eng… 1 [For… 1 [Yes… 1 [Yes… 1 [Yes… 1 [Yes… 1 [Soc…
# … with 4,262 more rows, 135 more variables: ELECTFTGSNSINT_W49 <dbl+lbl>,
#   TALKDISASNSINT_W49 <dbl+lbl>, TALKCMNSNSINT_W49 <dbl+lbl>,
#   SECUR1_W49 <dbl+lbl>, PRIVACYNEWS1_W49 <dbl+lbl>,
#   HOMEASSIST1_W49 <dbl+lbl>, HOMEASSIST2_W49 <dbl+lbl>,
#   HOMEASSIST3_W49 <dbl+lbl>, HOMEASSIST4_W49 <dbl+lbl>,
#   HOMEASSIST5a_W49 <dbl+lbl>, HOMEASSIST5b_W49 <dbl+lbl>,
#   HOMEIOT_W49 <dbl+lbl>, FITTRACK_W49 <dbl+lbl>, LOYALTY_W49 <dbl+lbl>, …

Since there are so many variables in the data set, selecting the variables of interest into a new data frame will make it easier to manage:

Code
sel_vars <- c('PRIVACYNEWS1_W49', 'TRACKCO1a_W49', 'CONTROLCO_W49', 'UNDERSTANDCO_W49', 'ANONYMOUS1CO_W49', 'PP4_W49', 'PRIVACYREG_W49', 'GOVREGV1_W49', 'PROFILE4_W49', 'F_AGECAT', 'F_EDUCCAT', 'F_PARTYSUM_FINAL', 'F_IDEO')
wav49_selected <- wav49[sel_vars]
wav49_selected
# A tibble: 4,272 × 13
   PRIVACYNEWS1_…¹ TRACKC…² CONTRO…³ UNDERS…⁴ ANONYM…⁵ PP4_W49  PRIVA…⁶ GOVREG…⁷
   <dbl+lbl>       <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+lb>
 1 4 [Not at all … NA       NA       NA       NA       NA       3 [Ver… NA      
 2 3 [Not too clo…  3 [Som…  2 [Som…  3 [Ver…  1 [Yes…  3 [Ver… 3 [Ver…  1 [Mor…
 3 3 [Not too clo…  3 [Som…  3 [Ver…  3 [Ver…  1 [Yes…  2 [Som… 3 [Ver…  1 [Mor…
 4 4 [Not at all … NA       NA       NA       NA        3 [Ver… 3 [Ver… NA      
 5 4 [Not at all …  1 [All…  4 [No …  4 [Not…  2 [No,… NA       4 [Not…  1 [Mor…
 6 2 [Somewhat cl… NA       NA       NA       NA        3 [Ver… 3 [Ver… NA      
 7 2 [Somewhat cl…  2 [Mos…  3 [Ver…  3 [Ver…  2 [No,…  2 [Som… 2 [Som…  3 [Abo…
 8 1 [Very closel…  1 [All…  4 [No …  4 [Not…  2 [No,… NA       3 [Ver…  3 [Abo…
 9 3 [Not too clo…  1 [All…  3 [Ver…  2 [Som…  2 [No,…  3 [Ver… 3 [Ver…  1 [Mor…
10 3 [Not too clo…  3 [Som…  2 [Som…  1 [A g…  1 [Yes…  2 [Som… 2 [Som…  2 [Les…
# … with 4,262 more rows, 5 more variables: PROFILE4_W49 <dbl+lbl>,
#   F_AGECAT <dbl+lbl>, F_EDUCCAT <dbl+lbl>, F_PARTYSUM_FINAL <dbl+lbl>,
#   F_IDEO <dbl+lbl>, and abbreviated variable names ¹​PRIVACYNEWS1_W49,
#   ²​TRACKCO1a_W49, ³​CONTROLCO_W49, ⁴​UNDERSTANDCO_W49, ⁵​ANONYMOUS1CO_W49,
#   ⁶​PRIVACYREG_W49, ⁷​GOVREGV1_W49

The variable labels contain the survey questions asked:

Code
#summary of $variable names and their [labels]
var_label(wav49_selected)
$PRIVACYNEWS1_W49
[1] "PRIVACYNEWS1. How closely, if at all, do you follow news about privacy issues?"

$TRACKCO1a_W49
[1] "TRACKCO1a. As far as you know, how much of what you do ONLINE or on your cellphone is being tracked by advertisers, technology firms or other companies?"

$CONTROLCO_W49
[1] "CONTROLCO. How much control do you think you have over the data that companies collect about you?"

$UNDERSTANDCO_W49
[1] "UNDERSTANDCO. How much do you feel you understand what companies are doing with the data they collect about you?"

$ANONYMOUS1CO_W49
[1] "ANONYMOUS1CO. Do you think it is possible to go about daily life today without having companies collect data about you?"

$PP4_W49
[1] "PP4. How much do you typically understand the privacy policies you read?"

$PRIVACYREG_W49
[1] "PRIVACYREG. How much do you feel you understand the laws and regulations that are currently in place to protect your data privacy?"

$GOVREGV1_W49
[1] "GOVREGV1. How much government regulation of what companies can do with their customers’ personal information do you think there should be?"

$PROFILE4_W49
[1] "PROFILE4. How much, if at all, do you understand what data about you is being used to create these advertisements?"

$F_AGECAT
[1] "Age category"

$F_EDUCCAT
[1] "Education level category"

$F_PARTYSUM_FINAL
[1] "Party summary"

$F_IDEO
[1] "Ideology"
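The list returned by `var_label()` can also be reshaped into a small codebook table, which may be easier to scan than the printed list. The sketch below uses an invented two-column stand-in for the survey data. The variable names are borrowed from the data set, but the values and the shortened question text are made up for illustration.

```r
library(haven)
library(labelled)
library(tibble)

# Toy stand-in for the selected survey variables (invented values)
toy <- tibble(
  GOVREGV1_W49 = labelled(
    c(1, 2, 3, 1),
    labels = c("More regulation" = 1, "Less regulation" = 2,
               "About the same amount" = 3),
    label = "GOVREGV1. How much government regulation should there be?"
  ),
  F_IDEO = labelled(
    c(3, 4, 2, 3),
    labels = c("Conservative" = 2, "Moderate" = 3, "Liberal" = 4),
    label = "Ideology"
  )
)

# var_label() returns a named list: one question text per variable
labels_list <- var_label(toy)

# Reshape the list into a two-column codebook
codebook <- tibble(
  variable = names(labels_list),
  question = unlist(labels_list, use.names = FALSE)
)
codebook
```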

Because the variables are categorical but stored as labelled numeric codes, they must be converted to factors before computing any statistics:

Code
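#convert all variables to factors, using the SPSS value labels as factor levels
wav49_factored <- wav49_selected %>%
  mutate(across(everything(), as_factor))
#note: zap_missing() only acts on labelled SPSS vectors, so calling it on
#factors returns them unchanged; to convert user-defined missing values to NA
#it would have to run before as_factor()
wav49_factored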
# A tibble: 4,272 × 13
   PRIVACYNEWS…¹ TRACK…² CONTR…³ UNDER…⁴ ANONY…⁵ PP4_W49 PRIVA…⁶ GOVRE…⁷ PROFI…⁸
   <fct>         <fct>   <fct>   <fct>   <fct>   <fct>   <fct>   <fct>   <fct>  
 1 Not at all c… <NA>    <NA>    <NA>    <NA>    <NA>    Very l… <NA>    <NA>   
 2 Not too clos… Some o… Some c… Very l… Yes, i… Very l… Very l… More r… <NA>   
 3 Not too clos… Some o… Very l… Very l… Yes, i… Some    Very l… More r… Somewh…
 4 Not at all c… <NA>    <NA>    <NA>    <NA>    Very l… Very l… <NA>    <NA>   
 5 Not at all c… All or… No con… Nothing No, it… <NA>    Not at… More r… Not to…
 6 Somewhat clo… <NA>    <NA>    <NA>    <NA>    Very l… Very l… <NA>    Not to…
 7 Somewhat clo… Most o… Very l… Very l… No, it… Some    Some    About … Somewh…
 8 Very closely  All or… No con… Nothing No, it… <NA>    Very l… About … Somewh…
 9 Not too clos… All or… Very l… Some    No, it… Very l… Very l… More r… Somewh…
10 Not too clos… Some o… Some c… A grea… Yes, i… Some    Some    Less r… Somewh…
# … with 4,262 more rows, 4 more variables: F_AGECAT <fct>, F_EDUCCAT <fct>,
#   F_PARTYSUM_FINAL <fct>, F_IDEO <fct>, and abbreviated variable names
#   ¹​PRIVACYNEWS1_W49, ²​TRACKCO1a_W49, ³​CONTROLCO_W49, ⁴​UNDERSTANDCO_W49,
#   ⁵​ANONYMOUS1CO_W49, ⁶​PRIVACYREG_W49, ⁷​GOVREGV1_W49, ⁸​PROFILE4_W49
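The order of conversion matters when handling user-defined missing values, because `zap_missing()` works on labelled SPSS vectors rather than on factors. The toy example below illustrates the difference. It assumes a code of 99 has been declared as a user-defined missing value. Whether the Pew file declares its "Refused" codes this way is not shown here, so this is a sketch of the mechanism only.

```r
library(haven)

# Toy item with 99 = "Refused" declared as user-defined missing (invented data)
x <- labelled_spss(
  c(1, 2, 99, 1),
  labels = c("Yes" = 1, "No" = 2, "Refused" = 99),
  na_values = 99
)

# Converting first keeps "Refused" as an ordinary factor level
f_before <- as_factor(x)

# Zapping first turns the declared missing value into NA,
# and droplevels() removes the now-unused "Refused" level
f_after <- droplevels(as_factor(zap_missing(x)))

table(f_before, useNA = "ifany")
table(f_after, useNA = "ifany")
```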

After the variables are converted to meaningful factors, a summary of response frequencies can be generated:

Code
summary(wav49_factored)
           PRIVACYNEWS1_W49                 TRACKCO1a_W49 
 Very closely      : 461    All or almost all of it: 881  
 Somewhat closely  :2046    Most of it             : 703  
 Not too closely   :1397    Some of it             : 381  
 Not at all closely: 359    Very little of it      :  88  
 Refused           :   9    None of it             :  76  
                            Refused                :  11  
                            NA's                   :2132  
                 CONTROLCO_W49      UNDERSTANDCO_W49
 A great deal of control:  68   A great deal: 132   
 Some control           : 313   Some        : 716   
 Very little control    :1134   Very little :1040   
 No control             : 621   Nothing     : 242   
 Refused                :   4   Refused     :  10   
 NA's                   :2132   NA's        :2132   
                                                    
               ANONYMOUS1CO_W49         PP4_W49          PRIVACYREG_W49
 Yes, it is possible   : 772    A great deal: 328   A great deal: 136  
 No, it is not possible:1357    Some        :1405   Some        :1380  
 Refused               :  11    Very little : 751   Very little :2153  
 NA's                  :2132    Not at all  :  82   Not at all  : 593  
                                Refused     :   5   Refused     :  10  
                                NA's        :1701                      
                                                                       
                GOVREGV1_W49        PROFILE4_W49    F_AGECAT   
 More regulation      :1631   A great deal: 384   18-29 : 671  
 Less regulation      : 145   Somewhat    :1410   30-49 :1314  
 About the same amount: 331   Not too much: 900   50-64 :1308  
 Refused              :  33   Not at all  : 113   65+   : 977  
 NA's                 :2132   Refused     :   9   DK/REF:   2  
                              NA's        :1456                
                                                               
                 F_EDUCCAT              F_PARTYSUM_FINAL
 College graduate+    :1600   Rep/Lean Rep      :1823   
 Some College         :1182   Dem/Lean Dem      :2296   
 H.S. graduate or less:1483   DK/Refused/No lean: 153   
 Don't know/Refused   :   7                             
                                                        
                                                        
                                                        
               F_IDEO    
 Very conservative: 353  
 Conservative     : 977  
 Moderate         :1615  
 Liberal          : 828  
 Very liberal     : 386  
 Refused          : 113  
                         

*High NA counts indicate that the question was not presented to all respondents.
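The summary also shows that refusals remain as ordinary factor levels ("Refused", "DK/REF", "Don't know/Refused"). One possible way to set them to missing before analysis is sketched below on invented data. The helper `refusals_to_na()` is hypothetical, and which levels count as non-substantive is a judgment call (for instance, "DK/Refused/No lean" in the party variable is left untouched here).

```r
# Toy factors mimicking two survey items (invented values)
toy <- data.frame(
  GOVREGV1_W49 = factor(c("More regulation", "Refused", "Less regulation",
                          NA, "More regulation")),
  F_AGECAT = factor(c("18-29", "65+", "DK/REF", "30-49", "50-64"))
)

# Non-substantive levels seen in the summary output
refusal_levels <- c("Refused", "DK/REF", "Don't know/Refused")

# Hypothetical helper: set refusal levels to NA, then drop the empty levels
refusals_to_na <- function(f) {
  f[f %in% refusal_levels] <- NA
  droplevels(f)
}

# Apply the helper to every column while keeping the data frame structure
toy_clean <- toy
toy_clean[] <- lapply(toy_clean, refusals_to_na)
summary(toy_clean)
```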

The data set is now ready for examining associations between variables and testing the hypotheses.
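As a rough illustration of what a first association check between two categorical variables might look like, the sketch below cross-tabulates ideology against regulation preference and runs a chi-square test. The responses are invented for the example and say nothing about the Pew data. The choice of test is an assumption, not the analysis plan of this proposal.

```r
# Invented responses for illustration only; not drawn from the Pew data
toy <- data.frame(
  ideology = factor(
    c("Conservative", "Conservative", "Conservative", "Moderate",
      "Moderate", "Moderate", "Liberal", "Liberal", "Liberal"),
    levels = c("Conservative", "Moderate", "Liberal")
  ),
  regulation = factor(
    c("Less regulation", "More regulation", "Less regulation",
      "More regulation", "Less regulation", "More regulation",
      "More regulation", "More regulation", "Less regulation"),
    levels = c("More regulation", "Less regulation")
  )
)

# Cross-tabulate the two factors and compute within-ideology shares
xtab <- table(toy$ideology, toy$regulation)
row_shares <- prop.table(xtab, margin = 1)

# Chi-square test of independence; the simulated p-value avoids relying
# on the large-sample approximation with such small counts
test <- chisq.test(xtab, simulate.p.value = TRUE, B = 2000)

round(row_shares, 2)
test$p.value
```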

References

Barth, S., de Jong, M. D. T., Junger, M., Hartel, P. H., & Roppelt, J. C. (2019). Putting the privacy paradox to the test: Online privacy and security behaviors among users with technical knowledge, privacy awareness, and financial resources. Telematics and Informatics, 41, 55–69. https://doi.org/10.1016/j.tele.2019.03.003

Boerman, S. C., Kruikemeier, S., & Zuiderveen Borgesius, F. J. (2021). Exploring motivations for online privacy protection behavior: Insights from panel data. Communication Research, 48(7), 953–977. https://doi.org/10.1177/0093650218800915

Pew Research Center. (2019). Americans and privacy: Concerned, confused and feeling lack of control over their personal information. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/

Pew Research Center. (2020). Wave 49 American trends panel [Data set]. https://www.pewresearch.org/internet/dataset/american-trends-panel-wave-49/

Turow, J., Hennessy, M., & Draper, N. (2015). The tradeoff fallacy: How marketers are misrepresenting American consumers and opening them up to exploitation. Annenberg School for Communication.

Zuboff, S. (2015). Big other: Surveillance capitalism and the prospects of an information civilization. Journal of Information Technology, 30(1), 75–89. https://doi.org/10.1057/jit.2015.5