Getting my data
Author

Nayan Jani

Published

October 12, 2022

Code
library(tidyverse)
library(rtweet)
library(quanteda)


knitr::opts_chunk$set(echo = TRUE)

Literature Review

The first article I looked at examined racial bias among officials in the Italian Serie A. The goal of the study was to see whether trained officials are biased against Black and dark-skinned players and penalize them more than other players. The data contains information on each player in Serie A from the 2009/10 season to the 2020/21 season. The study used three versions of the Football Manager video game (Football Manager, 2011, 2018, 2021) to collect data on player skin tones. This skin tone variable is continuous and ranges from 1 (lightest skin tone) to 20 (darkest skin tone). For red and yellow cards, the study used data from Footystats (2021); data on fouls came from WhoScored (2021) and FBREF (2021). The main hypothesis of the study is that bias against darker-skinned players has likely resulted in unfair patterns of refereeing, including a greater number of foul calls, yellow cards, and ejections (red cards). The methods used in this study were OLS and Poisson regression. The study found that skin tone does affect referee decisions, especially with respect to fouls committed and yellow cards, and more weakly with respect to red cards. Overall, I found this study interesting because it looks into racial bias that actually affects the game. It shows that racial stigmas are still a problem in sports and are affecting the integrity of the game.

The second article I read discussed racial bias in National Football League (NFL) officiating. The goal of the study was to examine potential racial bias in holding penalties. The data covers the 2013-14 through 2015-16 NFL seasons and includes the races of the officials and players involved in holding penalties. Two types of analysis are used to detect racial bias: a player-level analysis and a game-level analysis. The outcome in the player-level analysis is a dichotomous variable that indicates which combination of White or Black official and White or Black player was involved in a penalty call. The dependent variable in the game-level analysis is the percentage of holding penalties called on Black players per game. The player-level analysis uses multinomial linear regression and the game-level analysis uses linear regression. The results showed no evidence of racial bias in the calling of holding penalties by White officials, although Black players were more likely to have holding penalties called on them earlier in the game by all officials. Overall, I found this article interesting because there are a lot of grey areas in calling a holding penalty, and it is cool to see whether racial bias has any effect on this type of call. If the study had been able to establish a stronger relationship between racial bias and holding calls, it could lead to a fairer game and remove a lot of bad calls.

My Project Idea

The topic I want to look into is sports fans. I want to find out which groups of sports fans are more socially correct than others. By socially correct, I mean that a group of fans does not hold prejudice or enforce stigmas toward other groups of people. The groups I would like to analyze are soccer, NFL, NBA, and UFC fans. To analyze these groups, I will look at their written responses to certain topics. For soccer fans, I will look at their discussion of LGBTQ inclusion at this year's World Cup in Qatar. For UFC fans, I will look at the responses to the inclusion of certain fighters in the UFC's Hispanic heritage montage. For NFL fans, I will look at the responses to the Deshaun Watson vs. Calvin Ridley punishments. For NBA fans, I will look at the responses to the Ime Udoka vs. Robert Sarver punishments. The data will come from the YouTube API. Most of these fan discussions take place in YouTube comments, and I believe analyzing the language fans use will show whether certain groups of fans are more socially correct than others.

Code
df_q <- read_csv("_data/comments_q.csv")
Warning: One or more parsing issues, see `problems()` for details
Rows: 99 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): I’ll try to get the next video essay out in less than a month lol

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
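
The column specification above shows the first comment being used as the column name, because read_csv() treats the first line of a file as a header by default. Here is a minimal sketch of one way to avoid that, assuming the exported comment files have no header row. The temporary file and its three comments are made up for illustration and are not the project data.

```r
library(readr)

# Hypothetical stand-in for a headerless comments export: one comment per line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("first comment", "second comment", "third comment"), tmp)

# Default behaviour: the first line becomes the column name, so one comment is lost.
with_header <- read_csv(tmp, show_col_types = FALSE)

# Supplying col_names keeps every line as data and names the column directly.
no_header <- read_csv(tmp, col_names = "text", show_col_types = FALSE)

nrow(with_header)
nrow(no_header)
names(no_header)
```

Read this way, the column is already called text, so the rename() step would not be needed and the first comment would stay in the data.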
Code
# read_csv() used the first comment as the header; rename that column to `text`
df_q <- df_q %>%
  rename(text = "I’ll try to get the next video essay out in less than a month lol")


corpus_q <- corpus(df_q)

corpusQ_sum <- summary(corpus_q)
corpusQ_sum
Corpus consisting of 99 documents, showing 99 documents:

   Text Types Tokens Sentences
  text1    11     15         1
  text2    16     17         3
  text3    46     51         1
  text4    13     13         1
  text5    86    125         7
  text6     3      3         1
  text7    48     55         3
  text8    18     20         1
  text9    28     32         2
 text10    20     27         2
 text11    19     19         1
 text12    18     18         1
 text13     8      8         1
 text14    18     25         1
 text15     2      2         1
 text16    48     59         4
 text17    23     24         2
 text18    73     97         2
 text19    85    150         6
 text20    51     69         3
 text21    22     25         2
 text22    26     28         4
 text23    12     12         1
 text24    23     26         1
 text25    17     24         1
 text26    34     40         2
 text27    80    124         6
 text28    14     14         2
 text29    70     85         3
 text30    14     14         2
 text31    42     59         1
 text32    51     70         2
 text33    12     16         1
 text34     6      6         1
 text35     9     11         2
 text36    12     12         1
 text37    23     23         1
 text38    26     32         1
 text39     3      3         1
 text40     5      5         1
 text41   114    222         7
 text42    20     21         2
 text43    22     27         1
 text44    29     33         2
 text45     6      6         1
 text46    22     25         4
 text47    24     26         3
 text48    81    109         1
 text49    16     21         2
 text50    16     31         3
 text51    15     15         1
 text52    26     32         1
 text53    34     39         2
 text54     9     11         1
 text55    12     12         1
 text56     6      6         1
 text57    12     12         1
 text58     2      2         1
 text59    20     22         1
 text60    54     77         2
 text61    26     29         3
 text62     7      7         1
 text63    19     19         2
 text64     4      6         1
 text65    19     22         2
 text66    58     77         3
 text67     4      4         1
 text68    17     24         1
 text69    42     53         3
 text70    15     19         1
 text71    66     84         5
 text72     1      1         1
 text73    25     30         1
 text74    17     17         1
 text75    45     63         1
 text76    11     11         1
 text77    22     35         1
 text78    46     64         4
 text79     9      9         1
 text80    23     28         3
 text81    11     14         1
 text82    51     59         2
 text83    12     14         1
 text84     7      7         1
 text85    22     25         2
 text86    85    125         2
 text87    27     54         4
 text88     9      9         1
 text89    22     27         2
 text90    33     41         1
 text91    15     15         1
 text92     8      8         1
 text93   128    180         4
 text94     6      6         1
 text95     5      5         1
 text96    71     94         3
 text97    21     28         1
 text98    98    179        10
 text99    11     12         1
Code
df_nba <- read_csv("_data/comments_nba.csv")
Warning: One or more parsing issues, see `problems()` for details
Rows: 98 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Thoughts on Malika and Stephen A having a disagreement?

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
# read_csv() used the first comment as the header; rename that column to `text`
df_nba <- df_nba %>%
  rename(text = "Thoughts on Malika and Stephen A having a disagreement?")


corpus_nba <- corpus(df_nba)

corpusNBA_sum <- summary(corpus_nba)
corpusNBA_sum
Corpus consisting of 98 documents, showing 98 documents:

   Text Types Tokens Sentences
  text1    97    178         6
  text2    64     99         2
  text3    33     37         1
  text4    22     26         2
  text5     7      7         1
  text6    60     67         4
  text7    14     14         2
  text8     7      7         1
  text9     7     10         2
 text10    10     13         3
 text11    27     27         1
 text12    39     47         4
 text13    23     27         3
 text14    15     16         1
 text15     5      6         1
 text16    11     11         1
 text17    45     55         2
 text18    26     31         4
 text19    19     24         1
 text20     7      7         1
 text21    54     87         2
 text22    19     27         1
 text23    39     47         1
 text24    25     27         1
 text25    31     37         1
 text26    66     93         1
 text27     5      5         1
 text28    10     10         1
 text29     9     16         2
 text30    29     29         1
 text31    16     18         1
 text32     7      7         1
 text33    25     32         1
 text34     3      3         1
 text35    29     39         2
 text36    10     15         2
 text37     5      5         1
 text38    19     19         3
 text39   104    158         7
 text40     1      1         1
 text41    27     32         3
 text42    18     21         2
 text43    26     34         1
 text44     8      8         1
 text45     3      3         1
 text46    13     18         3
 text47    11     11         2
 text48     4      4         1
 text49    39     51         2
 text50    23     26         2
 text51    26     33         5
 text52     6      6         1
 text53    16     16         2
 text54    60     80         4
 text55    19     22         4
 text56    11     13         1
 text57    11     16         2
 text58    42     64         4
 text59    14     19         2
 text60    52     67         8
 text61    20     21         2
 text62     5      5         1
 text63    82    125         8
 text64    16     16         2
 text65    21     25         3
 text66    30     36         5
 text67    23     25         1
 text68    20     23         1
 text69    22     27         1
 text70    31     40         4
 text71    64     94         2
 text72    22     31         3
 text73    35     42         2
 text74     7      7         1
 text75     5      5         1
 text76     8      8         1
 text77    10     10         1
 text78    42     52         4
 text79    14     14         1
 text80    32     33         2
 text81     5      5         1
 text82     3      3         1
 text83    18     19         1
 text84     8      8         1
 text85    37     45         4
 text86    35     41         1
 text87    13     14         1
 text88    38     48         4
 text89    39     48         8
 text90    12     13         1
 text91     7      9         1
 text92    22     27         4
 text93     8     12         2
 text94    19     19         2
 text95    22     23         1
 text96    16     17         1
 text97    24     27         1
 text98    13     15         2
Code
df_nfl <- read_csv("_data/comments_nfl.csv")
Warning: One or more parsing issues, see `problems()` for details
Rows: 99 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): What crime did he commit?

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
# read_csv() used the first comment as the header; rename that column to `text`
df_nfl <- df_nfl %>%
  rename(text = "What crime did he commit?")


corpus_nfl <- corpus(df_nfl)

corpusNFL_sum <- summary(corpus_nfl)
corpusNFL_sum
Corpus consisting of 99 documents, showing 99 documents:

   Text Types Tokens Sentences
  text1     4      4         1
  text2    66     83         7
  text3    44     52         4
  text4     8      8         1
  text5    24     26         3
  text6     7      7         1
  text7    21     23         1
  text8    39     43         2
  text9     8      8         1
 text10    36     41         3
 text11    26     28         2
 text12    19     20         1
 text13     8      8         1
 text14    23     33         2
 text15    29     35         5
 text16    32     46         3
 text17    26     30         3
 text18    17     22         1
 text19     5      5         1
 text20     3      3         1
 text21    21     29         1
 text22    24     26         2
 text23    21     23         3
 text24    41     56         4
 text25    14     14         1
 text26     7      7         1
 text27    20     20         1
 text28     9      9         2
 text29     8      9         1
 text30     6      7         2
 text31    31     43         2
 text32    29     35         2
 text33    20     24         1
 text34    12     12         1
 text35     8      8         2
 text36    15     15         1
 text37    20     21         2
 text38    33     37         2
 text39    10     10         1
 text40    20     22         1
 text41    10     10         1
 text42    39     47         1
 text43    15     15         1
 text44    15     20         1
 text45    65     82         2
 text46    19     21         3
 text47    12     12         2
 text48    13     15         1
 text49    10     10         1
 text50     7      7         1
 text51    24     26         1
 text52     4      4         1
 text53    23     27         2
 text54    20     21         2
 text55    19     21         3
 text56    12     12         1
 text57    73     92         5
 text58   117    219        17
 text59    12     15         2
 text60    30     36         3
 text61    57     73         6
 text62     8      8         1
 text63    26     30         3
 text64     1      1         1
 text65    19     20         2
 text66    32     37         4
 text67     7      7         1
 text68    15     15         1
 text69    60     82         1
 text70    40     49         7
 text71     4      4         1
 text72    11     12         1
 text73    91    125         7
 text74     9      9         1
 text75    13     13         1
 text76    31     39         2
 text77    18     19         1
 text78     7      7         1
 text79     6      9         1
 text80    13     14         2
 text81    44     56         5
 text82    19     19         1
 text83     9      9         1
 text84    25     42         2
 text85    22     26         3
 text86    18     21         2
 text87    37     43         1
 text88     7      7         1
 text89    22     22         1
 text90    53     67         3
 text91     9      9         1
 text92    57     73         4
 text93    80    138         9
 text94    43     63         3
 text95    21     25         1
 text96     4      4         1
 text97    23     25         2
 text98     9      9         1
 text99     9      9         1
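
The three summary tables above are long to compare by eye. Here is a minimal sketch of collapsing each one into a few per-group averages. It uses toy stand-in corpora because the comment files are not reproduced here; the helper name summarise_corpus and the example texts are hypothetical.

```r
library(quanteda)

# Hypothetical helper: reduce a corpus summary to one row of averages.
summarise_corpus <- function(corp, group) {
  # summary() shows at most 100 documents by default, so ask for all of them.
  s <- summary(corp, n = ndoc(corp))
  data.frame(
    group = group,
    comments = nrow(s),
    mean_types = mean(s$Types),
    mean_tokens = mean(s$Tokens),
    mean_sentences = mean(s$Sentences)
  )
}

# Toy corpora standing in for corpus_q, corpus_nba and corpus_nfl.
toy_a <- corpus(c("Short one.", "A slightly longer comment here. Two sentences."))
toy_b <- corpus(c("Only one comment in this toy group."))

comparison <- rbind(
  summarise_corpus(toy_a, "a"),
  summarise_corpus(toy_b, "b")
)
comparison
```

With the real data, the same helper could be called on corpus_q, corpus_nba and corpus_nfl to get a three-row table instead of three 99-row printouts.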
Code
# Tokenize the NBA comments, dropping punctuation, numbers, and symbols
corpus_nba_tokens <- tokens(corpus_nba,
    remove_punct = TRUE,
    remove_numbers = TRUE,
    remove_symbols = TRUE)

print(corpus_nba_tokens)
Tokens consisting of 98 documents.
text1 :
 [1] "I"        "can"      "#39"      "t"        "believe"  "this"    
 [7] "is"       "actually" "a"        "debate"   "in"       "America" 
[ ... and 126 more ]

text2 :
 [1] "She"    "acts"   "like"   "she"    "is"     "owed"   "stuff"  "br"    
 [9] "info"   "that's" "none"   "of"    
[ ... and 63 more ]

text3 :
 [1] "My"       "question" "is"       "why"      "do"       "they"    
 [7] "allow"    "people"   "like"     "her"      "to"       "be"      
[ ... and 23 more ]

text4 :
 [1] "That"    "is"      "NOT"     "why"     "we"      "are"     "here"   
 [8] "I'm"     "rolling" "The"     "way"     "she"    
[ ... and 3 more ]

text5 :
[1] "Who"   "did"   "this"  "man"   "sleep" "with" 

text6 :
 [1] "I"         "#39"       "m"         "with"      "Candace"   "Owens"    
 [7] "Basicslly" "shouldnt"  "even"      "hire"      "women"     "because"  
[ ... and 48 more ]

[ reached max_ndoc ... 92 more documents ]
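
Tokens such as "#39" and "br" in the output above look like leftovers of HTML markup in the comment text (an apostrophe encoded as &#39; and a <br> line break). Here is a minimal sketch of stripping them before tokenizing, using base R only. The function name clean_comment_html is hypothetical, it handles only a handful of common entities, and the two example strings are made up to resemble the tokens shown.

```r
# Hypothetical helper: strip a few common HTML leftovers from comment text.
clean_comment_html <- function(x) {
  x <- gsub("<br\\s*/?>", " ", x, ignore.case = TRUE, perl = TRUE)  # line breaks become spaces
  x <- gsub("<[^>]+>", "", x, perl = TRUE)                          # drop any other tags
  x <- gsub("&#39;", "'", x, fixed = TRUE)
  x <- gsub("&quot;", "\"", x, fixed = TRUE)
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("&amp;", "&", x, fixed = TRUE)  # decoded last to avoid double-decoding
  trimws(gsub("\\s+", " ", x, perl = TRUE)) # collapse repeated whitespace
}

# Made-up examples modelled on the tokens printed above.
raw <- c("I can&#39;t believe this is a debate",
         "owed stuff<br>info that's none of")
cleaned <- clean_comment_html(raw)
cleaned
```

Applied to the real data, this would be run on the text column (for example df_nba$text) before calling corpus(), so that "can't" is tokenized as one word instead of "can", "#39", "t".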