HW 2

Reading in Data
Author

Henry Mitrano

Published

January 21, 2023

 #| label: setup
 #| warning: false
 #| message: false
 library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
 library(ggplot2)
 knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

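 # Read the SNL performer table (path is relative to the working directory)
 data <- read_csv("_data/snl_actors.csv")
 # Open the data viewer; this has no effect in a non-interactive render
 view(data)
 # Print the first six rows
 head(data)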
 aid              url             type    gender
 Kate McKinnon    /Cast/?KaMc     cast    female
 Alex Moffat      /Cast/?AlMo     cast    male
 Ego Nwodim       /Cast/?EgNw     cast    unknown
 Chris Redd       /Cast/?ChRe     cast    male
 Kenan Thompson   /Cast/?KeTh     cast    male
 Carey Mulligan   /Guests/?3677   guest   andy
 summary(data)
     aid                url                type              gender         
 Length:2306        Length:2306        Length:2306        Length:2306       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  

Narrative, Variables, and Research Question

This dataset appears to list every performer who has ever appeared on the late-night sketch comedy show ‘Saturday Night Live’. It has 2,306 rows and four character columns: the performer’s name (aid), a URL (url), whether they are a cast member or a guest performer (type), and their gender (gender). The url values are relative paths such as /Cast/?KaMc, which suggests that the information comes from an online database and that each path points to the performer’s individual page there.
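One quick way to check this reading of the columns is to count the values of type and compare them with the first segment of each url. The sketch below is only an illustration: it uses the six rows printed by head(data) as a stand-in for the full file, and the names snl_sample, type_counts, url_section, and url_check are my own, not part of the dataset.

```r
library(dplyr)
library(tibble)
library(stringr)

# Stand-in for the full dataset: only the six rows shown by head(data).
snl_sample <- tribble(
  ~aid,             ~url,            ~type,   ~gender,
  "Kate McKinnon",  "/Cast/?KaMc",   "cast",  "female",
  "Alex Moffat",    "/Cast/?AlMo",   "cast",  "male",
  "Ego Nwodim",     "/Cast/?EgNw",   "cast",  "unknown",
  "Chris Redd",     "/Cast/?ChRe",   "cast",  "male",
  "Kenan Thompson", "/Cast/?KeTh",   "cast",  "male",
  "Carey Mulligan", "/Guests/?3677", "guest", "andy"
)

# How many performers fall under each type?
type_counts <- snl_sample %>%
  count(type)

# Extract the first run of letters in each url (e.g. "Cast" from "/Cast/?KaMc")
# and cross-tabulate it against type to see whether the two agree.
url_check <- snl_sample %>%
  mutate(url_section = str_extract(url, "[A-Za-z]+")) %>%
  count(type, url_section)

type_counts
url_check
```

If type and the url prefix always line up in the full data, the type column is probably derived from the same source as the links. Any mismatches would be worth a closer look.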

There are a few places we could take this. For one, the gender column could use some tidying. Some performers are listed as male or female, some are listed as “unknown”, and Carey Mulligan is listed as “andy”. Gender is fluid and exists on a spectrum, and “unknown” could mean any number of things (in 2023 at least), so I wouldn’t necessarily recode those values without looking into them first. I would want to investigate both “unknown” and “andy”, and then either give the category a more consistent structure or rename it from “gender” to “sex” so that it would be binary, for ease of analysis.
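A first pass at that tidying might look like the sketch below. It again uses the six rows from head(data) as a stand-in for the full file. Treating only “male” and “female” as recognized labels is an assumption made for illustration, and gender_clean, recognized, and needs_review are hypothetical names. Unrecognized values are set to NA in a new column rather than overwritten, so the original entries stay available for review.

```r
library(dplyr)
library(tibble)

# Stand-in for the full dataset: only the six rows shown by head(data).
snl_sample <- tribble(
  ~aid,             ~url,            ~type,   ~gender,
  "Kate McKinnon",  "/Cast/?KaMc",   "cast",  "female",
  "Alex Moffat",    "/Cast/?AlMo",   "cast",  "male",
  "Ego Nwodim",     "/Cast/?EgNw",   "cast",  "unknown",
  "Chris Redd",     "/Cast/?ChRe",   "cast",  "male",
  "Kenan Thompson", "/Cast/?KeTh",   "cast",  "male",
  "Carey Mulligan", "/Guests/?3677", "guest", "andy"
)

# Tabulate every distinct value in the gender column, most common first.
gender_counts <- snl_sample %>%
  count(gender, sort = TRUE)

# Assumption: only these two labels are treated as recognized here.
recognized <- c("male", "female")

# Keep the original column and add a cleaned copy; anything outside the
# recognized labels (e.g. "unknown", "andy") becomes NA.
snl_clean <- snl_sample %>%
  mutate(gender_clean = if_else(gender %in% recognized, gender, NA_character_))

# Rows whose gender value needs a manual look.
needs_review <- snl_clean %>%
  filter(is.na(gender_clean))

gender_counts
needs_review
```

On the full dataset, gender_counts would show how many distinct labels there really are, which should come before any decision about restructuring or renaming the column.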