Challenge 8: Michele Carlin

challenge_8
Michele Carlin
snl
Author

Michele Carlin

Published

May 3, 2023

Code
library(tidyverse)
library(summarytools)
library(dbplyr)
library(readr)


knitr::opts_chunk$set(echo = TRUE)

Read in and view summary of SNL_actors dataset.

Code
snl_actors <- read_csv ("_data/snl_actors.csv")
Rows: 2306 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): aid, url, type, gender

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
View(snl_actors)
view(dfSummary(snl_actors))
Switching method to 'browser'
Output file written: C:\Users\CarlinML\AppData\Local\Temp\RtmpGkYA9b\file52743747a45.html

Read in and view summary of SNL_casts dataset.

Code
snl_casts <- read_csv ("_data/snl_casts.csv")
Rows: 614 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): aid
dbl (5): sid, first_epid, last_epid, n_episodes, season_fraction
lgl (2): featured, update_anchor

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
View(snl_casts)
view(dfSummary(snl_casts))
Switching method to 'browser'
Output file written: C:\Users\CarlinML\AppData\Local\Temp\RtmpGkYA9b\file52743e911dc7.html

Read in and view summary of SNL_seasons dataset.

Code
snl_seasons <- read_csv ("_data/snl_seasons.csv")
Rows: 46 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): sid, year, first_epid, last_epid, n_episodes

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
View(snl_seasons)
view(dfSummary(snl_seasons))
Switching method to 'browser'
Output file written: C:\Users\CarlinML\AppData\Local\Temp\RtmpGkYA9b\file52746d185302.html

Summarize datasets… The SNL_actors dataset contains 2306 rows and 4 columns including the actor’s name (aid), a url to more information about that actor, the type of actor (cast, crew, guest, or unknown) and their gender.

The SNL_casts dataset contains 614 rows and 8 columns of data about cast members only (e.g., the cast member’s name (aid), season ID (sid), the date of their first and last episodes, etc.). In this dataset, every cast member has a separate row for each season they were on, as well as the number of episodes they were in for each of those seasons.

The SNL_seasons dataset contains 46 rows and 5 columns including season ID (sid), the year, dates of first and last episodes, and number of episodes in each season.

I will start with the SNL_casts file and merge in the SNL_actors file matching on actor ID (aid). The casts file only contains information on cast members, therefore if a did a left join, only cast member data would be retained. I am going to perform a full join so that I can retain data on both cast members and guests.

Code
casts_actors <- snl_casts %>% full_join(snl_actors, 
                              by=c('aid'))
View(casts_actors)
view(dfSummary(casts_actors))
Switching method to 'browser'
Output file written: C:\Users\CarlinML\AppData\Local\Temp\RtmpGkYA9b\file52747bc81f05.html

This new casts_actors file now contains 2764 rows and 11 columns.

I will now join the SNL_seasons file into my newly created casts_actors dataset; matching on season ID (sid).

Code
SNL <- casts_actors %>% left_join(snl_seasons, 
                              by=c('sid'))
View(SNL)
view(dfSummary(SNL))
Switching method to 'browser'
Output file written: C:\Users\CarlinML\AppData\Local\Temp\RtmpGkYA9b\file5274a0413cc.html

My new SNL dataset contains 2764 rows and 15 columns.