Removing the url column from snl_actors: it is not useful for this analysis and is difficult to clean.
snl_actors <- snl_actors %>% select(-url)
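One thing to keep in mind with the step above: `select(-url)` raises an error if the chunk is run a second time, because the column is already gone. A minimal sketch of a re-runnable variant, assuming the tidyselect helper `any_of()` is acceptable here; the `actors` tibble below is a made-up stand-in, not the real snl_actors data.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for snl_actors (values are invented for illustration)
actors <- tibble(
  aid  = c("A", "B"),
  url  = c("/x", "/y"),
  type = c("cast", "guest")
)

# any_of() drops "url" if it exists and silently does nothing otherwise
actors <- actors %>% select(-any_of("url"))

# Running the same line again is a no-op rather than an error
actors <- actors %>% select(-any_of("url"))

colnames(actors)
```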
Join Data
actors_casts <- snl_actors %>% inner_join(snl_casts, by = "aid")
fully_joined_dataset <- actors_casts %>% inner_join(snl_seasons, by = "sid")
fully_joined_dataset
# A tibble: 614 × 14
aid type gender sid featu…¹ first…² last_…³ updat…⁴ n_epi…⁵ seaso…⁶
<chr> <chr> <chr> <dbl> <lgl> <dbl> <dbl> <lgl> <dbl> <dbl>
1 Kate McKi… cast female 37 TRUE 2.01e7 0 FALSE 5 0.227
2 Kate McKi… cast female 38 TRUE 0 0 FALSE 21 1
3 Kate McKi… cast female 39 FALSE 0 0 FALSE 21 1
4 Kate McKi… cast female 40 FALSE 0 0 FALSE 21 1
5 Kate McKi… cast female 41 FALSE 0 0 FALSE 21 1
6 Kate McKi… cast female 42 FALSE 0 0 FALSE 21 1
7 Kate McKi… cast female 43 FALSE 0 0 FALSE 21 1
8 Kate McKi… cast female 44 FALSE 0 0 FALSE 21 1
9 Kate McKi… cast female 45 FALSE 0 0 FALSE 18 1
10 Kate McKi… cast female 46 FALSE 0 0 FALSE 17 1
# … with 604 more rows, 4 more variables: year <dbl>, first_epid.y <dbl>,
# last_epid.y <dbl>, n_episodes.y <dbl>, and abbreviated variable names
# ¹featured, ²first_epid.x, ³last_epid.x, ⁴update_anchor, ⁵n_episodes.x,
# ⁶season_fraction
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
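The `.x` and `.y` suffixes in the output above appear because first_epid, last_epid, and n_episodes exist in both the cast table and the season table, and only sid is used as the join key. A minimal sketch of one way to make the origin of each column explicit, using the `suffix` argument of `inner_join()`. The `casts` and `seasons` tibbles and the suffix names `_cast` and `_season` are assumptions for illustration, not the real data or the author's choice.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for snl_casts and snl_seasons (invented values)
casts <- tibble(
  aid        = c("A", "A", "B"),
  sid        = c(1, 2, 2),
  n_episodes = c(5, 21, 10)
)
seasons <- tibble(
  sid        = c(1, 2),
  year       = c(1975, 1976),
  n_episodes = c(24, 22)
)

# Default behaviour: the shared non-key column is split into
# n_episodes.x (from casts) and n_episodes.y (from seasons)
default_join <- inner_join(casts, seasons, by = "sid")

# Explicit suffixes state which table each column came from
named_join <- inner_join(
  casts, seasons,
  by = "sid",
  suffix = c("_cast", "_season")
)

colnames(named_join)
```

With the real tables, the same argument would go on the second join, the one against snl_seasons.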
Source Code
---
title: "Challenge 8"
author: "Adithya Parupudi"
description: "Joining Data"
date: "08/25/2022"
format:
  html:
    toc: true
    code-copy: true
    code-tools: true
categories:
  - challenge_8
  - Adithya Parupudi
  - snl data
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

## Read in data

```{r}
snl_actors <- read_csv("_data/snl_actors.csv", show_col_types = FALSE)
snl_casts <- read_csv("_data/snl_casts.csv", show_col_types = FALSE)
snl_seasons <- read_csv("_data/snl_seasons.csv", show_col_types = FALSE)
```

### Briefly describe the data

## Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

```{r}
print(
  summarytools::dfSummary(snl_actors, varnumbers = FALSE, plain.ascii = FALSE,
                          style = "grid", graph.magnif = 0.70, valid.col = FALSE),
  method = 'render', table.classes = 'table-condensed'
)
print(
  summarytools::dfSummary(snl_casts, varnumbers = FALSE, plain.ascii = FALSE,
                          style = "grid", graph.magnif = 0.70, valid.col = FALSE),
  method = 'render', table.classes = 'table-condensed'
)
print(
  summarytools::dfSummary(snl_seasons, varnumbers = FALSE, plain.ascii = FALSE,
                          style = "grid", graph.magnif = 0.70, valid.col = FALSE),
  method = 'render', table.classes = 'table-condensed'
)
```

```{r}
colnames(snl_actors)
colnames(snl_casts)
colnames(snl_seasons)
```

Replacing NA in snl_casts with 0

```{r}
snl_casts <- snl_casts %>%
  replace_na(list(`first_epid` = 0, `last_epid` = 0))
```

Removing the url column from snl_actors: it is not useful for this analysis and is difficult to clean.

```{r}
snl_actors <- snl_actors %>% select(-url)
```

## Join Data

```{r}
actors_casts <- snl_actors %>% inner_join(snl_casts, by = "aid")
fully_joined_dataset <- actors_casts %>% inner_join(snl_seasons, by = "sid")
fully_joined_dataset
```