Challenge 1

Pradhakshya Dhanakumar

SNL_SEASONS DATA

Author

Pradhakshya Dhanakumar

Published

February 26, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Reading Data

Read the data from a .csv file

Code

data <- read_csv("_data/snl_seasons.csv")

Rows: 46 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): sid, year, first_epid, last_epid, n_episodes

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
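The message above suggests specifying the column types. A minimal sketch of what that could look like follows. The inline CSV text holds only the first three rows from the printed output so the example runs without the file, and the name `seasons_typed` and the choice to read the episode IDs as character are assumptions, not part of the original analysis.

```r
library(readr)

# First three rows copied from the printed output, inlined so the
# example runs without the CSV file on disk.
csv_text <- "sid,year,first_epid,last_epid,n_episodes
1,1975,19751011,19760731,24
2,1976,19760918,19770521,22
3,1977,19770924,19780520,20
"

# I() tells read_csv() to treat the string as literal data, not a file path.
# Supplying col_types also silences the column specification message.
seasons_typed <- read_csv(
  I(csv_text),
  col_types = cols(
    sid        = col_integer(),
    year       = col_integer(),
    first_epid = col_character(),  # assumption: IDs are labels, not quantities
    last_epid  = col_character(),  # same assumption as first_epid
    n_episodes = col_integer()
  )
)

seasons_typed
```

With the real file, the same `col_types` argument could be passed to `read_csv("_data/snl_seasons.csv", ...)`.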

Code

print(data)

# A tibble: 46 × 5
     sid  year first_epid last_epid n_episodes
   <dbl> <dbl>      <dbl>     <dbl>      <dbl>
 1     1  1975   19751011  19760731         24
 2     2  1976   19760918  19770521         22
 3     3  1977   19770924  19780520         20
 4     4  1978   19781007  19790526         20
 5     5  1979   19791013  19800524         20
 6     6  1980   19801115  19810411         13
 7     7  1981   19811003  19820522         20
 8     8  1982   19820925  19830514         20
 9     9  1983   19831008  19840512         19
10    10  1984   19841006  19850413         17
# ℹ 36 more rows

Top 10 rows of the data

Code

head(data, 10)

# A tibble: 10 × 5
     sid  year first_epid last_epid n_episodes
   <dbl> <dbl>      <dbl>     <dbl>      <dbl>
 1     1  1975   19751011  19760731         24
 2     2  1976   19760918  19770521         22
 3     3  1977   19770924  19780520         20
 4     4  1978   19781007  19790526         20
 5     5  1979   19791013  19800524         20
 6     6  1980   19801115  19810411         13
 7     7  1981   19811003  19820522         20
 8     8  1982   19820925  19830514         20
 9     9  1983   19831008  19840512         19
10    10  1984   19841006  19850413         17

Bottom 2 rows of data

Code

tail(data, 2)

# A tibble: 2 × 5
    sid  year first_epid last_epid n_episodes
  <dbl> <dbl>      <dbl>     <dbl>      <dbl>
1    45  2019   20190928  20200509         18
2    46  2020   20201003  20210410         17

Selecting specific columns from data and displaying top 6 rows

Code

# Keep only the season year and its episode count
episodes_by_year <- select(data, year, n_episodes)
head(episodes_by_year)

# A tibble: 6 × 2
   year n_episodes
  <dbl>      <dbl>
1  1975         24
2  1976         22
3  1977         20
4  1978         20
5  1979         20
6  1980         13

Data Description

Data type of each column

Code

as.data.frame(sapply(data, class))

           sapply(data, class)
sid                    numeric
year                   numeric
first_epid             numeric
last_epid              numeric
n_episodes             numeric
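The tibble header shows `<dbl>` while `class()` reports `numeric`. Both describe the same columns. A small sketch, using a stand-in data frame built from the first three printed rows (the name `seasons` is hypothetical), compares `class()` with `typeof()`:

```r
# Small stand-in for `data`, using the first three rows printed earlier.
seasons <- data.frame(
  sid        = c(1, 2, 3),
  year       = c(1975, 1976, 1977),
  first_epid = c(19751011, 19760918, 19770924),
  last_epid  = c(19760731, 19770521, 19780520),
  n_episodes = c(24, 22, 20)
)

# class() reports "numeric" for doubles; typeof() reports the storage type.
# vapply() is used instead of sapply() so the result is always a character vector.
col_classes <- vapply(seasons, class, character(1))
col_types   <- vapply(seasons, typeof, character(1))

data.frame(class = col_classes, typeof = col_types)
```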

Dimension of data

Code

dim(data)

[1] 46  5

Printing column names

Code

colnames(data)

[1] "sid"        "year"       "first_epid" "last_epid"  "n_episodes"

Total rows in dataset

Code

nrow(data)

[1] 46

Total columns in dataset

Code

ncol(data)

[1] 5

Data Summary

Code

summary(data)

      sid             year        first_epid         last_epid       
 Min.   : 1.00   Min.   :1975   Min.   :19751011   Min.   :19760731  
 1st Qu.:12.25   1st Qu.:1986   1st Qu.:19863512   1st Qu.:19872949  
 Median :23.50   Median :1998   Median :19975926   Median :19985512  
 Mean   :23.50   Mean   :1998   Mean   :19975965   Mean   :19985509  
 3rd Qu.:34.75   3rd Qu.:2009   3rd Qu.:20088423   3rd Qu.:20098015  
 Max.   :46.00   Max.   :2020   Max.   :20201003   Max.   :20210410  
   n_episodes  
 Min.   :12.0  
 1st Qu.:20.0  
 Median :20.0  
 Mean   :19.7  
 3rd Qu.:21.0  
 Max.   :24.0
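A frequency table can complement the summary by showing how often each episode count occurs. The sketch below uses only the 12 seasons whose rows are printed earlier (the first 10 and the last 2), so it illustrates the approach rather than the full 46-season result. The vector name `n_episodes_subset` is hypothetical.

```r
# n_episodes for the 12 seasons printed earlier (sid 1-10, 45, 46).
# This is a subset, not the full dataset.
n_episodes_subset <- c(24, 22, 20, 20, 20, 13, 20, 20, 19, 17, 18, 17)

# Count how many seasons have each episode count.
episode_counts <- table(n_episodes_subset)
episode_counts

# Median and mean of the subset, for comparison with summary().
median(n_episodes_subset)
mean(n_episodes_subset)
```

On the full data the same idea would be `table(data$n_episodes)`.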

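The dataset has 46 rows and 5 columns. Each row describes one season: its season ID (sid), the year it began, the IDs of its first and last episodes (first_epid and last_epid, which appear to encode the air dates as YYYYMMDD), and the number of episodes in that season (n_episodes). All of the columns have the data type ‘numeric’. The summary() command gives summary statistics for each column, though the quartiles and mean of first_epid and last_epid are not meaningful because those values are date-like IDs rather than quantities. Seasons have between 12 and 24 episodes, with a median of 20 and a mean of 19.7.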
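The first_epid and last_epid values look like dates written as YYYYMMDD. Assuming that reading is correct, they could be converted to Date values to measure how long each season ran. This is a sketch using four rows from the printed output. The names `seasons`, `first_aired`, `last_aired`, and `season_days` are hypothetical.

```r
# Four rows taken from the printed output (first two and last two seasons).
seasons <- data.frame(
  sid        = c(1, 2, 45, 46),
  first_epid = c(19751011, 19760918, 20190928, 20201003),
  last_epid  = c(19760731, 19770521, 20200509, 20210410)
)

# Assumption: the IDs are air dates in YYYYMMDD form.
# sprintf("%.0f", x) avoids scientific notation when converting to text.
seasons$first_aired <- as.Date(sprintf("%.0f", seasons$first_epid), format = "%Y%m%d")
seasons$last_aired  <- as.Date(sprintf("%.0f", seasons$last_epid),  format = "%Y%m%d")

# Number of days between the first and last episode of each season.
seasons$season_days <- as.numeric(
  difftime(seasons$last_aired, seasons$first_aired, units = "days")
)

seasons
```

Once the IDs are stored as Date values, summary() reports dates rather than quartiles of eight-digit numbers.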