library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
Pradhakshya Dhanakumar
February 26, 2023
Read the data from a .csv file
Rows: 46 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): sid, year, first_epid, last_epid, n_episodes
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 46 × 5
sid year first_epid last_epid n_episodes
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1975 19751011 19760731 24
2 2 1976 19760918 19770521 22
3 3 1977 19770924 19780520 20
4 4 1978 19781007 19790526 20
5 5 1979 19791013 19800524 20
6 6 1980 19801115 19810411 13
7 7 1981 19811003 19820522 20
8 8 1982 19820925 19830514 20
9 9 1983 19831008 19840512 19
10 10 1984 19841006 19850413 17
# ℹ 36 more rows
Top 10 rows of the data
# A tibble: 10 × 5
sid year first_epid last_epid n_episodes
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1975 19751011 19760731 24
2 2 1976 19760918 19770521 22
3 3 1977 19770924 19780520 20
4 4 1978 19781007 19790526 20
5 5 1979 19791013 19800524 20
6 6 1980 19801115 19810411 13
7 7 1981 19811003 19820522 20
8 8 1982 19820925 19830514 20
9 9 1983 19831008 19840512 19
10 10 1984 19841006 19850413 17
Bottom 2 rows of data
# A tibble: 2 × 5
sid year first_epid last_epid n_episodes
<dbl> <dbl> <dbl> <dbl> <dbl>
1 45 2019 20190928 20200509 18
2 46 2020 20201003 20210410 17
Selecting specific columns from data and displaying top 6 rows
Data type of each column
sapply(data, class)
sid numeric
year numeric
first_epid numeric
last_epid numeric
n_episodes numeric
Dimension of data
Printing column names
Total rows in dataset
Total columns in dataset
sid year first_epid last_epid
Min. : 1.00 Min. :1975 Min. :19751011 Min. :19760731
1st Qu.:12.25 1st Qu.:1986 1st Qu.:19863512 1st Qu.:19872949
Median :23.50 Median :1998 Median :19975926 Median :19985512
Mean :23.50 Mean :1998 Mean :19975965 Mean :19985509
3rd Qu.:34.75 3rd Qu.:2009 3rd Qu.:20088423 3rd Qu.:20098015
Max. :46.00 Max. :2020 Max. :20201003 Max. :20210410
n_episodes
Min. :12.0
1st Qu.:20.0
Median :20.0
Mean :19.7
3rd Qu.:21.0
Max. :24.0
The dataset has 46 rows and 5 columns. Each row describes one season: the year it was released, the dates of its first and last episodes (stored as yyyymmdd numbers), and the total number of episodes in that season. All five columns have the data type ‘numeric’. The summary() command gives summarized information for each column. A typical season has about 20 episodes: the median is 20 and the mean is 19.7, with a range of 12 to 24.
---
title: "Challenge 1"
author: "Pradhakshya Dhanakumar"
description: "Worked with snl_seasons data"
date: "02/26/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - Challenge 1
  - Pradhakshya Dhanakumar
  - SNL_SEASONS DATA
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
```
## Reading Data
Read the data from a .csv file
```{r}
data <- read_csv("_data/snl_seasons.csv")
print(data)
```
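The column-specification message printed by `read_csv()` suggests declaring the column types. The chunk below is a minimal sketch of that, assuming a readr version that accepts literal data wrapped in `I()` (2.0 or later). The inline CSV holds only the first three rows printed above, and the names `snl_csv` and `seasons_typed` are illustrative, not part of the original analysis.

```{r}
library(readr)

# Illustrative inline CSV: the first three rows of the printed tibble.
# I() tells read_csv() to treat the string as literal data, not a file path.
snl_csv <- I("sid,year,first_epid,last_epid,n_episodes
1,1975,19751011,19760731,24
2,1976,19760918,19770521,22
3,1977,19770924,19780520,20
")

# Declare the types up front instead of letting readr guess them.
seasons_typed <- read_csv(
  snl_csv,
  col_types = cols(
    sid = col_integer(),
    year = col_integer(),
    first_epid = col_double(),
    last_epid = col_double(),
    n_episodes = col_integer()
  )
)

sapply(seasons_typed, class)
```

With `col_types` supplied, readr has nothing to guess, so the specification message is not printed by default. Passing `show_col_types = FALSE` is the lighter alternative when the guessed types are acceptable.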
Top 10 rows of the data
```{r}
head(data,10)
```
Bottom 2 rows of data
```{r}
tail(data,2)
```
Selecting specific columns from data and displaying top 6 rows
```{r}
# Keep only the season year and its episode count
year_episodes <- select(data, year, n_episodes)
head(year_episodes)
```
## Data Description
Data type of each column
```{r}
as.data.frame(sapply(data, class))
```
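The class check above reports `first_epid` and `last_epid` as numeric, although their values (for example 19751011) look like dates written as yyyymmdd. The chunk below is a sketch of how they could be converted, under that assumption about the encoding. It uses two rows copied from the printed output, and `episode_ids` and `to_date()` are hypothetical names introduced for illustration.

```{r}
# Two rows copied from the printed output (seasons 1 and 46).
episode_ids <- data.frame(
  sid = c(1, 46),
  first_epid = c(19751011, 20201003),
  last_epid = c(19760731, 20210410)
)

# Assumption: the IDs encode dates as yyyymmdd.
# sprintf("%.0f", x) renders the number as plain digits (no e-notation),
# and as.Date() then parses those digits with an explicit format.
to_date <- function(x) as.Date(sprintf("%.0f", x), format = "%Y%m%d")

episode_ids$first_aired <- to_date(episode_ids$first_epid)
episode_ids$last_aired <- to_date(episode_ids$last_epid)

# Length of each season in days, from first to last episode.
episode_ids$season_days <- as.numeric(
  episode_ids$last_aired - episode_ids$first_aired
)

episode_ids
```

Once the columns are `Date` values, `summary()` reports real date ranges rather than quartiles of eight-digit numbers.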
Dimension of data
```{r}
dim(data)
```
Printing column names
```{r}
colnames(data)
```
Total rows in dataset
```{r}
nrow(data)
```
Total columns in dataset
```{r}
ncol(data)
```
## Data Summary
```{r}
summary(data)
```
The dataset has 46 rows and 5 columns. Each row describes one season: the year it was released, the dates of its first and last episodes (stored as yyyymmdd numbers), and the total number of episodes in that season. All five columns have the data type 'numeric'. The summary() command gives summarized information for each column. A typical season has about 20 episodes: the median is 20 and the mean is 19.7, with a range of 12 to 24.
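The statement about episode counts can be spot-checked directly. The chunk below is a small sketch that uses only the 12 seasons whose rows are printed in this document (seasons 1 to 10, 45 and 46), so its figures describe that subset rather than all 46 seasons. The vector name `visible_episodes` is illustrative.

```{r}
# n_episodes for the 12 seasons shown by head(data, 10) and tail(data, 2).
# The other 34 seasons are not printed here, so this is only a partial check.
visible_episodes <- c(24, 22, 20, 20, 20, 13, 20, 20, 19, 17, 18, 17)

mean(visible_episodes)    # average episode count in the visible rows
median(visible_episodes)  # middle value, less sensitive to short seasons

# Frequency of each episode count; the most common value is the mode.
episode_counts <- table(visible_episodes)
most_common <- names(which.max(episode_counts))
most_common
```

On the full dataset, the same calls applied to `data$n_episodes` would show whether 20 is the single most common season length or only the median.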