Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Shreya Varma
May 30, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# ℹ 110 more rows
# ℹ 1 more variable: extra_large_dozen <dbl>
Rows: 120
Columns: 6
$ month <chr> "January", "February", "March", "April", "May",…
$ year <dbl> 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004,…
$ large_half_dozen <dbl> 126.00, 128.50, 131.00, 131.00, 131.00, 133.50,…
$ large_dozen <dbl> 230.000, 226.250, 225.000, 225.000, 225.000, 23…
$ extra_large_half_dozen <dbl> 132.000, 134.500, 137.000, 137.000, 137.000, 13…
$ extra_large_dozen <dbl> 230.0, 230.0, 230.0, 234.5, 236.0, 241.0, 241.0…
month year large_half_dozen large_dozen
Length:120 Min. :2004 Min. :126.0 Min. :225.0
Class :character 1st Qu.:2006 1st Qu.:129.4 1st Qu.:233.5
Mode :character Median :2008 Median :174.5 Median :267.5
Mean :2008 Mean :155.2 Mean :254.2
3rd Qu.:2011 3rd Qu.:174.5 3rd Qu.:268.0
Max. :2013 Max. :178.0 Max. :277.5
extra_large_half_dozen extra_large_dozen
Min. :132.0 Min. :230.0
1st Qu.:135.8 1st Qu.:241.5
Median :185.5 Median :285.5
Mean :164.2 Mean :266.8
3rd Qu.:185.5 3rd Qu.:285.5
Max. :188.1 Max. :290.0
Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy”
This dataset contains the data of type of eggs sold every month from 2004 to 2013. We can use pivot longer to reduce the number of columns and increase the number of rows. Thus the type of eggs can turn into columns and the egg value can become one column. Thus will result in 4 columns of dataset only and tidy it.
There are 120 rows and 6 columns in this dataset. Out of these 4 columns represent type of eggs which can become one column and its value can go into value column. Thus we should expect 120*4 = 480 rows in the pivoted dataframe.
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# ℹ 110 more rows
# ℹ 1 more variable: extra_large_dozen <dbl>
[1] 120
[1] 6
[1] 480
[1] 4
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 480 × 4
month year eggs_type number
<chr> <dbl> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 January 2004 large_dozen 230
3 January 2004 extra_large_half_dozen 132
4 January 2004 extra_large_dozen 230
5 February 2004 large_half_dozen 128.
6 February 2004 large_dozen 226.
7 February 2004 extra_large_half_dozen 134.
8 February 2004 extra_large_dozen 230
9 March 2004 large_half_dozen 131
10 March 2004 large_dozen 225
# ℹ 470 more rows
Thus we have pivoted the data to get 480 rows.
---
title: "Challenge 3 Solution"
author: "Shreya Varma"
description: "Tidy Data: Pivoting"
date: "5/30/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- eggs
- australian_marriage
- usa_households
- sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
eggs_tidy <- read_csv("_data/eggs_tidy.csv")
eggs_tidy
glimpse(eggs_tidy)
summary(eggs_tidy)
```
### Briefly describe the data
Describe the data, and be sure to comment on why you are planning to pivot it to make it "tidy"
This dataset contains the data of type of eggs sold every month from 2004 to 2013. We can use pivot longer to reduce the number of columns and increase the number of rows. Thus the type of eggs can turn into columns and the egg value can become one column. Thus will result in 4 columns of dataset only and tidy it.
## Anticipate the End Result
There are 120 rows and 6 columns in this dataset. Out of these 4 columns represent type of eggs which can become one column and its value can go into value column. Thus we should expect 120*4 = 480 rows in the pivoted dataframe.
### Challenge: Describe the final dimensions
```{r}
df<-eggs_tidy
df
#existing rows/cases
nrow(df)
#existing columns/cases
ncol(df)
#expected rows/cases
nrow(df) * 4
# expected columns
ncol(df) -4 + 2
```
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Challenge: Pivot the Chosen Data
```{r}
eggs_tidy_pivoted <- pivot_longer(eggs_tidy, col = large_half_dozen:extra_large_dozen, names_to = "eggs_type", values_to = "number")
eggs_tidy_pivoted
```
Thus we have pivoted the data to get 480 rows.