library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Roy Yoon
August 17, 2022
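Today’s challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using pivot_longer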
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
# ℹ Use `print(n = ...)` to see more rows
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
# ℹ Use `print(n = ...)` to see more rows
[1] "month" "year" "large_half_dozen"
[4] "large_dozen" "extra_large_half_dozen" "extra_large_dozen"
[1] 120 6
The eggs data set has 120 rows and 6 columns.
The data record the monthly price of eggs by size (large, extra large) and amount (half dozen, dozen) from 2004 to 2013.
Let's check the expected dimensions with a simple example.
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
# ℹ Use `print(n = ...)` to see more rows
[1] 120
[1] 6
[1] 480
[1] 5
# A tibble: 480 × 5
month year size amount price
<chr> <dbl> <chr> <chr> <dbl>
1 January 2004 large halfdozen 126
2 January 2004 large dozen 230
3 January 2004 xlarge halfdozen 132
4 January 2004 xlarge dozen 230
5 February 2004 large halfdozen 128.
6 February 2004 large dozen 226.
7 February 2004 xlarge halfdozen 134.
8 February 2004 xlarge dozen 230
9 March 2004 large halfdozen 131
10 March 2004 large dozen 225
# … with 470 more rows
# ℹ Use `print(n = ...)` to see more rows
The new variables size and amount are now separate columns, independent of each other, and price is available as its own variable.
[1] 480 5
[1] "month" "year" "size" "amount" "price"
The eggs_tidy_longer data set contains 480 rows and 5 columns.
The columns distinguish month, year, size, amount, and price as variables.
Organizing the variables this way makes it easier to understand and analyze size, amount, and price independently of one another.
---
title: "Challenge 3 Instructions"
author: "Roy Yoon"
description: "Tidy Data: Pivoting"
date: "08/17/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - eggs_tidy.csv
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
```{r}
eggs <- read_csv("_data/eggs_tidy.csv")
eggs
```
### About eggs_tidy.csv data set
```{r}
eggs
colnames(eggs)
dim(eggs)
```
The eggs data set has 120 rows and 6 columns.
The data record the monthly price of eggs by size (large, extra large) and amount (half dozen, dozen) from 2004 to 2013.
## Anticipate the End Result
### Example: find current and future data dimensions
Let's check the expected dimensions with a simple example.
```{r}
#| tbl-cap: Example
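df <- eggs
df
# existing rows/cases
nrow(df)
# existing columns/variables
ncol(df)
# expected rows/cases: one row per original row and pivoted price column
nrow(df) * (ncol(df) - 2)
# expected columns: 2 id columns (month, year) + size + amount + price
2 + 2 + 1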
```
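The arithmetic above can be wrapped in a small helper. This is only a sketch and not part of the original challenge: `expected_long_dim()` is a hypothetical name, and the toy tibble (with made-up prices) stands in for `eggs`. It assumes every non-id column is pivoted and that each pivoted column name splits into `n_name_cols` pieces.

```r
library(tibble)

# Hypothetical helper: predict the dimensions of a pivot_longer() result.
# n_id        = number of identifier columns left untouched (e.g. month, year)
# n_name_cols = number of new columns the old column names are split into
expected_long_dim <- function(data, n_id, n_name_cols = 1) {
  n_pivoted <- ncol(data) - n_id
  c(rows = nrow(data) * n_pivoted,
    cols = n_id + n_name_cols + 1)  # + 1 for the values column
}

# Toy data with the same layout as eggs: 2 id columns, 4 price columns
# (prices are made up for illustration)
toy <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_halfdozen = c(126, 128),
  large_dozen = c(230, 226),
  xlarge_halfdozen = c(132, 134),
  xlarge_dozen = c(230, 230)
)

expected_long_dim(toy, n_id = 2, n_name_cols = 2)
```

For the toy data this gives 8 rows and 5 columns. Applied to the 120 × 6 `eggs` data with the same arguments, the formula gives 480 rows and 5 columns, matching the calculation above.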
## Pivot the eggs data set longer (month, year, size, amount, price)
```{r}
# shorten the column names so each splits into exactly two parts at "_"
eggs <- rename(
  eggs,
  large_halfdozen = large_half_dozen,
  xlarge_halfdozen = extra_large_half_dozen,
  xlarge_dozen = extra_large_dozen
)
eggs_tidy_longer <- eggs %>%
  pivot_longer(
    cols = contains("large"),
    names_to = c("size", "amount"),
    names_sep = "_",
    values_to = "price"
  )
eggs_tidy_longer
```
The new variables `size` and `amount` are now separate columns, independent of each other, and `price` is available as its own variable.
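Renaming the columns first is one way to get names that split cleanly at `_`. As an alternative sketch (not the method used in this post), the `names_pattern` argument of `pivot_longer()` can split the original column names with a regular expression, so no renaming is needed. The toy tibble and its prices are made up for illustration, and the resulting labels (`extra_large`, `half_dozen`) differ from the renamed ones (`xlarge`, `halfdozen`).

```r
library(tibble)
library(tidyr)

# Toy data using the original (un-renamed) column names; prices are made up
toy <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 128),
  large_dozen = c(230, 226),
  extra_large_half_dozen = c(132, 134),
  extra_large_dozen = c(230, 230)
)

toy_longer <- pivot_longer(
  toy,
  cols = contains("large"),
  names_to = c("size", "amount"),
  # capture group 1 becomes size, capture group 2 becomes amount
  names_pattern = "^(large|extra_large)_(half_dozen|dozen)$",
  values_to = "price"
)

toy_longer
```

The regular expression spells out the allowed sizes and amounts explicitly, which avoids the ambiguity of splitting `extra_large_half_dozen` at every underscore.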
## Describing Pivoted Data
```{r}
dim(eggs_tidy_longer)
colnames(eggs_tidy_longer)
```
The eggs_tidy_longer data set contains 480 rows and 5 columns.
The columns distinguish month, year, size, amount, and price as variables.
Organizing the variables this way makes it easier to understand and analyze size, amount, and price independently of one another.
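
As a minimal sketch of the kind of analysis the long format supports, the mean price can be computed for each combination of size and amount. The toy data below is shaped like `eggs_tidy_longer` but uses made-up prices, and `price_summary` is a name introduced here for illustration.

```r
library(tibble)
library(dplyr)

# Toy long-format data shaped like eggs_tidy_longer; prices are made up
toy_longer <- tribble(
  ~month,     ~year, ~size,    ~amount,     ~price,
  "January",  2004,  "large",  "halfdozen", 126,
  "January",  2004,  "large",  "dozen",     230,
  "January",  2004,  "xlarge", "halfdozen", 132,
  "January",  2004,  "xlarge", "dozen",     230,
  "February", 2004,  "large",  "halfdozen", 128,
  "February", 2004,  "large",  "dozen",     226,
  "February", 2004,  "xlarge", "halfdozen", 134,
  "February", 2004,  "xlarge", "dozen",     230
)

# Mean price for each size/amount combination
price_summary <- toy_longer %>%
  group_by(size, amount) %>%
  summarise(mean_price = mean(price), .groups = "drop")

price_summary
```

Because `size` and `amount` are separate columns, the same pattern works for grouping by either one alone, which would be awkward with the original wide layout.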