Jack Sniezek
December 1, 2022
# A tibble: 120 × 6
month year large_halfdozen large_dozen xlarge_halfdozen xlarge_dozen
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows
month year large_halfdozen large_dozen
Length:120 Min. :2004 Min. :126.0 Min. :225.0
Class :character 1st Qu.:2006 1st Qu.:129.4 1st Qu.:233.5
Mode :character Median :2008 Median :174.5 Median :267.5
Mean :2008 Mean :155.2 Mean :254.2
3rd Qu.:2011 3rd Qu.:174.5 3rd Qu.:268.0
Max. :2013 Max. :178.0 Max. :277.5
xlarge_halfdozen xlarge_dozen
Min. :132.0 Min. :230.0
1st Qu.:135.8 1st Qu.:241.5
Median :185.5 Median :285.5
Mean :164.2 Mean :266.8
3rd Qu.:185.5 3rd Qu.:285.5
Max. :188.1 Max. :290.0
# A tibble: 480 × 5
month year size quantity price
<chr> <dbl> <chr> <chr> <dbl>
1 January 2004 large halfdozen 126
2 January 2004 large dozen 230
3 January 2004 xlarge halfdozen 132
4 January 2004 xlarge dozen 230
5 February 2004 large halfdozen 128.
6 February 2004 large dozen 226.
7 February 2004 xlarge halfdozen 134.
8 February 2004 xlarge dozen 230
9 March 2004 large halfdozen 131
10 March 2004 large dozen 225
# … with 470 more rows
---
title: "Challenge 3"
author: "Jack Sniezek"
description: "Tidy Data: Pivoting"
date: "12/1/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_3
- animal_weights
- eggs
- australian_marriage
- usa_households
- sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
- eggs_tidy.csv ⭐⭐
```{r}
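# Read the data and shorten the column names so each has the form
# size_quantity, with exactly one underscore to split on when pivoting
eggs <- read_csv("_data/eggs_tidy.csv") %>%
  rename("xlarge_halfdozen" = "extra_large_half_dozen",
         "xlarge_dozen" = "extra_large_dozen",
         "large_halfdozen" = "large_half_dozen")

# Preview the first rows, then summarize each column
eggs
summary(eggs)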
```
## Briefly describe the data
After reading in the eggs dataset, I can see that it has 120 rows, one for each month from 2004 through 2013 (10 years × 12 months). There are 6 columns: `month` and `year` identify each row, and the other 4 hold the average egg price for each combination of size (large or extra large) and quantity (half dozen or dozen).
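While reading in the data, I also renamed the columns so that each name has the form `size_quantity` with a single underscore (for example, `extra_large_half_dozen` became `xlarge_halfdozen`). This keeps the size and quantity of eggs separable by one delimiter, which will help me pivot the data.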
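To illustrate why the renaming matters, here is a minimal base R sketch that counts how many pieces each column name splits into at the underscore. The original names are taken from the `rename()` call above; `large_dozen` is assumed to be the one column that kept its original name, and `old_names` and `new_names` are illustrative variables, not part of the analysis.

```{r}
# Column names before renaming (large_dozen assumed unchanged)
old_names <- c("large_half_dozen", "large_dozen",
               "extra_large_half_dozen", "extra_large_dozen")

# Column names after renaming
new_names <- c("large_halfdozen", "large_dozen",
               "xlarge_halfdozen", "xlarge_dozen")

# Number of pieces each name yields when split on "_"
old_pieces <- lengths(strsplit(old_names, "_", fixed = TRUE))
new_pieces <- lengths(strsplit(new_names, "_", fixed = TRUE))

old_pieces  # 3 2 4 3
new_pieces  # 2 2 2 2
```

With the old names the number of pieces varies from 2 to 4, so a single underscore separator could not map every name onto exactly two new columns. After renaming, every name yields exactly two pieces.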
## Anticipate the End Result
Right now the data consists of 6 columns: 2 that identify each observation (`month` and `year`) and 4 that hold price values. To make the data easier to work with, I want to gather the prices into a single `price` column and add `size` and `quantity` columns describing the eggs. So my new data frame will contain 5 columns: month, year, size, quantity, and price. I also anticipate that there will be 480 rows, as I will be stacking all the price values into one column (120 months × 4 price columns).
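The same arithmetic can be written out as a short sketch. The variable names here are illustrative, and the wide dimensions are typed in by hand from the printed tibble rather than computed from `eggs`.

```{r}
# Dimensions of the wide data, as reported by the tibble printout
n_rows_wide <- 120
n_cols_wide <- 6

# month and year stay as identifier columns; the rest hold prices
n_id_cols    <- 2
n_price_cols <- n_cols_wide - n_id_cols

# Each wide row contributes one long row per price column
expected_rows <- n_rows_wide * n_price_cols

# Identifier columns + size + quantity + price
expected_cols <- n_id_cols + 2 + 1

c(rows = expected_rows, cols = expected_cols)  # rows = 480, cols = 5
```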
## Pivot the Data
```{r}
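# Stack the four price columns into one, splitting each column name
# at the underscore into a size and a quantity
eggs_longer <- eggs %>%
  pivot_longer(cols = contains("large"),
               names_to = c("size", "quantity"),
               names_sep = "_",
               values_to = "price")

eggs_longer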
```
The data matches my prediction: I now have 480 rows and 5 columns. Each row holds a single price for one month, year, size, and quantity, so all the price values sit in one column.
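One further way to check a pivot like this is to reverse it with `pivot_wider` and compare the result with the starting table. The sketch below does that on a two-row stand-in rather than the full file, so it runs without `_data/eggs_tidy.csv`. The name `eggs_demo` is hypothetical, and its values are copied from the January and March 2004 rows of the printed data.

```{r}
library(tidyverse)

# Toy stand-in for the eggs data (hypothetical name `eggs_demo`);
# values copied from the printed January and March 2004 rows
eggs_demo <- tribble(
  ~month,    ~year, ~large_halfdozen, ~large_dozen, ~xlarge_halfdozen, ~xlarge_dozen,
  "January", 2004,  126,              230,          132,               230,
  "March",   2004,  131,              225,          137,               230
)

# Same pivot as above, applied to the toy data
eggs_demo_longer <- eggs_demo %>%
  pivot_longer(cols = contains("large"),
               names_to = c("size", "quantity"),
               names_sep = "_",
               values_to = "price")

# Shape check: rows multiply by the 4 price columns, 5 columns remain
dim(eggs_demo_longer)  # 8 5

# Pivot back to wide and compare with the starting table
eggs_demo_wide <- eggs_demo_longer %>%
  pivot_wider(names_from = c(size, quantity),
              values_from = price,
              names_sep = "_")

all.equal(eggs_demo, eggs_demo_wide)
```

If the round trip reproduces the original columns and values, the long format has kept all the information in the wide table and only changed its layout.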