library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Matt Zambetti
May 30, 2023
The data set I chose is eggs_tidy.csv:
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# ℹ 110 more rows
# ℹ 1 more variable: extra_large_dozen <dbl>
The data, as seen above, list the sales of four egg carton types (large half dozen, large dozen, extra large half dozen, and extra large dozen) for every month from January 2004 through December 2013.
I want to pivot the data longer so that each row is identified by month, year, and product rather than by month and year alone. With one row per product, we can group by product to more easily compare the sales of the same egg carton size over time.
Here we look into the dimensions of the data and predict the output shape.
[1] 6
[1] 120
[1] 480
The data currently have 120 rows (12 months over 10 years) and 6 columns. Because we keep month and year as identifiers, only the remaining 4 columns are pivoted, so each month and year combination expands into 4 rows. That gives an expected size of 120 × 4 = 480 rows and 4 columns (month, year, product, and count). The column count drops from six to four because the four carton columns collapse into two: product, which holds the former column names, and count, which holds their values.
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 480 × 4
month year product count
<chr> <dbl> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 January 2004 large_dozen 230
3 January 2004 extra_large_half_dozen 132
4 January 2004 extra_large_dozen 230
5 February 2004 large_half_dozen 128.
6 February 2004 large_dozen 226.
7 February 2004 extra_large_half_dozen 134.
8 February 2004 extra_large_dozen 230
9 March 2004 large_half_dozen 131
10 March 2004 large_dozen 225
# ℹ 470 more rows
The output confirms my predictions: the pivoted data have 480 rows and 4 columns, exactly as expected.
---
title: "Challenge 3 Submission"
author: "Matt Zambetti"
description: "Tidy Data: Pivoting"
date: "5/30/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - animal_weights
  - eggs
  - australian_marriage
  - usa_households
  - sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in data
The data set I chose is:
- eggs_tidy.csv ⭐⭐
```{r}
eggs_table <- read_csv("_data/eggs_tidy.csv")
eggs_table
```
### Briefly describe the data
The data, as seen above, list the sales of four egg carton types (large half dozen, large dozen, extra large half dozen, and extra large dozen) for every month from January 2004 through December 2013.
I want to pivot the data longer so that each row is identified by month, year, and product rather than by month and year alone. With one row per product, we can use `group_by()` to more easily compare the sales of the same egg carton size over time.
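To illustrate why the longer shape helps, here is a minimal sketch on a small made-up tibble. The name `toy_long` and all of its values are hypothetical and are not taken from eggs_tidy.csv; the point is only that a single `product` column lets one `group_by()` call cover every carton type.

```{r}
library(tidyverse)

# Hypothetical long-format data: one row per month, year, and product.
# The counts are made up for illustration.
toy_long <- tribble(
  ~month,     ~year, ~product,           ~count,
  "January",  2004,  "large_dozen",      200,
  "February", 2004,  "large_dozen",      210,
  "January",  2004,  "large_half_dozen", 100,
  "February", 2004,  "large_half_dozen", 120
)

# Average count per product across all months
product_means <- toy_long %>%
  group_by(product) %>%
  summarise(mean_count = mean(count), .groups = "drop")

product_means
```

In the wide layout, the same comparison would need a separate summary for each carton column.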
## Anticipate the End Result
Here we look into the dimensions of the data and predict the output shape.
### Challenge: Describe the final dimensions
```{r}
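# Current number of columns and rows
ncol(eggs_table)
nrow(eggs_table)
# Expected rows after pivoting: (columns minus month and year) x rows
(ncol(eggs_table) - 2) * nrow(eggs_table)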
```
The data currently have 120 rows (12 months over 10 years) and 6 columns. Because we keep month and year as identifiers, only the remaining 4 columns are pivoted, so each month and year combination expands into 4 rows. That gives an expected size of 120 × 4 = 480 rows and 4 columns (month, year, product, and count). The column count drops from six to four because the four carton columns collapse into two: product, which holds the former column names, and count, which holds their values.
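The same arithmetic can be written as a small reusable function. This is only a sketch: `expected_long_dims()` and its argument names are hypothetical helpers introduced here, and it assumes every non-identifier column is pivoted into one names column and one values column.

```{r}
# Hypothetical helper: predict the shape after pivoting all
# non-identifier columns into a names column and a values column.
expected_long_dims <- function(n_rows, n_cols, n_id_cols) {
  n_value_cols <- n_cols - n_id_cols
  c(rows = n_rows * n_value_cols, cols = n_id_cols + 2)
}

# 120 rows, 6 columns, 2 identifier columns (month and year)
expected_long_dims(n_rows = 120, n_cols = 6, n_id_cols = 2)
```

For the egg data this returns 480 rows and 4 columns, matching the calculation above.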
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Challenge: Pivot the Chosen Data
```{r}
eggs_table %>%
  pivot_longer(
    cols = -c(year, month),
    names_to = "product",
    values_to = "count"
  )
```
The output confirms my predictions: the pivoted data have 480 rows and 4 columns, exactly as expected.
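The sanity check can also be done in code rather than by reading the printed output. The sketch below uses a made-up two-row tibble, `toy_wide`, that borrows the column names of `eggs_table`; its values are hypothetical. Applying the same comparison to the real data would presumably work the same way, but that is an assumption, not something run here.

```{r}
library(tidyverse)

# Toy wide tibble with the same column names as eggs_table;
# the values are made up for illustration.
toy_wide <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(100, 110),
  large_dozen = c(200, 210),
  extra_large_half_dozen = c(120, 130),
  extra_large_dozen = c(220, 230)
)

toy_pivoted <- toy_wide %>%
  pivot_longer(
    cols = -c(year, month),
    names_to = "product",
    values_to = "count"
  )

# Expected shape: one row per (original row, value column) pair,
# and the two identifier columns plus product and count.
expected_dims <- c((ncol(toy_wide) - 2) * nrow(toy_wide), 2 + 2)

# stopifnot() raises an error if the actual and expected shapes differ
stopifnot(all(dim(toy_pivoted) == expected_dims))
```

Storing the pivoted result in a variable, instead of only printing it, is what makes the comparison with `dim()` possible.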