Challenge 3 Submission

challenge_3
animal_weights
eggs
australian_marriage
usa_households
sce_labor
Tidy Data: Pivoting
Author

Matt Zambetti

Published

May 30, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

The data set I chose is:

  • eggs_tidy.csv ⭐⭐
eggs_table <- read_csv("_data/eggs_tidy.csv")
eggs_table
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>
 1 January    2004             126         230                    132 
 2 February   2004             128.        226.                   134.
 3 March      2004             131         225                    137 
 4 April      2004             131         225                    137 
 5 May        2004             131         225                    137 
 6 June       2004             134.        231.                   137 
 7 July       2004             134.        234.                   137 
 8 August     2004             134.        234.                   137 
 9 September  2004             130.        234.                   136.
10 October    2004             128.        234.                   136.
# ℹ 110 more rows
# ℹ 1 more variable: extra_large_dozen <dbl>

Briefly describe the data

The data, as shown above, lists the sales of large half-dozen, large dozen, extra-large half-dozen, and extra-large dozen cartons of eggs for every month from January 2004 through December 2013.

In pivoting the data, I am looking to ‘pivot longer’ so that each row is identified by the month, year, and product instead of only the month and year. This way, we can use the ‘group by’ function to more easily compare the sales of the same egg carton size over time.
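To make the motivation concrete, here is a minimal sketch of the kind of grouped comparison the long format allows. It uses a small toy tibble with made-up values and only two product columns, not the real eggs data; the names `toy_eggs`, `toy_long`, `toy_summary`, and `mean_count` are hypothetical and chosen only for illustration.

```r
library(tidyverse)

# Toy stand-in for the wide eggs table (hypothetical values, two months only)
toy_eggs <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(100, 110),
  large_dozen = c(200, 210)
)

# Pivot longer: one row per month, year, and product
toy_long <- toy_eggs %>%
  pivot_longer(
    cols = -c(month, year),
    names_to = "product",
    values_to = "count"
  )

# With a product column in place, one group_by() summarises every carton type
toy_summary <- toy_long %>%
  group_by(product) %>%
  summarise(mean_count = mean(count), .groups = "drop")

toy_summary
```

In the wide format, the same summary would need a separate expression for each product column.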

Anticipate the End Result

Here we look into the dimensions of the data and predict the output shape.

Challenge: Describe the final dimensions


ncol(eggs_table)
[1] 6
nrow(eggs_table)
[1] 120
(ncol(eggs_table)-2)*(nrow(eggs_table))
[1] 480

Looking at the current dimensions of my data, there are 120 rows (12 months over 10 years) and 6 columns. We want to keep month and year as identifiers, so the remaining 4 columns are the ones being pivoted, which gives 4 rows for each month and year combination. That leaves us with an expected size of 120 × 4 = 480 rows and 4 columns (month, year, product, and count). The column count goes to four because the four product columns collapse into two: product, which holds the former column names, and count, which holds the value for each month, year, and product combination.
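The same arithmetic can be written as a small reusable function. This is only a sketch: `expected_long_dims` is a hypothetical helper, not part of the tidyverse, and it assumes every non-identifier column is pivoted into a single names column and a single values column.

```r
# Hypothetical helper: predicted dimensions after pivot_longer(),
# assuming all non-identifier columns are pivoted into one names
# column and one values column.
expected_long_dims <- function(n_rows, n_cols, n_id_cols) {
  n_pivoted <- n_cols - n_id_cols   # columns that become rows
  c(
    rows = n_rows * n_pivoted,      # one row per original row per pivoted column
    cols = n_id_cols + 2            # identifiers + names column + values column
  )
}

# The eggs table: 120 rows, 6 columns, 2 identifier columns (month, year)
expected_dims <- expected_long_dims(n_rows = 120, n_cols = 6, n_id_cols = 2)
expected_dims  # rows = 480, cols = 4
```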

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Challenge: Pivot the Chosen Data

eggs_table %>%
  pivot_longer(
    cols = -c(year, month),
    names_to = "product",
    values_to = "count")
# A tibble: 480 × 4
   month     year product                count
   <chr>    <dbl> <chr>                  <dbl>
 1 January   2004 large_half_dozen        126 
 2 January   2004 large_dozen             230 
 3 January   2004 extra_large_half_dozen  132 
 4 January   2004 extra_large_dozen       230 
 5 February  2004 large_half_dozen        128.
 6 February  2004 large_dozen             226.
 7 February  2004 extra_large_half_dozen  134.
 8 February  2004 extra_large_dozen       230 
 9 March     2004 large_half_dozen        131 
10 March     2004 large_dozen             225 
# ℹ 470 more rows

Here we can see that my predictions were correct: the pivoted data has 480 rows and 4 columns, matching the dimensions expected above.
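Rather than comparing the printed dimensions by eye, the sanity check could also be made programmatic. The sketch below shows the idea on a toy tibble with made-up values; `toy_wide`, `toy_pivoted`, `id_cols`, and the other names are hypothetical, and the same pattern would apply to the real table once the pivoted result is stored in a variable.

```r
library(tidyverse)

# Toy wide table (hypothetical values): 3 rows, 2 identifier columns, 2 product columns
toy_wide <- tibble(
  month = c("January", "February", "March"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(100, 110, 120),
  large_dozen = c(200, 210, 220)
)

id_cols <- c("month", "year")

# Predict the shape before pivoting
expected_rows <- nrow(toy_wide) * (ncol(toy_wide) - length(id_cols))
expected_cols <- length(id_cols) + 2

# Pivot, then stop with an error if the shape does not match the prediction
toy_pivoted <- toy_wide %>%
  pivot_longer(
    cols = -all_of(id_cols),
    names_to = "product",
    values_to = "count"
  )

stopifnot(
  nrow(toy_pivoted) == expected_rows,
  ncol(toy_pivoted) == expected_cols
)
```

If either dimension disagreed with the prediction, `stopifnot()` would raise an error instead of letting the mismatch pass unnoticed.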