Challenge 3

challenge_3

eggs

Tidy Data: Pivoting

Author

Noah Dixon

Published

June 8, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in the Data

Code

eggs_from_csv <- read_csv("_data/eggs_tidy.csv")

Describe the Data

In order to get a better sense of what the data looks like, we can print the first 6 rows of the data using the head function.

Code

head(eggs_from_csv)

It appears this data set is showing the number of eggs sold for different egg types during each month for specific years. Lets see how many years there are in the data set.

Code

print(unique(eggs_from_csv$year))

 [1] 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013

We have confirmed that the data set shows data for multiple years. For this data, it is clear that a case should be represented by a month, year, and container size (large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen). Therefore, we need to pivot the data to remove the large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen columns and add a container size column.

Describe the Final Dimensions

The output below shows the dimensions of the current data fram in terms of number of rows and columns, as well as the expected dimensions of the new dataframe after pivoting.

Code

#existing rows/cases
nrow(eggs_from_csv)

[1] 120

Code

#existing columns/cases
ncol(eggs_from_csv)

[1] 6

Code

#expected rows/cases
nrow(eggs_from_csv) * 4

[1] 480

Code

# expected columns 
ncol(eggs_from_csv) - 4 + 2

[1] 4

Pivot the Data

Again, after pivoting a case should be represent a month, year, and container size.

Code

eggs_tidy <- pivot_longer(eggs_from_csv, col = large_half_dozen:extra_large_dozen, names_to = "container", values_to = "quantity")

eggs_tidy

As we can see, the pivoted data is the expected size (480 x 4).