Challenge 3: Pivoting Egg Data

Author

Saksham Kumar

Published

April 3, 2023

Code
library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today we attempt to:

  1. read in the eggs dataset, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Code
eggs<-read_csv("_data/eggs_tidy.csv")
eggs
Code
unique(eggs$month)
 [1] "January"   "February"  "March"     "April"     "May"       "June"     
 [7] "July"      "August"    "September" "October"   "November"  "December" 
Code
unique(eggs$year)
 [1] 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013

The dataset contains egg prices for each month from 2004 to 2013. The eggs come in two sizes, large and extra large, and are sold in two quantities, a dozen and a half dozen.

Challenge: Describe the final dimensions


Code
#existing rows/cases
nrow(eggs)
[1] 120
Code
#existing columns
ncol(eggs)
[1] 6
Code
#expected rows/cases
nrow(eggs) * (ncol(eggs)-2)
[1] 480
Code
# expected columns 
2 + 2
[1] 4

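The data initially has 6 columns and 120 rows. Pivoting collapses the 4 price columns into 2 (one holding the former column names, one holding the prices), so each original row becomes 4 rows: 120 * 4 = 480 rows. The 2 identifier columns, month and year, are kept, so the pivoted data should have 2 + 2 = 4 columns.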
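The same arithmetic can be written so that it does not hard-code the number of identifier columns. The sketch below is only an illustration on a made-up toy data frame; the column names and values are assumptions, not the contents of eggs_tidy.csv.

```r
# Toy stand-in for the eggs data: 2 identifier columns and 4 price columns.
# Column names and values are assumed for illustration only.
toy <- data.frame(
  month = c("January", "February", "March"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(1, 2, 3),
  large_dozen = c(4, 5, 6),
  extra_large_half_dozen = c(7, 8, 9),
  extra_large_dozen = c(10, 11, 12)
)

# Columns that identify a case versus columns that will be pivoted.
id_cols <- c("month", "year")
value_cols <- setdiff(names(toy), id_cols)

# Each original row becomes one row per pivoted column.
expected_rows <- nrow(toy) * length(value_cols)
# Identifier columns stay; the pivoted columns collapse into a name and a value column.
expected_cols <- length(id_cols) + 2

c(rows = expected_rows, cols = expected_cols)
```

With 3 rows and 4 price columns this gives 12 rows and 4 columns, which mirrors the 120 * 4 = 480 calculation above.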

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Code
eggs_pivoted<-eggs%>%
  pivot_longer(cols=contains("dozen"), 
               names_to = "size_quantity",
               values_to = "price"
  )
eggs_pivoted
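The sanity check can also be made explicit instead of being read off the printed output. The following is a minimal sketch on a small made-up data frame (the names and values are assumptions); it stops with an error if the pivoted dimensions differ from the expected ones.

```r
library(tidyr)

# Hypothetical toy data with the same layout as the eggs data.
toy <- data.frame(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(1, 2),
  large_dozen = c(3, 4),
  extra_large_half_dozen = c(5, 6),
  extra_large_dozen = c(7, 8)
)

# Work out the expected dimensions before pivoting.
n_value_cols <- sum(grepl("dozen", names(toy)))
expected_dim <- c(nrow(toy) * n_value_cols, ncol(toy) - n_value_cols + 2)

toy_long <- pivot_longer(
  toy,
  cols = contains("dozen"),
  names_to = "size_quantity",
  values_to = "price"
)

# Fails loudly if the pivot did not produce the anticipated shape.
stopifnot(all(dim(toy_long) == expected_dim))
```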

Pivoted Example

The pivoted data has 480 rows and 4 columns, as predicted. The data is tidier, but the size_quantity column still holds two pieces of information: size and quantity. The values in this column follow the pattern size_quantity, so we first replace the underscore after "large" with a space and then split the string on that space.

Code
eggs_pivoted$size_quantity<-sub("large_", "large ", eggs_pivoted$size_quantity)

eggs_pivoted_separated <- separate(eggs_pivoted, size_quantity, c("size", "quantity"), " ")

eggs_pivoted_separated
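As an alternative to replacing the underscore and then calling separate, the split can be done during the pivot itself with the names_pattern argument of pivot_longer. This is a sketch on made-up toy data, assuming the price columns are named like large_half_dozen and extra_large_dozen; the regular expression is one possible choice, not the only one.

```r
library(tidyr)

# Hypothetical toy data; column names are assumed to match the eggs layout.
toy <- data.frame(
  month = c("January", "February", "March"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(1, 2, 3),
  large_dozen = c(4, 5, 6),
  extra_large_half_dozen = c(7, 8, 9),
  extra_large_dozen = c(10, 11, 12)
)

# The first capture group takes everything up to and including "large" (size),
# the second takes whatever follows the next underscore (quantity).
toy_long <- pivot_longer(
  toy,
  cols = contains("dozen"),
  names_to = c("size", "quantity"),
  names_pattern = "(.*large)_(.*)",
  values_to = "price"
)

toy_long
```

Here size takes the values large and extra_large, and quantity takes half_dozen and dozen, so the result has 5 columns: month, year, size, quantity, and price.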