Challenge 3: Eggs 2004-2013

challenge_3
eggs
Sue-Ellen Duffy
Pivoting Egg Data
Author

Sue-Ellen Duffy

Published

March 14, 2023

Code
library(tidyverse)
library(summarytools)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Egg Data!

Here we have the monthly price paid by first receivers for USDA Certified Organic Eggs by size and carton types for January 2004 to December 2013. The unit “Price” is “cents per pound”. So, for example in January 2004, the price paid by first receivers for USDA Certified Dozen of Extra Large Eggs was 230 cents/pound. As defined in the original data, First Receivers “those entities that purchases the processed product from the poultry or egg company, such as a retailer, distributor, or manufacturer”. The term wholesale level would be appropriate in many cases.

Reading in the egg data

Code
eggs_og <- read_excel("_data/organiceggpoultry.xls",
                      sheet="Data",
                      range =cell_limits(c(6,2),c(NA,6)),
                      col_names = c("date", "xlarge_dzn", "xlarge_halfdzn", "large_dzn", "large_halfdzn")
)
eggs_og
# A tibble: 120 × 5
   date      xlarge_dzn xlarge_halfdzn large_dzn large_halfdzn
   <chr>          <dbl>          <dbl>     <dbl>         <dbl>
 1 Jan 2004        230            132       230           126 
 2 February        230            134.      226.          128.
 3 March           230            137       225           131 
 4 April           234.           137       225           131 
 5 May             236            137       225           131 
 6 June            241            137       231.          134.
 7 July            241            137       234.          134.
 8 August          241            137       234.          134.
 9 September       241            136.      234.          130.
10 October         241            136.      234.          128.
# … with 110 more rows

Date Format 1

Starting off a little messy. Already I see this data is -wide- and the date needs formatting. Let’s see what other nuances might be lingering in the date column (count).

Code
table(select(eggs_og, date))
date
      April      August    December    February February /1    Jan 2004 
         10          10          10           8           2           1 
   Jan 2005    Jan 2006    Jan 2007    Jan 2008    Jan 2009    Jan 2010 
          1           1           1           1           1           1 
   Jan 2011    Jan 2012    Jan 2013        July        June       March 
          1           1           1          10          10          10 
        May    November     October   September 
         10          10          10          10 

Date Format 2

In the date column, January has a year indicator, 10 of the months appear 10 times, February appears 8 times and February/1 (a leap year) appears twice. We have to delete the /1 in February (mutate) and extend the year indicator from January to the rest of the months (separate and fill).

Code
eggs <- eggs_og %>%
  mutate(date = str_remove(date, " /1")) %>%
  separate(date,into=c("month", "year"), sep=" ") %>%
  fill(year)
eggs
# A tibble: 120 × 6
   month     year  xlarge_dzn xlarge_halfdzn large_dzn large_halfdzn
   <chr>     <chr>      <dbl>          <dbl>     <dbl>         <dbl>
 1 Jan       2004        230            132       230           126 
 2 February  2004        230            134.      226.          128.
 3 March     2004        230            137       225           131 
 4 April     2004        234.           137       225           131 
 5 May       2004        236            137       225           131 
 6 June      2004        241            137       231.          134.
 7 July      2004        241            137       234.          134.
 8 August    2004        241            137       234.          134.
 9 September 2004        241            136.      234.          130.
10 October   2004        241            136.      234.          128.
# … with 110 more rows

Pivot

We need to adjust this data so that it is long data (pivot). As of right now we can look at the data nicely, but can’t do much analysis across sizes because they are in different columns. I will shift this data to month, year, “carton_type” which will combine the 4 types of cartons into one column as their names and place their values into another column labeled “price”.

Code
eggs_long <- eggs %>%
  pivot_longer(cols=3:6,
    names_to = c("carton_type"),
      values_to = "Price"
  )
eggs_long
# A tibble: 480 × 4
   month    year  carton_type    Price
   <chr>    <chr> <chr>          <dbl>
 1 Jan      2004  xlarge_dzn      230 
 2 Jan      2004  xlarge_halfdzn  132 
 3 Jan      2004  large_dzn       230 
 4 Jan      2004  large_halfdzn   126 
 5 February 2004  xlarge_dzn      230 
 6 February 2004  xlarge_halfdzn  134.
 7 February 2004  large_dzn       226.
 8 February 2004  large_halfdzn   128.
 9 March    2004  xlarge_dzn      230 
10 March    2004  xlarge_halfdzn  137 
# … with 470 more rows

Analyze

Alright, now our data is pretty succinct. We can start doing some analysis!

Code
eggs_long %>%
  arrange(desc(`Price`))
# A tibble: 480 × 4
   month    year  carton_type Price
   <chr>    <chr> <chr>       <dbl>
 1 November 2012  xlarge_dzn    290
 2 December 2012  xlarge_dzn    290
 3 Jan      2013  xlarge_dzn    290
 4 February 2013  xlarge_dzn    290
 5 March    2013  xlarge_dzn    290
 6 April    2013  xlarge_dzn    290
 7 May      2013  xlarge_dzn    290
 8 June     2013  xlarge_dzn    290
 9 July     2013  xlarge_dzn    290
10 August   2013  xlarge_dzn    290
# … with 470 more rows

Price Analysis by Carton Type

Code
eggs_long %>%
  group_by(carton_type) %>%
  summarise( Min = min(Price),
             Max = max(Price),
             Median = median(Price),
             Mean = mean(Price)) %>%
  arrange(desc(Mean))
# A tibble: 4 × 5
  carton_type      Min   Max Median  Mean
  <chr>          <dbl> <dbl>  <dbl> <dbl>
1 xlarge_dzn       230  290    286.  267.
2 large_dzn        225  278.   268.  254.
3 xlarge_halfdzn   132  188.   186.  164.
4 large_halfdzn    126  178    174.  155.

No surprises here, xlarge_dzn is on average the most expensive of the 4 carton types and large_halfdzn is on average the least expensive of the 4 carton types.

Price analysis by Month

Code
eggs_long %>%
  group_by(month) %>%
  summarise( Min = min(Price),
             Max = max(Price),
             Median = median(Price),
             Mean =mean(Price)) %>%
  arrange(desc(Mean)) %>%
  print(n=12)
# A tibble: 12 × 5
   month       Min   Max Median  Mean
   <chr>     <dbl> <dbl>  <dbl> <dbl>
 1 July       128.   290   211.  212.
 2 June       128.   290   210.  212.
 3 December   128.   290   211.  212.
 4 November   128.   290   211.  212.
 5 August     128.   290   211.  212.
 6 September  128.   290   211.  212.
 7 October    128.   290   211.  212.
 8 May        128.   290   207.  208.
 9 April      128.   290   207.  208.
10 March      128.   290   207.  207.
11 February   128.   290   207.  207.
12 Jan        126    290   209.  207.

Here we see the most expensive months are at the beginning of the winter season and at the beginning of the summer season. I wonder if this has anything to do with egg laying production or if this increase is seen within the broader scope of food price statistics.

Price Analysis by Year

Code
eggs_long %>%
  group_by(year) %>%
  summarise( Min = min(Price),
             Max = max(Price),
             Median = median(Price),
             Mean =mean(Price))
# A tibble: 10 × 5
   year    Min   Max Median  Mean
   <chr> <dbl> <dbl>  <dbl> <dbl>
 1 2004   126   241    181   183.
 2 2005   128.  241    184.  185.
 3 2006   128.  242.   184.  185.
 4 2007   128.  245    186.  188.
 5 2008   132   286.   211.  213.
 6 2009   174.  286.   228.  230.
 7 2010   174.  286.   226.  228.
 8 2011   174.  286.   226.  229.
 9 2012   173.  290    228.  229.
10 2013   178   290    228.  231.

When looking at the data by Year we are basically given the dimensions for: large half dozen price of the cheapest month in a given year (min) extra large dozen price of the most expensive month in a given year (max) average price across 4 carton types and 12 months in a given year(mean).This does allow us to look at the price fluctuations over the years.Two drastic year shifts are from 2007 to 2008 and 2008 to 2009.