Challenge 4 Egg Data

challenge_4
Sue-Ellen Duffy
eggs
Data wrangling: Mutate
Author

Sue-Ellen Duffy

Published

March 22, 2023

Code
library(tidyverse)
library(readr)
library(readxl)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Egg Data, Round 2

Here we have the monthly price paid by first receivers for USDA Certified Organic Eggs by size and carton types for January 2004 to December 2013. The unit “Price” is “cents per pound”. So, for example in January 2004, the price paid by first receivers for USDA Certified Dozen of Extra Large Eggs was 230 cents/pound. As defined in the original data, First Receivers “those entities that purchases the processed product from the poultry or egg company, such as a retailer, distributor, or manufacturer”. The term wholesale level would be appropriate in many cases.

Read in the Data

Code
eggs_og <- read_excel("_data/organiceggpoultry.xls",
                      sheet="Data",
                      range =cell_limits(c(6,2),c(NA,6)),
                      col_names = c("date", "xlarge_dzn", "xlarge_halfdzn", "large_dzn", "large_halfdzn")
)
eggs_og
# A tibble: 120 × 5
   date      xlarge_dzn xlarge_halfdzn large_dzn large_halfdzn
   <chr>          <dbl>          <dbl>     <dbl>         <dbl>
 1 Jan 2004        230            132       230           126 
 2 February        230            134.      226.          128.
 3 March           230            137       225           131 
 4 April           234.           137       225           131 
 5 May             236            137       225           131 
 6 June            241            137       231.          134.
 7 July            241            137       234.          134.
 8 August          241            137       234.          134.
 9 September       241            136.      234.          130.
10 October         241            136.      234.          128.
# … with 110 more rows

Date Format 1

Starting off a little messy. Already I see this data is -wide- and the date needs formatting. Let’s see what other nuances might be lingering in the date column (count).

Code
table(select(eggs_og, date))
date
      April      August    December    February February /1    Jan 2004 
         10          10          10           8           2           1 
   Jan 2005    Jan 2006    Jan 2007    Jan 2008    Jan 2009    Jan 2010 
          1           1           1           1           1           1 
   Jan 2011    Jan 2012    Jan 2013        July        June       March 
          1           1           1          10          10          10 
        May    November     October   September 
         10          10          10          10 

Date Format 2

In the date column, January has a year indicator, 10 of the months appear 10 times, February appears 8 times and February/1 (a leap year) appears twice. We have to delete the /1 in February (mutate) and extend the year indicator from January to the rest of the months (separate and fill).

Code
eggs <- eggs_og %>%
  mutate(date = str_remove(date, " /1")) %>%
  separate(date, into=c("month", "year"), sep=" ") %>%
  fill(year)
eggs
# A tibble: 120 × 6
   month     year  xlarge_dzn xlarge_halfdzn large_dzn large_halfdzn
   <chr>     <chr>      <dbl>          <dbl>     <dbl>         <dbl>
 1 Jan       2004        230            132       230           126 
 2 February  2004        230            134.      226.          128.
 3 March     2004        230            137       225           131 
 4 April     2004        234.           137       225           131 
 5 May       2004        236            137       225           131 
 6 June      2004        241            137       231.          134.
 7 July      2004        241            137       234.          134.
 8 August    2004        241            137       234.          134.
 9 September 2004        241            136.      234.          130.
10 October   2004        241            136.      234.          128.
# … with 110 more rows

Date Format 3

We need to mutate the Month and Year into one combined date. In order to do that we would mutate and make_date time. However when I try that I get an error. I’m not entirely sure why. I tried renaming Jan to January to see if that was the issue. It still won’t let me so I will just use the mutate and str_c functions to create the date column.

Code
eggs<-eggs %>%
  mutate(month=recode (month, 'Jan'='January')) %>%
  mutate(date = str_c(month, year, sep=" "),
         date = my(date))
select(eggs, month, year, date)
# A tibble: 120 × 3
   month     year  date      
   <chr>     <chr> <date>    
 1 January   2004  2004-01-01
 2 February  2004  2004-02-01
 3 March     2004  2004-03-01
 4 April     2004  2004-04-01
 5 May       2004  2004-05-01
 6 June      2004  2004-06-01
 7 July      2004  2004-07-01
 8 August    2004  2004-08-01
 9 September 2004  2004-09-01
10 October   2004  2004-10-01
# … with 110 more rows
Code
eggs<-eggs%>%
  mutate(date = make_datetime(month, year))
Error in `mutate()`:
ℹ In argument: `date = make_datetime(month, year)`.
Caused by error:
! Invalid input type, expected 'integer' actual 'character'