challenge_4
More data wrangling: mutate
Author

Paarth Tandon

Published

January 4, 2023

Code
library(tidyverse)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Code
set.seed(42)
# read in the data using readr
poul <- read_csv("_data/poultry_tidy.csv")
# sample a few data points
poul[sample(nrow(poul), 10), ]
Product         Year  Month      Price_Dollar
B/S Breast      2004  September        6.4250
B/S Breast      2008  September        6.4550
Bone-in Breast  2011  September        3.9050
B/S Breast      2012  February         7.0000
Whole Legs      2010  December         2.0350
Bone-in Breast  2011  February         3.9050
Thighs          2013  January          2.1625
Whole           2011  August           2.3500
Whole           2008  March            2.2050
B/S Breast      2013  December         7.0375
Code
# products
table(select(poul, Product))
Product
    B/S Breast Bone-in Breast         Thighs          Whole     Whole Legs 
           120            120            120            120            120 

Briefly describe the data

This dataset gives the price of poultry in dollars (Price_Dollar <dbl>) for each Product <chr>, Year <dbl>, and Month <chr>. There are 5 products, each with 120 observations.

Tidy Data (as needed)

No tidying is needed: each row is already a single observation (one product in one month), and each column is a single variable.
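As a rough sketch of how that claim could be checked, the snippet below counts rows per Product, Year, and Month combination and looks for missing prices. The `toy` tibble is a hypothetical stand-in with the same column names as `poul`, so the numbers here say nothing about the real file.

```r
library(dplyr)

# Hypothetical stand-in with the same columns as poul (not the real data)
toy <- tibble(
  Product = c("Whole", "Whole", "Thighs", "Thighs"),
  Year = c(2004, 2004, 2004, 2004),
  Month = c("January", "February", "January", "February"),
  Price_Dollar = c(2.1, 2.2, 2.0, NA)
)

# Any Product/Year/Month combination that appears more than once
dupes <- toy %>%
  count(Product, Year, Month) %>%
  filter(n > 1)

# Number of missing prices
n_missing <- sum(is.na(toy$Price_Dollar))

nrow(dupes)
n_missing
```

Running the same two checks on `poul` itself would show whether each product-month appears exactly once and whether any prices are missing.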

Identify variables that need to be mutated

The Product column can be mutated into a <dbl> so that we can work with numbers instead of strings. The Month column should also be mutated into a <dbl>, in case we want to do something like a one-month-ahead comparison (which would need some sort of rollover handling at the new year).
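A minimal sketch of the rollover idea mentioned above, using only base R. Both `month_to_num()` and `next_month()` are hypothetical helpers written for illustration, not part of this analysis; `month.name` is the built-in base R vector of English month names.

```r
# month.name holds "January" through "December" in order,
# so match() returns each month's number.
month_to_num <- function(month) match(month, month.name)

# Hypothetical helper: step one month ahead, rolling December over
# to January of the following year.
next_month <- function(year, month_num) {
  rolled <- month_num == 12
  data.frame(
    Year = year + as.integer(rolled),
    Month_num = month_num %% 12 + 1
  )
}

month_to_num(c("January", "September", "December"))
next_month(c(2010, 2010), c(11, 12))
```

Here November 2010 steps to December 2010, while December 2010 rolls over to January 2011.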

I will also create a Date column based on the Month and Year, as it will make plotting over time simpler.

Code
set.seed(42)

# product ids
poul_mut <- mutate(poul, Product_ID = recode(Product, "B/S Breast" = 1, "Bone-in Breast" = 2, "Thighs" = 3, "Whole" = 4, "Whole Legs" = 5))

# month number
poul_mut <- mutate(poul_mut, Month_num = recode(Month, "January" = 1, "February" = 2, "March" = 3, "April" = 4, "May" = 5, "June" = 6, "July" = 7, "August" = 8, "September" = 9, "October" = 10, "November" = 11, "December" = 12))

# date
poul_mut <- mutate(poul_mut, Date=make_date(Year, Month_num))

poul_mut[sample(nrow(poul_mut), 10), ]
Product         Year  Month      Price_Dollar  Product_ID  Month_num  Date
B/S Breast      2004  September        6.4250           1          9  2004-09-01
B/S Breast      2008  September        6.4550           1          9  2008-09-01
Bone-in Breast  2011  September        3.9050           2          9  2011-09-01
B/S Breast      2012  February         7.0000           1          2  2012-02-01
Whole Legs      2010  December         2.0350           5         12  2010-12-01
Bone-in Breast  2011  February         3.9050           2          2  2011-02-01
Thighs          2013  January          2.1625           3          1  2013-01-01
Whole           2011  August           2.3500           4          8  2011-08-01
Whole           2008  March            2.2050           4          3  2008-03-01
B/S Breast      2013  December         7.0375           1         12  2013-12-01

One caveat: since <date> requires a specific day (not just a month and year), make_date() defaults the day to the first of the month. This should be fine for most month-to-month visualizations, but note that the day is not in the original data, so it is technically an assumption.
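To illustrate the point, here is a small sketch of how the defaulted day behaves. The variable names `d`, `label`, and `nxt` are made up for this example. Formatting the date as year-month hides the placeholder day when labelling, and lubridate's `%m+%` operator steps forward a month across the year boundary.

```r
library(lubridate)

# The day argument is omitted, so make_date() fills in the 1st
d <- make_date(2010, 12)

# Show only year and month, hiding the placeholder day
label <- format(d, "%Y-%m")

# One month ahead, crossing into the next year
nxt <- d %m+% months(1)

d
label
nxt
```

With a real Date column in place, this kind of date arithmetic may be a simpler route to a one-month-ahead comparison than handling Month_num and Year separately.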