library(tidyverse)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Paarth Tandon
January 4, 2023
Product | Year | Month | Price_Dollar |
---|---|---|---|
B/S Breast | 2004 | September | 6.4250 |
B/S Breast | 2008 | September | 6.4550 |
Bone-in Breast | 2011 | September | 3.9050 |
B/S Breast | 2012 | February | 7.0000 |
Whole Legs | 2010 | December | 2.0350 |
Bone-in Breast | 2011 | February | 3.9050 |
Thighs | 2013 | January | 2.1625 |
Whole | 2011 | August | 2.3500 |
Whole | 2008 | March | 2.2050 |
B/S Breast | 2013 | December | 7.0375 |
Product | Count |
---|---|
B/S Breast | 120 |
Bone-in Breast | 120 |
Thighs | 120 |
Whole | 120 |
Whole Legs | 120 |
This dataset describes the price of poultry meat (`Price_Dollar <dbl>`) by `Product <chr>`, `Year <dbl>`, and `Month <chr>`. There are 5 products, each with 120 rows.
No tidying is needed: each row is already a single observation (one product in one month of one year), and each column holds a single variable.
The `Product` column can be recoded as a `<dbl>` ID so that we can work with numbers instead of strings. The `Month` column should also be converted to a `<dbl>` month number, in case we want to do something like a month-ahead comparison (a comparison based on the month number alone would need some rollover handling at the new year).
I will also create a `Date` column based on the `Month` and `Year`, as it will make plotting over time simpler.
set.seed(42)
# product ids
poul_mut <- mutate(poul, Product_ID = recode(Product, "B/S Breast" = 1, "Bone-in Breast" = 2, "Thighs" = 3, "Whole" = 4, "Whole Legs" = 5))
# month number
poul_mut <- mutate(poul_mut, Month_num = recode(Month, "January" = 1, "February" = 2, "March" = 3, "April" = 4, "May" = 5, "June" = 6, "July" = 7, "August" = 8, "September" = 9, "October" = 10, "November" = 11, "December" = 12))
# date
poul_mut <- mutate(poul_mut, Date=make_date(Year, Month_num))
poul_mut[sample(nrow(poul_mut), 10), ]
Product | Year | Month | Price_Dollar | Product_ID | Month_num | Date |
---|---|---|---|---|---|---|
B/S Breast | 2004 | September | 6.4250 | 1 | 9 | 2004-09-01 |
B/S Breast | 2008 | September | 6.4550 | 1 | 9 | 2008-09-01 |
Bone-in Breast | 2011 | September | 3.9050 | 2 | 9 | 2011-09-01 |
B/S Breast | 2012 | February | 7.0000 | 1 | 2 | 2012-02-01 |
Whole Legs | 2010 | December | 2.0350 | 5 | 12 | 2010-12-01 |
Bone-in Breast | 2011 | February | 3.9050 | 2 | 2 | 2011-02-01 |
Thighs | 2013 | January | 2.1625 | 3 | 1 | 2013-01-01 |
Whole | 2011 | August | 2.3500 | 4 | 8 | 2011-08-01 |
Whole | 2008 | March | 2.2050 | 4 | 3 | 2008-03-01 |
B/S Breast | 2013 | December | 7.0375 | 1 | 12 | 2013-12-01 |
One caveat: since a `<date>` requires a specific day (not just a month and year), `make_date()` defaults to the first of the month. This is fine for most month-to-month visualizations, but the day component is an artifact of the conversion rather than something recorded in the data.
---
title: "Challenge 4"
author: "Paarth Tandon"
description: "More data wrangling: mutate"
date: "01/04/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
    df-print: kable
categories:
  - challenge_4
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in data
```{r}
set.seed(42)
# read in the data using readr
poul <- read_csv("_data/poultry_tidy.csv")
# sample a few data points
poul[sample(nrow(poul), 10), ]
# products
table(select(poul, Product))
```
### Briefly describe the data
This dataset describes the price of poultry meat (`Price_Dollar <dbl>`) by `Product <chr>`, `Year <dbl>`, and `Month <chr>`. There are 5 products, each with 120 rows.
## Tidy Data (as needed)
No tidying is needed: each row is already a single observation (one product in one month of one year), and each column holds a single variable.
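One way to back up the claim that the data is tidy is to check that no `Product`/`Year`/`Month` combination appears more than once. The sketch below is a minimal illustration on a made-up `toy` tibble (the prices are invented, not taken from `poultry_tidy.csv`); the same `count()` and `filter()` calls could be pointed at `poul` instead.

```r
library(tidyverse)

# Hypothetical stand-in for the poultry data; values are illustrative only
toy <- tribble(
  ~Product, ~Year, ~Month,     ~Price_Dollar,
  "Whole",  2012,  "January",  2.35,
  "Whole",  2012,  "February", 2.36,
  "Thighs", 2012,  "January",  2.16,
  "Thighs", 2012,  "February", 2.17
)

# Count rows per Product/Year/Month and keep any combination seen more than once
dupes <- filter(count(toy, Product, Year, Month), n > 1)

# Zero rows here means one observation per product per month
nrow(dupes)
```

If `dupes` had any rows, the data would need deduplicating or reshaping before the mutate steps.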
## Identify variables that need to be mutated
The `Product` column can be recoded as a `<dbl>` ID so that we can work with numbers instead of strings. The `Month` column should also be converted to a `<dbl>` month number, in case we want to do something like a month-ahead comparison (a comparison based on the month number alone would need some rollover handling at the new year).
I will also create a `Date` column based on the `Month` and `Year`, as it will make plotting over time simpler.
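As a rough sketch of the month-ahead comparison mentioned above, a `Date` column lets date arithmetic handle the December-to-January rollover, so no custom rollover function is needed. The `toy` tibble, its prices, and the `Prev_Price` and `Change` column names are assumptions made for illustration; `%m+%` and `months()` come from lubridate.

```r
library(tidyverse)
library(lubridate)

# Hypothetical prices spanning a year boundary; values are illustrative only
toy <- tibble(
  Product = "Whole",
  Date = make_date(c(2012, 2012, 2013), c(11, 12, 1)),
  Price_Dollar = c(2.30, 2.35, 2.40)
)

# Shift every date forward one month, so each price lines up with the month after it
prev <- mutate(toy, Date = Date %m+% months(1))
prev <- select(prev, Product, Date, Prev_Price = Price_Dollar)

# Attach the previous month's price and compute the month-over-month change
compared <- left_join(toy, prev, by = c("Product", "Date"))
compared <- mutate(compared, Change = Price_Dollar - Prev_Price)
compared
```

The first month has no predecessor in the toy data, so its `Prev_Price` and `Change` are `NA`.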
```{r}
set.seed(42)
# product ids
poul_mut <- mutate(poul, Product_ID = recode(Product, "B/S Breast" = 1, "Bone-in Breast" = 2, "Thighs" = 3, "Whole" = 4, "Whole Legs" = 5))
# month number
poul_mut <- mutate(poul_mut, Month_num = recode(Month, "January" = 1, "February" = 2, "March" = 3, "April" = 4, "May" = 5, "June" = 6, "July" = 7, "August" = 8, "September" = 9, "October" = 10, "November" = 11, "December" = 12))
# date
poul_mut <- mutate(poul_mut, Date=make_date(Year, Month_num))
poul_mut[sample(nrow(poul_mut), 10), ]
```
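The two `recode()` calls above spell out every mapping by hand. A more compact alternative, shown here only as a sketch on a hypothetical `toy` tibble, is to look up each value's position with base R's `match()`: the built-in `month.name` vector already lists the English month names in order, and `product_levels` is an assumed helper vector that fixes the same product order as the `recode()` call. Note that `match()` returns integers rather than doubles.

```r
library(tidyverse)

# Hypothetical rows using the same Product and Month spellings as the data
toy <- tibble(
  Product = c("Whole", "B/S Breast", "Thighs"),
  Month = c("March", "December", "January")
)

# Assumed helper: product order matching the IDs used in the recode() call
product_levels <- c("B/S Breast", "Bone-in Breast", "Thighs", "Whole", "Whole Legs")

# match() returns the position of each value in the lookup vector
toy_mut <- mutate(
  toy,
  Product_ID = match(Product, product_levels),
  Month_num = match(Month, month.name)
)
toy_mut
```

Any value missing from the lookup vector becomes `NA`, which makes misspelled product or month names easy to spot.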
One caveat: since a `<date>` requires a specific day (not just a month and year), `make_date()` defaults to the first of the month. This is fine for most month-to-month visualizations, but the day component is an artifact of the conversion rather than something recorded in the data.
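If the implied day is a concern when presenting results, one option is to display the dates as year-month labels while keeping the underlying `Date` column for sorting and plotting. This is a minimal sketch; the `toy` tibble and the `Year_Month` column name are assumptions, not part of the original analysis.

```r
library(tidyverse)
library(lubridate)

# Two example dates built the same way as the Date column above
toy <- tibble(Date = make_date(c(2004, 2013), c(9, 12)))

# Character label that shows only the year and month, hiding the implied day
toy <- mutate(toy, Year_Month = format(Date, "%Y-%m"))
toy$Year_Month
```

The label is a character column, so it suits tables and axis text rather than date arithmetic.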