challenge_4
More data wrangling: pivoting
Author

Shantanu Patil

Published

March 26, 2023

Code
library(tidyverse)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Code
## Read in data
poul <- read_csv("_data/poultry_tidy.csv")

## Sample a few data points
poul_sample <- poul[sample(nrow(poul), 10), ]
print(poul_sample)
# A tibble: 10 × 4
   Product     Year Month     Price_Dollar
   <chr>      <dbl> <chr>            <dbl>
 1 B/S Breast  2010 April             6.46
 2 Thighs      2008 October           2.22
 3 B/S Breast  2010 June              6.46
 4 Whole Legs  2005 April             2.04
 5 B/S Breast  2013 February          7.04
 6 Thighs      2009 December          2.22
 7 Thighs      2010 September         2.15
 8 B/S Breast  2009 July              6.46
 9 B/S Breast  2004 October           6.42
10 Thighs      2004 August            2.00
Code
## Calculate product counts
product_counts <- table(select(poul, Product))
print(product_counts)
Product
    B/S Breast Bone-in Breast         Thighs          Whole     Whole Legs 
           120            120            120            120            120 

Briefly describe the data

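This dataset records the price of poultry in dollars (Price_Dollar) for each combination of Product, Year, and Month. There are five products (B/S Breast, Bone-in Breast, Thighs, Whole, and Whole Legs) with 120 rows each, or 600 rows in total. That count is consistent with ten years of monthly prices per product, and the sampled rows span 2004 to 2013.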
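The description can be backed by a few quick checks on dimensions, year range, and rows per product. The sketch below uses base R on a toy data frame that stands in for `poul`, so it runs without the CSV. The name `toy`, the 2004 to 2013 year range, and the placeholder price are assumptions for illustration only, not values read from the file.

```r
# Toy stand-in for `poul`: same four columns, hypothetical contents.
toy <- expand.grid(
  Month   = month.name,
  Year    = 2004:2013,  # assumed range, based on the sampled rows
  Product = c("B/S Breast", "Bone-in Breast", "Thighs", "Whole", "Whole Legs"),
  stringsAsFactors = FALSE
)
toy$Price_Dollar <- 2.00  # placeholder, not a real price
toy <- toy[, c("Product", "Year", "Month", "Price_Dollar")]

dims             <- dim(toy)           # number of rows and columns
year_range       <- range(toy$Year)    # first and last year covered
rows_per_product <- table(toy$Product) # observations per product

print(dims)
print(year_range)
print(rows_per_product)
```

Running the same three calls on `poul` itself would show whether the real file has the same balanced shape.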

Tidy Data (as needed)

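The data are already tidy: each row is a single observation (one product in one month) and each column is a single variable, so no pivoting is needed.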
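One way to sanity check that claim is to confirm that no Product, Year, and Month combination appears more than once, and to count rows with missing values. This is a minimal base R sketch on a small hypothetical data frame (`toy`, with a placeholder price); the same calls should apply to `poul`.

```r
# Small hypothetical stand-in for `poul` (two products, two years).
toy <- expand.grid(
  Month   = month.name,
  Year    = 2004:2005,
  Product = c("Thighs", "Whole"),
  stringsAsFactors = FALSE
)
toy$Price_Dollar <- 2.00  # placeholder, not a real price

# Each observation should be identified by exactly one key combination.
key          <- toy[, c("Product", "Year", "Month")]
n_duplicates <- sum(duplicated(key))

# Rows with at least one missing value (reported, not assumed to be zero).
n_incomplete <- sum(!complete.cases(toy))

print(n_duplicates)
print(n_incomplete)
```

A nonzero `n_duplicates` on the real data would mean a product has more than one price in some month, and the data would need reshaping or deduplication before analysis.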

Identify variables that need to be mutated

The ‘Product’ and ‘Month’ columns are stored as character data. Adding numeric (double) versions of them makes ordering and comparison easier. For example, a numeric month sorts chronologically rather than alphabetically and, combined with ‘Year’, lets us compare consecutive months across the new year’s rollover. We should also add a ‘Date’ column based on the ‘Month’ and ‘Year’ columns, which will simplify data visualization and analysis.
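To illustrate the rollover point, here is a minimal base R sketch on three hypothetical observations. It uses `match()` against the built-in `month.name` vector as an alternative way to get month numbers, then builds a running month index and a date. The names `obs` and `month_index` are made up for this example.

```r
# Three consecutive months that straddle a year boundary (hypothetical rows).
obs <- data.frame(
  Year  = c(2004, 2004, 2005),
  Month = c("November", "December", "January"),
  stringsAsFactors = FALSE
)

# Month name -> month number via the built-in month.name vector.
obs$Month_num <- match(obs$Month, month.name)

# Running index: consecutive months differ by 1, even across years.
obs$month_index <- obs$Year * 12 + obs$Month_num
gaps <- diff(obs$month_index)

# First-of-month date built from Year and Month_num.
obs$Date <- as.Date(sprintf("%d-%02d-01", obs$Year, obs$Month_num))

print(obs)
print(gaps)
```

Comparing `Month_num` alone would put January (1) eleven steps before December (12); the combined index or the date keeps the two adjacent.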

Code
# Assign product IDs
poul_mut <- poul %>%
  mutate(Product_ID = recode(Product, "B/S Breast" = 1, 
                             "Bone-in Breast" = 2, 
                             "Thighs" = 3, 
                             "Whole" = 4, 
                             "Whole Legs" = 5))

# Convert month names to month numbers
poul_mut <- poul_mut %>%
  mutate(Month_num = recode(Month, "January" = 1, "February" = 2, "March" = 3, 
                            "April" = 4, "May" = 5, "June" = 6, "July" = 7, 
                            "August" = 8, "September" = 9, "October" = 10, 
                            "November" = 11, "December" = 12))

# Create Date column based on Year and Month_num
# (make_date() defaults the day to 1, so each date is the first of the month)
poul_mut <- poul_mut %>%
  mutate(Date = make_date(Year, Month_num))

# Display a sample of the modified data
poul_mut[sample(nrow(poul_mut), 10), ]
# A tibble: 10 × 7
   Product     Year Month    Price_Dollar Product_ID Month_num Date      
   <chr>      <dbl> <chr>           <dbl>      <dbl>     <dbl> <date>    
 1 Whole Legs  2004 July             2.04          5         7 2004-07-01
 2 B/S Breast  2006 April            6.46          1         4 2006-04-01
 3 Thighs      2005 April            2.22          3         4 2005-04-01
 4 Whole Legs  2010 March            2.04          5         3 2010-03-01
 5 Whole Legs  2007 July             2.04          5         7 2007-07-01
 6 Whole Legs  2008 November         2.04          5        11 2008-11-01
 7 Whole       2008 November         2.48          4        11 2008-11-01
 8 B/S Breast  2008 February         6.46          1         2 2008-02-01
 9 B/S Breast  2013 December         7.04          1        12 2013-12-01
10 Whole       2011 February         2.35          4         2 2011-02-01
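The challenge also asks for a sanity check of every mutation. A minimal sketch of such checks follows, written in base R against three rows copied from the sample output so that it runs on its own. The names `check_rows` and `checks` are hypothetical; in practice the same expressions could be pointed at `poul_mut`.

```r
# Three rows taken from the sample of the mutated data.
check_rows <- data.frame(
  Product    = c("B/S Breast", "Thighs", "Whole Legs"),
  Year       = c(2006, 2005, 2008),
  Month      = c("April", "April", "November"),
  Product_ID = c(1, 3, 5),
  Month_num  = c(4, 4, 11),
  Date       = as.Date(c("2006-04-01", "2005-04-01", "2008-11-01")),
  stringsAsFactors = FALSE
)

new_cols <- check_rows[, c("Product_ID", "Month_num", "Date")]

checks <- c(
  # recode() yields NA for unmatched values, so NAs would flag a typo.
  no_missing     = !any(is.na(new_cols)),
  # Month numbers agree with an independent lookup in month.name.
  month_matches  = all(check_rows$Month_num == match(check_rows$Month, month.name)),
  # Each product maps to exactly one ID.
  one_id_per_product =
    nrow(unique(check_rows[, c("Product", "Product_ID")])) ==
      length(unique(check_rows$Product)),
  # Date agrees with the Year and Month_num it was built from.
  date_year_ok   = all(as.integer(format(check_rows$Date, "%Y")) == check_rows$Year),
  date_month_ok  = all(as.integer(format(check_rows$Date, "%m")) == check_rows$Month_num)
)

print(checks)
stopifnot(all(checks))
```

If any check fails on the full data, `print(checks)` shows which mutation to revisit.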