Shantanu Patil
March 26, 2023
# A tibble: 10 × 4
   Product     Year Month     Price_Dollar
   <chr>      <dbl> <chr>            <dbl>
 1 B/S Breast  2010 April             6.46
 2 Thighs      2008 October           2.22
 3 B/S Breast  2010 June              6.46
 4 Whole Legs  2005 April             2.04
 5 B/S Breast  2013 February          7.04
 6 Thighs      2009 December          2.22
 7 Thighs      2010 September         2.15
 8 B/S Breast  2009 July              6.46
 9 B/S Breast  2004 October           6.42
10 Thighs      2004 August            2.00
Product
    B/S Breast Bone-in Breast         Thighs          Whole     Whole Legs
           120            120            120            120            120
This dataset records the price of poultry meat, in dollars, by product, year, and month. It covers five product types (B/S Breast, Bone-in Breast, Thighs, Whole, and Whole Legs), each with 120 rows.
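The data is already tidy: each row is a single observation (one product in one month of one year) and each column holds one variable.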
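The ‘Product’ and ‘Month’ columns are stored as character strings. Converting them to <dbl> (double) values makes numerical operations easier. For example, a numeric month allows month-to-month comparisons that account for the new-year rollover. We should also add a ‘Date’ column built from the ‘Year’ and ‘Month’ columns, which simplifies visualization and time-based analysis.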
# A tibble: 10 × 7
   Product     Year Month    Price_Dollar Product_ID Month_num Date
   <chr>      <dbl> <chr>           <dbl>      <dbl>     <dbl> <date>
 1 Whole Legs  2004 July             2.04          5         7 2004-07-01
 2 B/S Breast  2006 April            6.46          1         4 2006-04-01
 3 Thighs      2005 April            2.22          3         4 2005-04-01
 4 Whole Legs  2010 March            2.04          5         3 2010-03-01
 5 Whole Legs  2007 July             2.04          5         7 2007-07-01
 6 Whole Legs  2008 November         2.04          5        11 2008-11-01
 7 Whole       2008 November         2.48          4        11 2008-11-01
 8 B/S Breast  2008 February         6.46          1         2 2008-02-01
 9 B/S Breast  2013 December         7.04          1        12 2013-12-01
10 Whole       2011 February         2.35          4         2 2011-02-01
---
title: "Challenge 4"
author: "Shantanu Patil"
description: "More data wrangling: pivoting"
date: "03/26/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
```{r}
## Read in data
poul <- read_csv("_data/poultry_tidy.csv")
## Sample a few data points
poul_sample <- poul[sample(nrow(poul), 10), ]
print(poul_sample)
## Calculate product counts
product_counts <- table(select(poul, Product))
print(product_counts)
```
### Briefly describe the data
This dataset records the price of poultry meat, in dollars, by product, year, and month. It covers five product types (B/S Breast, Bone-in Breast, Thighs, Whole, and Whole Legs), each with 120 rows.
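One way to back up a description like this is to count rows per product and check how many distinct years and months each product covers. The sketch below is only an illustration: `toy` is a made-up stand-in for `poul` (same column names, invented values), and `coverage` is a hypothetical name.

```r
library(dplyr)

# Hypothetical stand-in for `poul`: 2 products x 2 years x 12 months
toy <- expand.grid(
  Product = c("Thighs", "Whole"),
  Year    = c(2004, 2005),
  Month   = month.name,
  stringsAsFactors = FALSE
) %>%
  as_tibble() %>%
  mutate(Price_Dollar = 2.00)  # invented price, for illustration only

# Rows, distinct years, and distinct months for each product
coverage <- toy %>%
  group_by(Product) %>%
  summarise(
    n_rows   = n(),
    n_years  = n_distinct(Year),
    n_months = n_distinct(Month),
    .groups  = "drop"
  )

print(coverage)
```

Run against `poul` instead of `toy`, the `n_rows` column should agree with the `product_counts` table.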
## Tidy Data (as needed)
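The data is already tidy: each row is a single observation (one product in one month of one year) and each column holds one variable.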
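A quick sanity check for tidiness could confirm that no Product/Year/Month combination appears twice and could flag rows with missing values. This is a minimal sketch on a made-up tibble (`toy`); the names `duplicates` and `incomplete` are hypothetical, and the same calls would need to be pointed at `poul` to say anything about the real data.

```r
library(dplyr)

# Hypothetical stand-in for `poul` with invented values
toy <- tibble(
  Product      = c("Thighs", "Thighs", "Whole", "Whole"),
  Year         = c(2004, 2004, 2004, 2004),
  Month        = c("January", "February", "January", "February"),
  Price_Dollar = c(2.00, 2.00, 2.35, 2.35)
)

# Any Product/Year/Month key that occurs more than once
duplicates <- toy %>%
  count(Product, Year, Month) %>%
  filter(n > 1)

# Rows with a missing value in any column
incomplete <- toy[!complete.cases(toy), ]

nrow(duplicates)
nrow(incomplete)
```

Zero duplicate keys means each row really is one observation; any rows in `incomplete` would be worth inspecting before mutating.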
## Identify variables that need to be mutated
The 'Product' and 'Month' columns are stored as character strings. Converting them to `<dbl>` (double) values makes numerical operations easier. For example, a numeric month allows month-to-month comparisons that account for the new-year rollover. We should also add a 'Date' column built from the 'Year' and 'Month' columns, which simplifies visualization and time-based analysis.
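To see why the rollover matters, consider the gap between December 2004 and January 2005. The sketch below is an illustration on made-up rows, not part of the original analysis: it uses base R's built-in `month.name` vector to look up month numbers, and `month_index` is a hypothetical helper column that counts months on one continuous scale.

```r
library(dplyr)

# Two consecutive months that straddle a new year
toy <- tibble(
  Year  = c(2004, 2005),
  Month = c("December", "January")
)

toy <- toy %>%
  mutate(
    # Position of each name in month.name (1 = January, 12 = December)
    Month_num   = match(Month, month.name),
    # Hypothetical continuous month counter, so years roll over cleanly
    month_index = Year * 12 + Month_num
  )

# Comparing Month_num alone suggests an 11-month step backwards
diff(toy$Month_num)    # -11

# The continuous index gives the real gap of one month
diff(toy$month_index)  # 1
```

A proper 'Date' column solves the same problem and is usually the more convenient choice for plotting.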
```{r}
# Assign product IDs
poul_mut <- poul %>%
  mutate(Product_ID = recode(Product,
                             "B/S Breast" = 1,
                             "Bone-in Breast" = 2,
                             "Thighs" = 3,
                             "Whole" = 4,
                             "Whole Legs" = 5))

# Convert month names to month numbers
poul_mut <- poul_mut %>%
  mutate(Month_num = recode(Month,
                            "January" = 1, "February" = 2, "March" = 3,
                            "April" = 4, "May" = 5, "June" = 6, "July" = 7,
                            "August" = 8, "September" = 9, "October" = 10,
                            "November" = 11, "December" = 12))

# Create Date column based on Year and Month_num (day defaults to the 1st)
poul_mut <- poul_mut %>%
  mutate(Date = make_date(Year, Month_num))

# Display a sample of the modified data
poul_mut[sample(nrow(poul_mut), 10), ]
```
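The challenge also asks for a sanity check of every mutation. One possible way to do that is sketched below on a made-up tibble (`toy`). Note the assumptions: it swaps the `recode()` calls for `match()` lookups against a hypothetical `product_levels` vector and base R's `month.name`, and the `checks` vector is an illustrative name, not something from the original post.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for `poul` with invented rows
toy <- tibble(
  Product = c("B/S Breast", "Thighs", "Whole Legs"),
  Year    = c(2004, 2008, 2013),
  Month   = c("July", "October", "December")
)

# Assumed ordering of products, mirroring the IDs 1 to 5 used above
product_levels <- c("B/S Breast", "Bone-in Breast", "Thighs",
                    "Whole", "Whole Legs")

toy_mut <- toy %>%
  mutate(
    Product_ID = match(Product, product_levels),  # NA if name is unknown
    Month_num  = match(Month, month.name),        # NA if month is misspelled
    Date       = make_date(Year, Month_num)       # day defaults to 1
  )

# Each check should be TRUE if the mutation behaved as intended
checks <- c(
  no_missing_ids    = !anyNA(toy_mut$Product_ID),
  no_missing_months = !anyNA(toy_mut$Month_num),
  month_round_trip  = all(month(toy_mut$Date) == toy_mut$Month_num),
  year_round_trip   = all(year(toy_mut$Date) == toy_mut$Year),
  name_round_trip   = all(month.name[toy_mut$Month_num] == toy_mut$Month)
)

print(checks)
```

Applied to `poul_mut`, the same comparisons would catch a misspelled month or an unexpected product name, since either would surface as an `NA`.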