Challenge 3 - Organiceggpoultry Dataset

challenge_3
organiceggpoultry
Megan Galarneau
Tidy Data: Pivoting
Author

Megan Galarneau

Published

March 7, 2023

Code
library(tidyverse)
library(dplyr)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. Read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. Identify what needs to be done to tidy the current data
  3. Anticipate the shape of pivoted data
  4. Pivot the data into tidy format using pivot_longer

Read in data

Upon reading in the data set, I made some initial changes to clean up the data frame and make it easier to use later on. Specifically, I removed/renamed the column headers and removed the empty column.

Code
 library(readxl) 
raw_organiceggpoultry <- read_excel("_data/organiceggpoultry.xls", skip = 5, col_names = c("Date", "Eggxl_12", "Eggxl_6", "Egglg_12", "Egglg_6", "Remove", "Chkn_Whl", "Chkn_BSBrst", "Chkn_BnInBrst", "Chkn_WhlLg", "Chkn_Thigh")) %>%
select(-c(Remove))

Briefly describe the data

This data set outlines the monthly prices (cents per pound) paid for USDA certified organic eggs & poultry from 2004-2013. There are four types of eggs: extra large dozen, extra large half dozen, large dozen, and large half dozen. There are also five types of chickens: whole, boneless/skinless breast, bone-in breast, whole legs, and thighs. This data set is sourced from the U.S. Department of Agriculture, Agricultural Marketing Service (AMS) Market News, Organic Poultry and Eggs (Weekly reports).

In an ideal state, I am planning to pivot this data set so it neatly displays in one row the year, month, item, item type, and cost. Right now, even though I cleaned up the data, the reader cannot easily understand what story the data is telling.

Code
print(summarytools::dfSummary(raw_organiceggpoultry,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

raw_organiceggpoultry

Dimensions: 120 x 10
Duplicates: 8
Variable Stats / Values Freqs (% of Valid) Graph Missing
Date [character]
1. April
2. August
3. December
4. July
5. June
6. March
7. May
8. November
9. October
10. September
[ 12 others ]
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
10 ( 8.3% )
20 ( 16.7% )
0 (0.0%)
Eggxl_12 [numeric]
Mean (sd) : 266.8 (22.8)
min ≤ med ≤ max:
230 ≤ 285.5 ≤ 290
IQR (CV) : 44 (0.1)
11 distinct values 0 (0.0%)
Eggxl_6 [numeric]
Mean (sd) : 164.2 (24.7)
min ≤ med ≤ max:
132 ≤ 185.5 ≤ 188.1
IQR (CV) : 49.7 (0.2)
132.00   : 1 ( 0.8% )
134.50   : 1 ( 0.8% )
135.50   : 28 ( 23.3% )
135.88 ! : 1 ( 0.8% )
137.00   : 6 ( 5.0% )
138.12 ! : 1 ( 0.8% )
139.00   : 15 ( 12.5% )
185.50   : 53 ( 44.2% )
188.13   : 14 ( 11.7% )
! rounded
0 (0.0%)
Egglg_12 [numeric]
Mean (sd) : 254.2 (18.5)
min ≤ med ≤ max:
225 ≤ 267.5 ≤ 277.5
IQR (CV) : 34.5 (0.1)
12 distinct values 0 (0.0%)
Egglg_6 [numeric]
Mean (sd) : 155.2 (22.6)
min ≤ med ≤ max:
126 ≤ 174.5 ≤ 178
IQR (CV) : 45.1 (0.1)
126.00   : 1 ( 0.8% )
128.50   : 29 ( 24.2% )
129.75   : 1 ( 0.8% )
131.00   : 3 ( 2.5% )
131.12 ! : 1 ( 0.8% )
132.00   : 15 ( 12.5% )
133.50   : 3 ( 2.5% )
173.25   : 6 ( 5.0% )
174.50   : 47 ( 39.2% )
178.00   : 14 ( 11.7% )
! rounded
0 (0.0%)
Chkn_Whl [numeric]
Mean (sd) : 230.5 (12.5)
min ≤ med ≤ max:
197.5 ≤ 235 ≤ 248
IQR (CV) : 18 (0.1)
12 distinct values 0 (0.0%)
Chkn_BSBrst [numeric]
Mean (sd) : 654.9 (23.3)
min ≤ med ≤ max:
637.5 ≤ 645.5 ≤ 703.8
IQR (CV) : 0.4 (0)
637.50 : 17 ( 14.2% )
641.00 : 1 ( 0.8% )
642.30 : 1 ( 0.8% )
642.50 : 10 ( 8.3% )
644.00 : 1 ( 0.8% )
645.50 : 67 ( 55.8% )
700.00 : 9 ( 7.5% )
703.75 : 14 ( 11.7% )
0 (0.0%)
Chkn_BnInBrst [character]
1. 390.5
2. too few
114 ( 95.0% )
6 ( 5.0% )
0 (0.0%)
Chkn_WhlLg [numeric]
Mean (sd) : 203.1 (2)
min ≤ med ≤ max:
193.5 ≤ 203.5 ≤ 203.5
IQR (CV) : 0 (0)
193.50   : 5 ( 4.2% )
201.88 ! : 1 ( 0.8% )
203.50   : 114 ( 95.0% )
! rounded
0 (0.0%)
Chkn_Thigh [character]
1. 199.5
2. 200.375
3. 203
4. 213
5. 215
6. 216.25
7. 219.19999999999999
8. 222
9. too few
6 ( 5.0% )
1 ( 0.8% )
4 ( 3.3% )
1 ( 0.8% )
23 ( 19.2% )
17 ( 14.2% )
1 ( 0.8% )
66 ( 55.0% )
1 ( 0.8% )
0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-03-07

Anticipate the End Result

To reach the ideal state mentioned above, I will first need to break out the year and month from the date column. Additionally, there are non-numeric data points in the chicken columns so I will recode those as well. Each of these changes will ensure that I can properly pivot the data from wide to long in the last section.

Code
raw_organiceggpoultry <- raw_organiceggpoultry %>% separate(Date, into = c("Month", "Year"), sep = " ") %>%
mutate(Year = as.integer(Year))%>%
  fill(Year)
Code
raw_organiceggpoultry<-raw_organiceggpoultry%>%
  mutate(Chkn_BnInBrst = recode(Chkn_BnInBrst, `too few` = "0"),
Chkn_Thigh = recode(Chkn_Thigh, `too few` = "0"))
Code
raw_organiceggpoultry$Chkn_BnInBrst <- as.numeric(raw_organiceggpoultry$Chkn_BnInBrst)
Code
raw_organiceggpoultry$Chkn_Thigh <-
as.numeric(raw_organiceggpoultry$Chkn_Thigh)
raw_organiceggpoultry
Code
str(raw_organiceggpoultry)
tibble [120 × 11] (S3: tbl_df/tbl/data.frame)
 $ Month        : chr [1:120] "Jan" "February" "March" "April" ...
 $ Year         : int [1:120] 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
 $ Eggxl_12     : num [1:120] 230 230 230 234 236 ...
 $ Eggxl_6      : num [1:120] 132 134 137 137 137 ...
 $ Egglg_12     : num [1:120] 230 226 225 225 225 ...
 $ Egglg_6      : num [1:120] 126 128 131 131 131 ...
 $ Chkn_Whl     : num [1:120] 198 198 209 212 214 ...
 $ Chkn_BSBrst  : num [1:120] 646 642 642 642 642 ...
 $ Chkn_BnInBrst: num [1:120] 0 0 0 0 0 ...
 $ Chkn_WhlLg   : num [1:120] 194 194 194 194 194 ...
 $ Chkn_Thigh   : num [1:120] 0 203 203 203 203 ...

Pivot the Data

I pivoted the data successfully from wide to long. I can check this by comparing it to the data table above. It now clearly outlines the breakdown of organic egg and poultry prices (cent per pound) by month from 2004-2013. A new “case” in this instance is the monthly price of a type of egg or poultry.

Code
raw_organiceggpoultry <-pivot_longer(raw_organiceggpoultry, col = c("Eggxl_12", "Eggxl_6", "Egglg_12", "Egglg_6", "Chkn_Whl", "Chkn_BSBrst", "Chkn_WhlLg", "Chkn_BnInBrst", "Chkn_Thigh"),
                 names_to="Organic Eggs & Poultry",
                 values_to = "Cost, Cent/Lb")
raw_organiceggpoultry