Code
library(tidyverse)
library(summarytools)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Sue-Ellen Duffy
March 14, 2023
Here we have the monthly price paid by first receivers for USDA Certified Organic Eggs by size and carton types for January 2004 to December 2013. The unit “Price” is “cents per pound”. So, for example in January 2004, the price paid by first receivers for USDA Certified Dozen of Extra Large Eggs was 230 cents/pound. As defined in the original data, First Receivers “those entities that purchases the processed product from the poultry or egg company, such as a retailer, distributor, or manufacturer”. The term wholesale level would be appropriate in many cases.
# A tibble: 120 × 5
date xlarge_dzn xlarge_halfdzn large_dzn large_halfdzn
<chr> <dbl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126
2 February 230 134. 226. 128.
3 March 230 137 225 131
4 April 234. 137 225 131
5 May 236 137 225 131
6 June 241 137 231. 134.
7 July 241 137 234. 134.
8 August 241 137 234. 134.
9 September 241 136. 234. 130.
10 October 241 136. 234. 128.
# … with 110 more rows
Starting off a little messy. Already I see this data is -wide- and the date needs formatting. Let’s see what other nuances might be lingering in the date column (count).
date
April August December February February /1 Jan 2004
10 10 10 8 2 1
Jan 2005 Jan 2006 Jan 2007 Jan 2008 Jan 2009 Jan 2010
1 1 1 1 1 1
Jan 2011 Jan 2012 Jan 2013 July June March
1 1 1 10 10 10
May November October September
10 10 10 10
In the date column, January has a year indicator, 10 of the months appear 10 times, February appears 8 times and February/1 (a leap year) appears twice. We have to delete the /1 in February (mutate) and extend the year indicator from January to the rest of the months (separate and fill).
# A tibble: 120 × 6
month year xlarge_dzn xlarge_halfdzn large_dzn large_halfdzn
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126
2 February 2004 230 134. 226. 128.
3 March 2004 230 137 225 131
4 April 2004 234. 137 225 131
5 May 2004 236 137 225 131
6 June 2004 241 137 231. 134.
7 July 2004 241 137 234. 134.
8 August 2004 241 137 234. 134.
9 September 2004 241 136. 234. 130.
10 October 2004 241 136. 234. 128.
# … with 110 more rows
We need to adjust this data so that it is long data (pivot). As of right now we can look at the data nicely, but can’t do much analysis across sizes because they are in different columns. I will shift this data to month, year, “carton_type” which will combine the 4 types of cartons into one column as their names and place their values into another column labeled “price”.
# A tibble: 480 × 4
month year carton_type Price
<chr> <chr> <chr> <dbl>
1 Jan 2004 xlarge_dzn 230
2 Jan 2004 xlarge_halfdzn 132
3 Jan 2004 large_dzn 230
4 Jan 2004 large_halfdzn 126
5 February 2004 xlarge_dzn 230
6 February 2004 xlarge_halfdzn 134.
7 February 2004 large_dzn 226.
8 February 2004 large_halfdzn 128.
9 March 2004 xlarge_dzn 230
10 March 2004 xlarge_halfdzn 137
# … with 470 more rows
Alright, now our data is pretty succinct. We can start doing some analysis!
# A tibble: 480 × 4
month year carton_type Price
<chr> <chr> <chr> <dbl>
1 November 2012 xlarge_dzn 290
2 December 2012 xlarge_dzn 290
3 Jan 2013 xlarge_dzn 290
4 February 2013 xlarge_dzn 290
5 March 2013 xlarge_dzn 290
6 April 2013 xlarge_dzn 290
7 May 2013 xlarge_dzn 290
8 June 2013 xlarge_dzn 290
9 July 2013 xlarge_dzn 290
10 August 2013 xlarge_dzn 290
# … with 470 more rows
# A tibble: 4 × 5
carton_type Min Max Median Mean
<chr> <dbl> <dbl> <dbl> <dbl>
1 xlarge_dzn 230 290 286. 267.
2 large_dzn 225 278. 268. 254.
3 xlarge_halfdzn 132 188. 186. 164.
4 large_halfdzn 126 178 174. 155.
No surprises here, xlarge_dzn is on average the most expensive of the 4 carton types and large_halfdzn is on average the least expensive of the 4 carton types.
# A tibble: 12 × 5
month Min Max Median Mean
<chr> <dbl> <dbl> <dbl> <dbl>
1 July 128. 290 211. 212.
2 June 128. 290 210. 212.
3 December 128. 290 211. 212.
4 November 128. 290 211. 212.
5 August 128. 290 211. 212.
6 September 128. 290 211. 212.
7 October 128. 290 211. 212.
8 May 128. 290 207. 208.
9 April 128. 290 207. 208.
10 March 128. 290 207. 207.
11 February 128. 290 207. 207.
12 Jan 126 290 209. 207.
Here we see the most expensive months are at the beginning of the winter season and at the beginning of the summer season. I wonder if this has anything to do with egg laying production or if this increase is seen within the broader scope of food price statistics.
# A tibble: 10 × 5
year Min Max Median Mean
<chr> <dbl> <dbl> <dbl> <dbl>
1 2004 126 241 181 183.
2 2005 128. 241 184. 185.
3 2006 128. 242. 184. 185.
4 2007 128. 245 186. 188.
5 2008 132 286. 211. 213.
6 2009 174. 286. 228. 230.
7 2010 174. 286. 226. 228.
8 2011 174. 286. 226. 229.
9 2012 173. 290 228. 229.
10 2013 178 290 228. 231.
When looking at the data by Year we are basically given the dimensions for: large half dozen price of the cheapest month in a given year (min) extra large dozen price of the most expensive month in a given year (max) average price across 4 carton types and 12 months in a given year(mean).This does allow us to look at the price fluctuations over the years.Two drastic year shifts are from 2007 to 2008 and 2008 to 2009.
---
title: "Challenge 3: Eggs 2004-2013"
author: "Sue-Ellen Duffy"
description: "Pivoting Egg Data"
date: "03/14/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- eggs
- Sue-Ellen Duffy
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(summarytools)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
### Egg Data!
Here we have the monthly price paid by first receivers for USDA Certified Organic Eggs by size and carton types for January 2004 to December 2013. The unit "Price" is "cents per pound". So, for example in January 2004, the price paid by first receivers for USDA Certified Dozen of Extra Large Eggs was 230 cents/pound. As defined in the original data, First Receivers "those entities that purchases the processed product from the poultry or egg company, such as a retailer, distributor, or manufacturer". The term wholesale level would be appropriate in many cases.
## Reading in the egg data
```{r}
eggs_og <- read_excel("_data/organiceggpoultry.xls",
sheet="Data",
range =cell_limits(c(6,2),c(NA,6)),
col_names = c("date", "xlarge_dzn", "xlarge_halfdzn", "large_dzn", "large_halfdzn")
)
eggs_og
```
::: callout-Tip
## Date Format 1
Starting off a little messy. Already I see this data is -wide- and the date needs formatting. Let's see what other nuances might be lingering in the date column (count).
:::
```{r}
table(select(eggs_og, date))
```
::: callout-Tip
## Date Format 2
In the date column, January has a year indicator, 10 of the months appear 10 times, February appears 8 times and February/1 (a leap year) appears twice. We have to delete the /1 in February (mutate) and extend the year indicator from January to the rest of the months (separate and fill).
:::
```{r}
eggs <- eggs_og %>%
mutate(date = str_remove(date, " /1")) %>%
separate(date,into=c("month", "year"), sep=" ") %>%
fill(year)
eggs
```
::: callout-Tip
## Pivot
We need to adjust this data so that it is long data (pivot). As of right now we can look at the data nicely, but can't do much analysis across sizes because they are in different columns. I will shift this data to month, year, "carton_type" which will combine the 4 types of cartons into one column as their names and place their values into another column labeled "price".
:::
```{r}
eggs_long <- eggs %>%
pivot_longer(cols=3:6,
names_to = c("carton_type"),
values_to = "Price"
)
eggs_long
```
### Analyze
Alright, now our data is pretty succinct. We can start doing some analysis!
```{r}
eggs_long %>%
arrange(desc(`Price`))
```
## Price Analysis by Carton Type
```{r}
eggs_long %>%
group_by(carton_type) %>%
summarise( Min = min(Price),
Max = max(Price),
Median = median(Price),
Mean = mean(Price)) %>%
arrange(desc(Mean))
```
No surprises here, xlarge_dzn is on average the most expensive of the 4 carton types and large_halfdzn is on average the least expensive of the 4 carton types.
## Price analysis by Month
```{r}
eggs_long %>%
group_by(month) %>%
summarise( Min = min(Price),
Max = max(Price),
Median = median(Price),
Mean =mean(Price)) %>%
arrange(desc(Mean)) %>%
print(n=12)
```
Here we see the most expensive months are at the beginning of the winter season and at the beginning of the summer season. I wonder if this has anything to do with egg laying production or if this increase is seen within the broader scope of food price statistics.
## Price Analysis by Year
```{r}
eggs_long %>%
group_by(year) %>%
summarise( Min = min(Price),
Max = max(Price),
Median = median(Price),
Mean =mean(Price))
```
When looking at the data by Year we are basically given the dimensions for: large half dozen price of the cheapest month in a given year (min) extra large dozen price of the most expensive month in a given year (max) average price across 4 carton types and 12 months in a given year(mean).This does allow us to look at the price fluctuations over the years.Two drastic year shifts are from 2007 to 2008 and 2008 to 2009.