Code
library(tidyverse)
library(googlesheets4)
library(readxl)
library(lubridate)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Keith Martin
August 18, 2022
For this challenge, I re-used the data set that I used in challenge three. In that challenge I had previously tidied the data and conducted a basic sanity check to ensure the date was entered correctly.
First I loaded in the Organice Poultry Data set.
[1] "Data" "Organic egg prices, 2004-13"
[3] "Organic poultry prices, 2004-13"
This data set is tracking the prices of various quantities of eggs and chicken cuts from the years 2004 to 2013.
In this challenge, due to outside time restraints, i re-used the data that I made tidy last week. However, this this challenge, I leveraged the lubridate functions to show a better picture of the date.
raw_poulty<-read_excel("_data/organiceggpoultry.xls",
sheet = "Data",
range =cell_limits(c(6,2),c(NA,12)),
col_names = c("date", "XL_Dozen", "XL_1/2_Doz.", "L_Dozen", "L_1/2_Doz", "Remove", "Ckn_Whole", "Ckn_BS_Breast", "Ckn_Bone_Breast", "Ckn_Whole_legs", "Ckn_Thighs"),
)
raw_poulty %>%
select(-c(Remove))
# A tibble: 120 × 10
date XL_Do…¹ XL_1/…² L_Dozen L_1/2…³ Ckn_W…⁴ Ckn_B…⁵ Ckn_B…⁶ Ckn_W…⁷ Ckn_T…⁸
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 Jan … 230 132 230 126 198. 646. too few 194. too few
2 Febr… 230 134. 226. 128. 198. 642. too few 194. 203
3 March 230 137 225 131 209 642. too few 194. 203
4 April 234. 137 225 131 212 642. too few 194. 203
5 May 236 137 225 131 214. 642. too few 194. 203
6 June 241 137 231. 134. 216. 641 too few 202. 200.375
7 July 241 137 234. 134. 217 642. 390.5 204. 199.5
8 Augu… 241 137 234. 134. 217 642. 390.5 204. 199.5
9 Sept… 241 136. 234. 130. 217 642. 390.5 204. 199.5
10 Octo… 241 136. 234. 128. 217 642. 390.5 204. 199.5
# … with 110 more rows, and abbreviated variable names ¹XL_Dozen,
# ²`XL_1/2_Doz.`, ³`L_1/2_Doz`, ⁴Ckn_Whole, ⁵Ckn_BS_Breast, ⁶Ckn_Bone_Breast,
# ⁷Ckn_Whole_legs, ⁸Ckn_Thighs
In this data, there we’re a few things that needed to be cleaned up. There was a typo or a “/1” in the data that needed to be mutated and removed. Additionally, I separated the month and date into two separate columns.
### remove that /1 from the February date
raw_poulty_clean <-raw_poulty %>%
mutate(date = str_remove(date, " /1"))
### Separate the month and the year, fill the years in for the rest of the months
raw_poulty_clean<-raw_poulty_clean %>%
separate(date, into=c("month", "year"), sep =" ")%>%
fill(year)
raw_poulty_clean
# A tibble: 120 × 12
month year XL_Do…¹ XL_1/…² L_Dozen L_1/2…³ Remove Ckn_W…⁴ Ckn_B…⁵ Ckn_B…⁶
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <chr>
1 Jan 2004 230 132 230 126 NA 198. 646. too few
2 February 2004 230 134. 226. 128. NA 198. 642. too few
3 March 2004 230 137 225 131 NA 209 642. too few
4 April 2004 234. 137 225 131 NA 212 642. too few
5 May 2004 236 137 225 131 NA 214. 642. too few
6 June 2004 241 137 231. 134. NA 216. 641 too few
7 July 2004 241 137 234. 134. NA 217 642. 390.5
8 August 2004 241 137 234. 134. NA 217 642. 390.5
9 Septemb… 2004 241 136. 234. 130. NA 217 642. 390.5
10 October 2004 241 136. 234. 128. NA 217 642. 390.5
# … with 110 more rows, 2 more variables: Ckn_Whole_legs <dbl>,
# Ckn_Thighs <chr>, and abbreviated variable names ¹XL_Dozen, ²`XL_1/2_Doz.`,
# ³`L_1/2_Doz`, ⁴Ckn_Whole, ⁵Ckn_BS_Breast, ⁶Ckn_Bone_Breast
Now I decided to use lubridate for month and year since this data did not include any data that indicated days. I had it added to its own row. It does not add much in terms of story to this data set, but it is more efficent and visually appealing.
# A tibble: 120 × 3
month year date
<chr> <chr> <date>
1 Jan 2004 2004-01-01
2 February 2004 2004-02-01
3 March 2004 2004-03-01
4 April 2004 2004-04-01
5 May 2004 2004-05-01
6 June 2004 2004-06-01
7 July 2004 2004-07-01
8 August 2004 2004-08-01
9 September 2004 2004-09-01
10 October 2004 2004-10-01
# … with 110 more rows
Before I could do my sanity check on the eggs and the chicken, I needed to pivot longer. Before I could do that, I needed to recode the “too few” entries into 0. However, changing them to “0” was not enough I needed to re-code them as integers as well.
# A tibble: 120 × 13
month year XL_Do…¹ XL_1/…² L_Dozen L_1/2…³ Remove Ckn_W…⁴ Ckn_B…⁵ Ckn_B…⁶
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <chr>
1 Jan 2004 230 132 230 126 NA 198. 646. 0
2 February 2004 230 134. 226. 128. NA 198. 642. 0
3 March 2004 230 137 225 131 NA 209 642. 0
4 April 2004 234. 137 225 131 NA 212 642. 0
5 May 2004 236 137 225 131 NA 214. 642. 0
6 June 2004 241 137 231. 134. NA 216. 641 0
7 July 2004 241 137 234. 134. NA 217 642. 390.5
8 August 2004 241 137 234. 134. NA 217 642. 390.5
9 Septemb… 2004 241 136. 234. 130. NA 217 642. 390.5
10 October 2004 241 136. 234. 128. NA 217 642. 390.5
# … with 110 more rows, 3 more variables: Ckn_Whole_legs <dbl>,
# Ckn_Thighs <chr>, date <date>, and abbreviated variable names ¹XL_Dozen,
# ²`XL_1/2_Doz.`, ³`L_1/2_Doz`, ⁴Ckn_Whole, ⁵Ckn_BS_Breast, ⁶Ckn_Bone_Breast
Below I recoded the 0s into integers rather than characters.
# A tibble: 120 × 13
month year XL_Do…¹ XL_1/…² L_Dozen L_1/2…³ Remove Ckn_W…⁴ Ckn_B…⁵ Ckn_B…⁶
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126 NA 198. 646. 0
2 February 2004 230 134. 226. 128. NA 198. 642. 0
3 March 2004 230 137 225 131 NA 209 642. 0
4 April 2004 234. 137 225 131 NA 212 642. 0
5 May 2004 236 137 225 131 NA 214. 642. 0
6 June 2004 241 137 231. 134. NA 216. 641 0
7 July 2004 241 137 234. 134. NA 217 642. 390.
8 August 2004 241 137 234. 134. NA 217 642. 390.
9 Septemb… 2004 241 136. 234. 130. NA 217 642. 390.
10 October 2004 241 136. 234. 128. NA 217 642. 390.
# … with 110 more rows, 3 more variables: Ckn_Whole_legs <dbl>,
# Ckn_Thighs <dbl>, date <date>, and abbreviated variable names ¹XL_Dozen,
# ²`XL_1/2_Doz.`, ³`L_1/2_Doz`, ⁴Ckn_Whole, ⁵Ckn_BS_Breast, ⁶Ckn_Bone_Breast
I then pivoted longer to include the previous egg/chicken types as rows rather than columns so it was easier to do a sanity check.
# A tibble: 1,080 × 6
month year Remove date `eggType/cknType` avgPrice
<chr> <chr> <lgl> <date> <chr> <dbl>
1 Jan 2004 NA 2004-01-01 XL_Dozen 230
2 Jan 2004 NA 2004-01-01 XL_1/2_Doz. 132
3 Jan 2004 NA 2004-01-01 L_Dozen 230
4 Jan 2004 NA 2004-01-01 L_1/2_Doz 126
5 Jan 2004 NA 2004-01-01 Ckn_Whole 198.
6 Jan 2004 NA 2004-01-01 Ckn_BS_Breast 646.
7 Jan 2004 NA 2004-01-01 Ckn_Bone_Breast 0
8 Jan 2004 NA 2004-01-01 Ckn_Whole_legs 194.
9 Jan 2004 NA 2004-01-01 Ckn_Thighs 0
10 February 2004 NA 2004-02-01 XL_Dozen 230
# … with 1,070 more rows
Below I included the basic summary statistics like median, mode, min, max and standard deviation to show that the lubridate function and the pivot longer function worked as intended.
# A tibble: 120 × 6
date sd_year max_year min_year avg_year med_year
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2004-01-01 190. 646. 0 195. 194.
2 2004-02-01 175. 642. 0 217. 198.
3 2004-03-01 174. 642. 0 219 203
4 2004-04-01 174. 642. 0 220. 203
5 2004-05-01 174. 642. 0 220. 203
6 2004-06-01 174. 641 0 222. 202.
7 2004-07-01 160. 642. 134. 266. 217
8 2004-08-01 160. 642. 134. 266. 217
9 2004-09-01 160. 642. 130. 266. 217
10 2004-10-01 160. 642. 128. 266. 217
# … with 110 more rows
---
title: "Challenge 4 Poultry and Eggs"
author: "Keith Martin"
description: "More data wrangling: pivoting"
date: "08/18/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- eggs
-
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(googlesheets4)
library(readxl)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
For this challenge, I re-used the data set that I used in challenge three. In that challenge I had previously tidied the data and conducted a basic sanity check to ensure the date was entered correctly.
## Read in data
First I loaded in the Organice Poultry Data set.
```{r}
### Loading in the libraries of the excel files
excel_sheets("_data/organiceggpoultry.xls")
```
### Briefly describe the data
This data set is tracking the prices of various quantities of eggs and chicken cuts from the years 2004 to 2013.
## Tidy Data (as needed)
In this challenge, due to outside time restraints, i re-used the data that I made tidy last week. However, this this challenge, I leveraged the lubridate functions to show a better picture of the date.
```{r}
raw_poulty<-read_excel("_data/organiceggpoultry.xls",
sheet = "Data",
range =cell_limits(c(6,2),c(NA,12)),
col_names = c("date", "XL_Dozen", "XL_1/2_Doz.", "L_Dozen", "L_1/2_Doz", "Remove", "Ckn_Whole", "Ckn_BS_Breast", "Ckn_Bone_Breast", "Ckn_Whole_legs", "Ckn_Thighs"),
)
raw_poulty %>%
select(-c(Remove))
```
## Identify variables that need to be mutated
In this data, there we're a few things that needed to be cleaned up. There was a typo or a "/1" in the data that needed to be mutated and removed. Additionally, I separated the month and date into two separate columns.
```{r}
### remove that /1 from the February date
raw_poulty_clean <-raw_poulty %>%
mutate(date = str_remove(date, " /1"))
### Separate the month and the year, fill the years in for the rest of the months
raw_poulty_clean<-raw_poulty_clean %>%
separate(date, into=c("month", "year"), sep =" ")%>%
fill(year)
raw_poulty_clean
```
Now I decided to use lubridate for month and year since this data did not include any data that indicated days. I had it added to its own row. It does not add much in terms of story to this data set, but it is more efficent and visually appealing.
```{r}
raw_poulty_clean<-raw_poulty_clean%>%
mutate(date = str_c(month, year, sep = " "), ###combine month and year into date column
date = my(date) ### lubridate will automatically make them into date format
)
select(raw_poulty_clean, month, year, date)
```
Before I could do my sanity check on the eggs and the chicken, I needed to pivot longer. Before I could do that, I needed to recode the "too few" entries into 0. However, changing them to "0" was not enough I needed to re-code them as integers as well.
```{r}
ckn_edited<- raw_poulty_clean %>%
mutate(Ckn_Bone_Breast = recode(Ckn_Bone_Breast, `too few` = "0"),
Ckn_Thighs = recode(Ckn_Thighs, `too few`="0"))
ckn_edited
```
Below I recoded the 0s into integers rather than characters.
```{r}
ckn_edited$Ckn_Bone_Breast<-as.numeric(ckn_edited$Ckn_Bone_Breast)
ckn_edited$Ckn_Thighs<-as.numeric(ckn_edited$Ckn_Thighs)
ckn_edited
```
I then pivoted longer to include the previous egg/chicken types as rows rather than columns so it was easier to do a sanity check.
```{r}
eggs_longer<- pivot_longer(ckn_edited, cols=c("XL_Dozen", "XL_1/2_Doz.", "L_Dozen", "L_1/2_Doz", "Ckn_Whole", "Ckn_BS_Breast", "Ckn_Bone_Breast", "Ckn_Whole_legs", "Ckn_Thighs"),
names_to = "eggType/cknType",
values_to = "avgPrice"
)
eggs_longer
```
Below I included the basic summary statistics like median, mode, min, max and standard deviation to show that the lubridate function and the pivot longer function worked as intended.
```{r}
eggs_longer%>%
group_by(date)%>%
summarise (
sd_year = sd (avgPrice, na.rm=TRUE),
max_year = max(avgPrice, na.rm = TRUE),
min_year = min(avgPrice, na.rm = TRUE),
avg_year = mean (avgPrice, na.rm = TRUE),
med_year = median(avgPrice, na.rm = TRUE)
)
```