Code
library(tidyverse)
library(readxl)
library(magrittr)
library(tidyr)
library(dplyr)
library(stringr)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Dirichi Umunna
March 17, 2023
In this blog post, we will explore how to work with a given data set, starting from reading the data to tidying it using pivot_longer. The data set contains information on organic egg and poultry costs in the United States, with data points representing a range of values. We will describe the data set using words and provide additional supporting information as needed, such as tables, to provide insight into the contents of the data.
#read in the dataset. we will be excluding unnecessary columns
cleanpoultry <- read_excel("_data/organiceggpoultry.xls", sheet = "Data", skip = 5, col_names = c("Date", "Xtral_12", "Xtral_6", "Large_12", "Large_6", "Delete", "Whole", "Bsbreast", "Boneinbreast", "Wholelegs", "Thighs")) %>%
select(-c(Delete))%>%
drop_na()
#view data set
head(cleanpoultry)
# A tibble: 6 × 10
Date Xtral_12 Xtral_6 Large…¹ Large_6 Whole Bsbre…² Bonei…³ Whole…⁴ Thighs
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 Jan 2004 230 132 230 126 198. 646. too few 194. too f…
2 February 230 134. 226. 128. 198. 642. too few 194. 203
3 March 230 137 225 131 209 642. too few 194. 203
4 April 234. 137 225 131 212 642. too few 194. 203
5 May 236 137 225 131 214. 642. too few 194. 203
6 June 241 137 231. 134. 216. 641 too few 202. 200.3…
# … with abbreviated variable names ¹Large_12, ²Bsbreast, ³Boneinbreast,
# ⁴Wholelegs
The data set is focused on the monthly prices paid for USDA certified organic eggs and poultry, specifically chicken, between the years 2004 and 2013. It contains information on four different types of eggs and five different types of chicken, including their prices per carton and price cents per pound. The prices were set at the delivery to first receivers. The data set is sourced from the U.S. Department of Agriculture. Pivoting this data set to make it “tidy” is often required to make it easier to analyze and visualize the data.
We will attempt to visualize the end result of our pivoting activity by anticipating our expected number of columns and rows. First we will further clean dataset in anticipation.
#separate the date into month and year
cleanpoultry <- separate (cleanpoultry, Date, into = c("Month", "Year", sep = " "))
#convert year to integer
cleanpoultry$Year <- as.integer(cleanpoultry$Year)
# fill in missing values in Year column
cleanpoultry <- fill(cleanpoultry, Year)
#remove unnecessary column
cleanpoultry <- cleanpoultry[, -which(names(cleanpoultry) == " ")]
## make all datatypes the same
cleanpoultry <- cleanpoultry %>%
mutate(Boneinbreast = as.numeric(Boneinbreast),
Thighs = as.numeric(Thighs))
cleanpoultry
# A tibble: 120 × 11
Month Year Xtral…¹ Xtral_6 Large…² Large_6 Whole Bsbre…³ Bonei…⁴ Whole…⁵
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126 198. 646. NA 194.
2 February 2004 230 134. 226. 128. 198. 642. NA 194.
3 March 2004 230 137 225 131 209 642. NA 194.
4 April 2004 234. 137 225 131 212 642. NA 194.
5 May 2004 236 137 225 131 214. 642. NA 194.
6 June 2004 241 137 231. 134. 216. 641 NA 202.
7 July 2004 241 137 234. 134. 217 642. 390. 204.
8 August 2004 241 137 234. 134. 217 642. 390. 204.
9 September 2004 241 136. 234. 130. 217 642. 390. 204.
10 October 2004 241 136. 234. 128. 217 642. 390. 204.
# … with 110 more rows, 1 more variable: Thighs <dbl>, and abbreviated variable
# names ¹Xtral_12, ²Large_12, ³Bsbreast, ⁴Boneinbreast, ⁵Wholelegs
[1] "Month" "Year" "Xtral_12" "Xtral_6" "Large_12"
[6] "Large_6" "Whole" "Bsbreast" "Boneinbreast" "Wholelegs"
[11] "Thighs"
tibble [120 × 11] (S3: tbl_df/tbl/data.frame)
$ Month : chr [1:120] "Jan" "February" "March" "April" ...
$ Year : int [1:120] 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
$ Xtral_12 : num [1:120] 230 230 230 234 236 ...
$ Xtral_6 : num [1:120] 132 134 137 137 137 ...
$ Large_12 : num [1:120] 230 226 225 225 225 ...
$ Large_6 : num [1:120] 126 128 131 131 131 ...
$ Whole : num [1:120] 198 198 209 212 214 ...
$ Bsbreast : num [1:120] 646 642 642 642 642 ...
$ Boneinbreast: num [1:120] NA NA NA NA NA ...
$ Wholelegs : num [1:120] 194 194 194 194 194 ...
$ Thighs : num [1:120] NA 203 203 203 203 ...
[1] 120
[1] 11
[1] 1080
[1] 4
# A tibble: 120 × 11
Month Year Xtral…¹ Xtral_6 Large…² Large_6 Whole Bsbre…³ Bonei…⁴ Whole…⁵
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126 198. 646. NA 194.
2 February 2004 230 134. 226. 128. 198. 642. NA 194.
3 March 2004 230 137 225 131 209 642. NA 194.
4 April 2004 234. 137 225 131 212 642. NA 194.
5 May 2004 236 137 225 131 214. 642. NA 194.
6 June 2004 241 137 231. 134. 216. 641 NA 202.
7 July 2004 241 137 234. 134. 217 642. 390. 204.
8 August 2004 241 137 234. 134. 217 642. 390. 204.
9 September 2004 241 136. 234. 130. 217 642. 390. 204.
10 October 2004 241 136. 234. 128. 217 642. 390. 204.
# … with 110 more rows, 1 more variable: Thighs <dbl>, and abbreviated variable
# names ¹Xtral_12, ²Large_12, ³Bsbreast, ⁴Boneinbreast, ⁵Wholelegs
# A tibble: 1,080 × 4
Month Year `Eggs & Poultry Type` Cost
<chr> <int> <chr> <dbl>
1 Jan 2004 Xtral_12 230
2 Jan 2004 Xtral_6 132
3 Jan 2004 Large_12 230
4 Jan 2004 Large_6 126
5 Jan 2004 Whole 198.
6 Jan 2004 Bsbreast 646.
7 Jan 2004 Boneinbreast NA
8 Jan 2004 Wholelegs 194.
9 Jan 2004 Thighs NA
10 February 2004 Xtral_12 230
# … with 1,070 more rows
In this current exercise, we successfully cleaned a data set and transformed it into a tidy format using pivot longer function. The resulting data set now has 1080 cases and 4 columns, like we anticipated. This makes it easy to analyze and manipulate the data. The process of cleaning and tidying the data set has made it more organized and readily usable for further analysis.
---
title: "Tidying and Pivoting: Challenge 3 "
author: "Dirichi Umunna"
description: "Tidy Data: Pivoting"
date: "03/17/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- Dirichi Umunna
- eggs
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
library(magrittr)
library(tidyr)
library(dplyr)
library(stringr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
In this blog post, we will explore how to work with a given data set, starting from reading the data to tidying it using pivot_longer. The data set contains information on organic egg and poultry costs in the United States, with data points representing a range of values. We will describe the data set using words and provide additional supporting information as needed, such as tables, to provide insight into the contents of the data.
```{r}
#read in the dataset. we will be excluding unnecessary columns
cleanpoultry <- read_excel("_data/organiceggpoultry.xls", sheet = "Data", skip = 5, col_names = c("Date", "Xtral_12", "Xtral_6", "Large_12", "Large_6", "Delete", "Whole", "Bsbreast", "Boneinbreast", "Wholelegs", "Thighs")) %>%
select(-c(Delete))%>%
drop_na()
#view data set
head(cleanpoultry)
```
## Briefly describe the data
The data set is focused on the monthly prices paid for USDA certified organic eggs and poultry, specifically chicken, between the years 2004 and 2013. It contains information on four different types of eggs and five different types of chicken, including their prices per carton and price cents per pound. The prices were set at the delivery to first receivers. The data set is sourced from the U.S. Department of Agriculture. Pivoting this data set to make it "tidy" is often required to make it easier to analyze and visualize the data.
## Anticipate the End Result
We will attempt to visualize the end result of our pivoting activity by anticipating our expected number of columns and rows. First we will further clean dataset in anticipation.
```{r}
#separate the date into month and year
cleanpoultry <- separate (cleanpoultry, Date, into = c("Month", "Year", sep = " "))
#convert year to integer
cleanpoultry$Year <- as.integer(cleanpoultry$Year)
# fill in missing values in Year column
cleanpoultry <- fill(cleanpoultry, Year)
#remove unnecessary column
cleanpoultry <- cleanpoultry[, -which(names(cleanpoultry) == " ")]
## make all datatypes the same
cleanpoultry <- cleanpoultry %>%
mutate(Boneinbreast = as.numeric(Boneinbreast),
Thighs = as.numeric(Thighs))
cleanpoultry
# view data
colnames(cleanpoultry)
str(cleanpoultry)
#existing rows/cases
nrow(cleanpoultry)
#existing columns/cases
ncol(cleanpoultry)
#expected rows/cases
nrow(cleanpoultry) * (ncol(cleanpoultry)-2)
# expected columns
ncol(cleanpoultry) - 9 + 2
cleanpoultry
```
## Pivot the Data
```{r}
#now we pivot the data
cleanpoultry <-pivot_longer(cleanpoultry, col = c("Xtral_12", "Xtral_6", "Large_12", "Large_6", "Whole", "Bsbreast", "Boneinbreast", "Wholelegs", "Thighs"),
names_to="Eggs & Poultry Type",
values_to = "Cost")
cleanpoultry
```
## Conclusion
In this current exercise, we successfully cleaned a data set and transformed it into a tidy format using pivot longer function. The resulting data set now has 1080 cases and 4 columns, like we anticipated. This makes it easy to analyze and manipulate the data. The process of cleaning and tidying the data set has made it more organized and readily usable for further analysis.