library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Kristin Abijaoude
September 25, 2022
Today, I will be tidying and pivoting eggs_tidy.csv.
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
# A tibble: 6 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_lar…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
# … with abbreviated variable name ¹extra_large_dozen
Here, we get the first 6 rows of the dataset eggs_tidy.csv. From what I see, this dataset records the average monthly price of eggs sold per carton from 2004 to 2013.
Okay, what a mess. This is difficult to read and interpret. Let’s make it tidy first!
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
The printout looks the same as before, because read_csv() already returns a tibble. The data is still in wide format, so work still needs to be done.
# A tibble: 480 × 4
month year `Eggs Sold` `Price sold`
<chr> <dbl> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 February 2004 large_half_dozen 128.
3 March 2004 large_half_dozen 131
4 April 2004 large_half_dozen 131
5 May 2004 large_half_dozen 131
6 June 2004 large_half_dozen 134.
7 July 2004 large_half_dozen 134.
8 August 2004 large_half_dozen 134.
9 September 2004 large_half_dozen 130.
10 October 2004 large_half_dozen 128.
# … with 470 more rows
[1] 120
[1] 6
[1] 480
[1] 4
From there, I can work out the number of rows and columns to expect after pivoting the eggs dataset: 120 × (6 − 2) = 480 rows and 4 columns.
eggs_new <- eggs %>%
  mutate("Large Half Dozen per Cart Sold" = large_half_dozen / 100,
         "Large Dozen per Cart Sold" = large_dozen / 100,
         "Extra Large Half Dozen per Cart Sold" = extra_large_half_dozen / 100,
         "Extra Large Dozen per Cart Sold" = extra_large_dozen / 100)
# Replace old columns with new ones
eggs_new1 <- select(eggs_new, -c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen))
eggs_new1
# A tibble: 120 × 6
month year `Large Half Dozen per Cart Sold` Large Doze…¹ Extra…² Extra…³
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 1.26 2.3 1.32 2.3
2 February 2004 1.28 2.26 1.34 2.3
3 March 2004 1.31 2.25 1.37 2.3
4 April 2004 1.31 2.25 1.37 2.35
5 May 2004 1.31 2.25 1.37 2.36
6 June 2004 1.34 2.31 1.37 2.41
7 July 2004 1.34 2.34 1.37 2.41
8 August 2004 1.34 2.34 1.37 2.41
9 September 2004 1.30 2.34 1.36 2.41
10 October 2004 1.28 2.34 1.36 2.41
# … with 110 more rows, and abbreviated variable names
# ¹`Large Dozen per Cart Sold`, ²`Extra Large Half Dozen per Cart Sold`,
# ³`Extra Large Dozen per Cart Sold`
I calculated the price of one carton in dollars (dividing each price by 100) with the mutate() command and removed the old columns to make room for the new ones.
# A tibble: 480 × 4
month year size price
<chr> <dbl> <chr> <dbl>
1 January 2004 Large Half Dozen per Cart Sold 1.26
2 January 2004 Large Dozen per Cart Sold 2.3
3 January 2004 Extra Large Half Dozen per Cart Sold 1.32
4 January 2004 Extra Large Dozen per Cart Sold 2.3
5 February 2004 Large Half Dozen per Cart Sold 1.28
6 February 2004 Large Dozen per Cart Sold 2.26
7 February 2004 Extra Large Half Dozen per Cart Sold 1.34
8 February 2004 Extra Large Dozen per Cart Sold 2.3
9 March 2004 Large Half Dozen per Cart Sold 1.31
10 March 2004 Large Dozen per Cart Sold 2.25
# … with 470 more rows
Okay, that’s (sort of) better! In the rows shown, extra-large eggs cost at least as much as large eggs in the same carton size. A dozen also costs more than a half dozen, but less than twice as much, so you get more bang for your buck.
---
title: "Challenge 3 with R"
author: "Kristin Abijaoude"
description: "Tidy Data: Pivoting"
date: "09/25/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_3
- eggs
- kristin_abijaoude
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
Today, I will be tidying and pivoting eggs_tidy.csv.
```{r}
#| label: Reading the Dataset
eggs<-read_csv("_data/eggs_tidy.csv")
eggs
```
```{r}
#| label: Return head of data frame
head(eggs)
```
Here, we get the first 6 rows of the dataset eggs_tidy.csv. From what I see, this dataset records the average monthly price of eggs sold per carton from 2004 to 2013.
```{r}
#| label: Let's create a table, shall we?
is_tibble(eggs)
```
Okay, what a mess. This is difficult to read and interpret. Let's make it tidy first!
```{r}
as_tibble(eggs)
```
The printout looks the same as before, because read_csv() already returns a tibble. The data is still in wide format, so work still needs to be done.
```{r}
eggs %>%
  gather("large_half_dozen", "large_dozen", "extra_large_half_dozen", "extra_large_dozen",
         key = "Eggs Sold", value = "Price sold")
```
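In current tidyr, gather() is superseded by pivot_longer(). As a rough sketch, the same reshaping can be written either way. The three-row `eggs_sample` tibble below is a stand-in I typed from the printed output (January, March, and May 2004), not the real file, and the comparison only checks that both calls produce the same rows once sorted, since the two functions return rows in a different order.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical sample: three rows copied from the printed tibble
eggs_sample <- tibble(
  month = c("January", "March", "May"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 131, 131),
  large_dozen = c(230, 225, 225),
  extra_large_half_dozen = c(132, 137, 137),
  extra_large_dozen = c(230, 230, 236)
)

# Superseded interface: stacks one price column after another
long_gather <- eggs_sample %>%
  gather(key = "Eggs Sold", value = "Price sold",
         large_half_dozen:extra_large_dozen)

# Current interface: keeps each month's four prices together
long_pivot <- eggs_sample %>%
  pivot_longer(cols = large_half_dozen:extra_large_dozen,
               names_to = "Eggs Sold",
               values_to = "Price sold")

# Sort both the same way before comparing, because the row order differs
g <- arrange(long_gather, month, `Eggs Sold`)
p <- arrange(long_pivot, month, `Eggs Sold`)
same_rows <- identical(g$`Eggs Sold`, p$`Eggs Sold`) &&
  identical(g$`Price sold`, p$`Price sold`)
same_rows
```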
```{r}
#| label: tibbling
eggs_tibble <- tibble(eggs)
# existing rows
nrow(eggs_tibble)
# existing columns
ncol(eggs_tibble)
# expected rows/cases: one row per original row and price column
nrow(eggs_tibble) * (ncol(eggs_tibble) - 2)
# expected columns: month, year, one name column, one value column
4
```
From there, I can work out the number of rows and columns to expect after pivoting the eggs dataset: 120 × (6 − 2) = 480 rows and 4 columns.
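To make that arithmetic concrete, here is a small sketch that computes the expected dimensions and then checks them against an actual pivot. It assumes a three-row stand-in tibble (`eggs_sample`, typed from the printed output) rather than the real file, so the expected row count here is 3 × 4 = 12 instead of 480.

```r
library(tidyr)
library(tibble)

# Hypothetical sample: three rows copied from the printed tibble
eggs_sample <- tibble(
  month = c("January", "March", "May"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 131, 131),
  large_dozen = c(230, 225, 225),
  extra_large_half_dozen = c(132, 137, 137),
  extra_large_dozen = c(230, 230, 236)
)

# Columns that identify a case stay put; the rest get stacked
id_cols <- c("month", "year")
value_cols <- setdiff(names(eggs_sample), id_cols)

expected_rows <- nrow(eggs_sample) * length(value_cols)
expected_cols <- length(id_cols) + 2  # plus one name column and one value column

eggs_long <- pivot_longer(eggs_sample,
                          cols = all_of(value_cols),
                          names_to = "size",
                          values_to = "price")

# Stop with an error if the pivot does not have the predicted shape
stopifnot(nrow(eggs_long) == expected_rows,
          ncol(eggs_long) == expected_cols)
dim(eggs_long)
```

Running the same check on the full dataset should give the 480 × 4 shape computed above.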
## Pivot the Data
```{r}
#| label: Tidying up Columns
eggs_new <- eggs %>%
  mutate("Large Half Dozen per Cart Sold" = large_half_dozen / 100,
         "Large Dozen per Cart Sold" = large_dozen / 100,
         "Extra Large Half Dozen per Cart Sold" = extra_large_half_dozen / 100,
         "Extra Large Dozen per Cart Sold" = extra_large_dozen / 100)
# Replace old columns with new ones
eggs_new1 <- select(eggs_new, -c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen))
eggs_new1
```
I calculated the price of one carton in dollars (dividing each price by 100) with the mutate() command and removed the old columns to make room for the new ones.
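A more compact variant, offered only as a sketch and not as the approach used in this post, is to convert the price columns in place with across(), which makes the separate select() step unnecessary. It keeps the original snake_case column names instead of creating new descriptive ones, and it runs on a hypothetical three-row sample (`eggs_sample`) typed from the printed output.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: three rows copied from the printed tibble
eggs_sample <- tibble(
  month = c("January", "March", "May"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 131, 131),
  large_dozen = c(230, 225, 225),
  extra_large_half_dozen = c(132, 137, 137),
  extra_large_dozen = c(230, 230, 236)
)

# Divide every column whose name contains "large" by 100, overwriting it
eggs_dollars <- eggs_sample %>%
  mutate(across(contains("large"), ~ .x / 100))

eggs_dollars
```

The trade-off is readability of the column names: the version above is shorter, while the explicit mutate() plus select() spells out exactly what each new column means.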
```{r}
eggs_pivot <- eggs_new1 %>%
  pivot_longer(cols = contains("large"),
               names_to = "size",
               values_to = "price")
eggs_pivot
```
Okay, that's (sort of) better! In early 2004, for example, extra-large eggs cost at least as much as large eggs in the same carton size. A dozen also costs more than a half dozen, but less than twice as much, so you get more bang for your buck.
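One way to check the "bang for your buck" point is to compute a price per egg. The sketch below is an illustration under assumptions: it uses a three-row stand-in tibble (`eggs_sample`, typed from the printed output), treats the raw values as cents, and assumes 6 eggs in a half dozen and 12 in a dozen. The `names_pattern` regular expression is my own choice for splitting the column names into egg size and carton quantity.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical sample: three rows copied from the printed tibble (prices in cents)
eggs_sample <- tibble(
  month = c("January", "March", "May"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 131, 131),
  large_dozen = c(230, 225, 225),
  extra_large_half_dozen = c(132, 137, 137),
  extra_large_dozen = c(230, 230, 236)
)

per_egg <- eggs_sample %>%
  # Split e.g. "extra_large_half_dozen" into size "extra_large" and quantity "half_dozen"
  pivot_longer(cols = contains("large"),
               names_to = c("size", "quantity"),
               names_pattern = "(.*large)_(half_dozen|dozen)",
               values_to = "price_cents") %>%
  mutate(eggs_per_carton = if_else(quantity == "half_dozen", 6, 12),
         price_per_egg = price_cents / 100 / eggs_per_carton)

# Average price per egg (in dollars) for each size and carton quantity
per_egg_summary <- per_egg %>%
  group_by(size, quantity) %>%
  summarise(mean_price_per_egg = mean(price_per_egg), .groups = "drop")

per_egg_summary
```

In this sample, the per-egg price of a dozen is lower than that of a half dozen for both sizes. Running the same steps on the full dataset would show whether that holds across all ten years.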