Code
library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)Kristin Abijaoude
September 25, 2022
Today, I will be tidying and pivoting eggs_tidy.csv.
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen# A tibble: 6 × 6
  month     year large_half_dozen large_dozen extra_large_half_dozen extra_lar…¹
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>       <dbl>
1 January   2004             126         230                    132         230 
2 February  2004             128.        226.                   134.        230 
3 March     2004             131         225                    137         230 
4 April     2004             131         225                    137         234.
5 May       2004             131         225                    137         236 
6 June      2004             134.        231.                   137         241 
# … with abbreviated variable name ¹extra_large_dozenHere, we get the first 6 rows of the dataset eggs_tidy.csv. From what I see, this dataset records the average price of eggs sold per carton in from 2004 to 2013.
Okay, what a mess. This is difficult to read and interpret. Let’s make it tidy first!
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozenThis is much better to read than the original format, but work still needs to be done.
# A tibble: 480 × 4
   month      year `Eggs Sold`      `Price sold`
   <chr>     <dbl> <chr>                   <dbl>
 1 January    2004 large_half_dozen         126 
 2 February   2004 large_half_dozen         128.
 3 March      2004 large_half_dozen         131 
 4 April      2004 large_half_dozen         131 
 5 May        2004 large_half_dozen         131 
 6 June       2004 large_half_dozen         134.
 7 July       2004 large_half_dozen         134.
 8 August     2004 large_half_dozen         134.
 9 September  2004 large_half_dozen         130.
10 October    2004 large_half_dozen         128.
# … with 470 more rows[1] 120[1] 6[1] 480[1] 4From there, I can expect the new amount of rows and columns for the eggs dataset.
eggs_new <- eggs %>%
  mutate("Large Half Dozen per Cart Sold" = large_half_dozen / 100,
         "Large Dozen per Cart Sold" = large_dozen / 100,
         "Extra Large Half Dozen per Cart Sold" = extra_large_half_dozen / 100,
         "Extra Large Dozen per Cart Sold" = extra_large_dozen / 100)
#| label: Replace Old Columns with New Ones
eggs_new1 <- select(eggs_new,-c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen))
eggs_new1# A tibble: 120 × 6
   month      year `Large Half Dozen per Cart Sold` Large Doze…¹ Extra…² Extra…³
   <chr>     <dbl>                            <dbl>        <dbl>   <dbl>   <dbl>
 1 January    2004                             1.26         2.3     1.32    2.3 
 2 February   2004                             1.28         2.26    1.34    2.3 
 3 March      2004                             1.31         2.25    1.37    2.3 
 4 April      2004                             1.31         2.25    1.37    2.35
 5 May        2004                             1.31         2.25    1.37    2.36
 6 June       2004                             1.34         2.31    1.37    2.41
 7 July       2004                             1.34         2.34    1.37    2.41
 8 August     2004                             1.34         2.34    1.37    2.41
 9 September  2004                             1.30         2.34    1.36    2.41
10 October    2004                             1.28         2.34    1.36    2.41
# … with 110 more rows, and abbreviated variable names
#   ¹`Large Dozen per Cart Sold`, ²`Extra Large Half Dozen per Cart Sold`,
#   ³`Extra Large Dozen per Cart Sold`I calculated the price of one cart sold in dollars with the mutate() command and removed the old columns to make room with the new ones.
# A tibble: 480 × 4
   month     year size                                 price
   <chr>    <dbl> <chr>                                <dbl>
 1 January   2004 Large Half Dozen per Cart Sold        1.26
 2 January   2004 Large Dozen per Cart Sold             2.3 
 3 January   2004 Extra Large Half Dozen per Cart Sold  1.32
 4 January   2004 Extra Large Dozen per Cart Sold       2.3 
 5 February  2004 Large Half Dozen per Cart Sold        1.28
 6 February  2004 Large Dozen per Cart Sold             2.26
 7 February  2004 Extra Large Half Dozen per Cart Sold  1.34
 8 February  2004 Extra Large Dozen per Cart Sold       2.3 
 9 March     2004 Large Half Dozen per Cart Sold        1.31
10 March     2004 Large Dozen per Cart Sold             2.25
# … with 470 more rowsOkay, that’s (sort of) better! As you can see, the larger the eggs are, the more expensive they will be. The same concept applies to quantity, obviously, but you get more bang for your buck.
---
title: "Challenge 3 with R"
author: "Kristin Abijaoude"
desription: "Tidy Data: Pivoting"
date: "09/25/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - eggs
  - kristin_abijaoude
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
Today, I will be tidying and pivoting eggs_tidy.csv. 
```{r}
#| label: Reading the Dataset
eggs<-read_csv("_data/eggs_tidy.csv")
eggs
```
```{r}
#| label: Return head of data frame
head(eggs)
```
Here, we get the first 6 rows of the dataset eggs_tidy.csv. From what I see, this dataset records the average price of eggs sold per carton in from 2004 to 2013.
```{r}
#| label:  Let's create a table, shall we?
is_tibble(eggs)
```
Okay, what a mess. This is difficult to read and interpret. Let's make it tidy first!
```{r}
as_tibble(eggs)
```
This is much better to read than the original format, but work still needs to be done. 
```{r}
eggs %>%
  gather("large_half_dozen", "large_dozen", "extra_large_half_dozen", "extra_large_dozen", key = "Eggs Sold", value = "Price sold")
```
```{r}
#| label: tibbling
eggs_tibble<-tibble(eggs)
#| label: Existing rows
nrow(eggs_tibble)
#| Existing columns 
ncol(eggs_tibble)
#| expected rows/cases
nrow(eggs_tibble) * (ncol(eggs_tibble)-2)
#| expected columns
4
```
From there, I can expect the new amount of rows and columns for the eggs dataset. 
## Pivot the Data
```{r}
#| label: Tidying up Columns
eggs_new <- eggs %>%
  mutate("Large Half Dozen per Cart Sold" = large_half_dozen / 100,
         "Large Dozen per Cart Sold" = large_dozen / 100,
         "Extra Large Half Dozen per Cart Sold" = extra_large_half_dozen / 100,
         "Extra Large Dozen per Cart Sold" = extra_large_dozen / 100)
#| label: Replace Old Columns with New Ones
eggs_new1 <- select(eggs_new,-c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen))
eggs_new1
```
I calculated the price of one cart sold in dollars with the mutate() command and removed the old columns to make room with the new ones. 
```{r}
eggs_pivot <- eggs_new1%>%
  pivot_longer(cols=contains("large"),
               names_to = "size",
               values_to = "price"
  )
eggs_pivot
```
Okay, that's (sort of) better! As you can see, the larger the eggs are, the more expensive they will be. The same concept applies to quantity, obviously, but you get more bang for your buck.