Challenge 3

Tidy Data: Pivoting

Author

Tenzin Latoe

Published

July 7, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc.)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

eggs_tidy.csv ⭐⭐

Eggs <- read_csv("_data/eggs_tidy.csv")
head(Eggs)

# A tibble: 6 × 6
  month     year large_half_dozen large_dozen extra_large_half_dozen
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>
1 January   2004             126         230                    132 
2 February  2004             128.        226.                   134.
3 March     2004             131         225                    137 
4 April     2004             131         225                    137 
5 May       2004             131         225                    137 
6 June      2004             134.        231.                   137 
# ℹ 1 more variable: extra_large_dozen <dbl>

Briefly describe the data

The Eggs data set reports prices for four egg package types: large half dozen, large dozen, extra large half dozen, and extra large dozen. There is one row per month from January 2004 through December 2013. Each package type currently sits in its own column, so I plan to pivot the data into a longer, tidier structure for analysis.

Anticipate the End Result

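After pivoting, the four egg package columns will be condensed into a single column, so the data set will have fewer columns and more rows. Each of the 120 monthly rows becomes four rows (one per package type), giving 480 rows and 4 columns: month, year, the package type, and its price.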
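The expected shape can be worked out before pivoting. The sketch below is a minimal illustration, assuming a small stand-in tibble named `eggs_toy` (a hypothetical name) that has the same column layout as Eggs; its values are illustrative placeholders, not the real data.

```r
library(tibble)

# Hypothetical stand-in with the same columns as Eggs (illustrative values only)
eggs_toy <- tibble(
  month = c("January", "February", "March"),
  year = c(2004, 2004, 2004),
  large_half_dozen = c(126, 128, 131),
  large_dozen = c(230, 226, 225),
  extra_large_half_dozen = c(132, 134, 137),
  extra_large_dozen = c(230, 230, 230)
)

# Columns that stay as identifiers; every other column gets stacked
id_cols <- c("month", "year")
n_value_cols <- ncol(eggs_toy) - length(id_cols)

# Each original row turns into one row per stacked column
expected_rows <- nrow(eggs_toy) * n_value_cols

# Identifier columns plus one names column and one values column
expected_cols <- length(id_cols) + 2

c(rows = expected_rows, cols = expected_cols)
```

The same arithmetic applied to a 120 × 6 table such as Eggs gives 120 × 4 = 480 rows and 2 + 2 = 4 columns.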

Challenge: Describe the final dimensions

#Dimensions
dim(Eggs)

[1] 120   6

#existing rows/cases
nrow(Eggs)

[1] 120

#existing columns/variables
ncol(Eggs)

[1] 6

The data consists of 120 rows and 6 columns.

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Challenge: Pivot the Chosen Data

Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data?

longer <- pivot_longer(Eggs, cols = -c(month, year),
                       names_to = "package",
                       values_to = "price")
longer

# A tibble: 480 × 4
   month     year package                price
   <chr>    <dbl> <chr>                  <dbl>
 1 January   2004 large_half_dozen        126 
 2 January   2004 large_dozen             230 
 3 January   2004 extra_large_half_dozen  132 
 4 January   2004 extra_large_dozen       230 
 5 February  2004 large_half_dozen        128.
 6 February  2004 large_dozen             226.
 7 February  2004 extra_large_half_dozen  134.
 8 February  2004 extra_large_dozen       230 
 9 March     2004 large_half_dozen        131 
10 March     2004 large_dozen             225 
# ℹ 470 more rows

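The code above made the data set longer: the four egg package columns were gathered into a single column named “package”, and their values were gathered into a new column named “price”. The result has 480 rows and 4 columns, which matches the 120 × 4 = 480 rows expected from the original 120 × 6 data. Each case is now the price of one package type in one month and year, so every variable has its own column and every observation has its own row, which is what tidy data requires.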
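The “sanity” check mentioned earlier can also be written as code rather than read off the printed output. This is a minimal sketch, assuming a hypothetical stand-in tibble `eggs_toy` with the same column layout as Eggs and illustrative values; `longer_toy` is likewise a name introduced only for this example.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in with the same columns as Eggs (illustrative values only)
eggs_toy <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 128),
  large_dozen = c(230, 226),
  extra_large_half_dozen = c(132, 134),
  extra_large_dozen = c(230, 230)
)

# Same pivot as in the challenge, applied to the stand-in data
longer_toy <- pivot_longer(
  eggs_toy,
  cols = -c(month, year),
  names_to = "package",
  values_to = "price"
)

# Expected shape: one row per original row and stacked column,
# and two identifier columns plus the names and values columns
expected_rows <- nrow(eggs_toy) * (ncol(eggs_toy) - 2)
expected_cols <- 2 + 2

# Stops with an error if the pivoted shape does not match the expectation
stopifnot(
  nrow(longer_toy) == expected_rows,
  ncol(longer_toy) == expected_cols
)

dim(longer_toy)
```

Swapping `eggs_toy` for `Eggs` runs the same check against the real data, so a mismatch stops the script instead of passing unnoticed.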