challenge_3
eggs_tidy.csv
Author: Abhinav Reddy Yadatha

Published: May 4, 2023

```r
library(tidyverse)

# Hide warnings and messages in the rendered output
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc.)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • animal_weights.csv ⭐
  • eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
  • australian_marriage*.xls ⭐⭐⭐
  • USA Households*.xlsx ⭐⭐⭐⭐
  • sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```r
library(readr)
# Read the eggs_tidy CSV data
egg_tidy_data <- read_csv("_data/eggs_tidy.csv", show_col_types = FALSE)

# read_csv() already used the file's first line as column names, so this
# drops the first *data* row (January 2004), leaving 119 of the 120 months
egg_tidy_data <- egg_tidy_data[-1, ]
head(egg_tidy_data)
```

```
# A tibble: 6 × 6
  month     year large_half_dozen large_dozen extra_large_half_dozen
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>
1 February  2004             128.        226.                   134.
2 March     2004             131         225                    137 
3 April     2004             131         225                    137 
4 May       2004             131         225                    137 
5 June      2004             134.        231.                   137 
6 July      2004             134.        234.                   137 
# ℹ 1 more variable: extra_large_dozen <dbl>
```
```r
view(egg_tidy_data)

dim(egg_tidy_data)
```

```
[1] 119   6
```
```r
# Summary of the eggs dataset
summary(egg_tidy_data)
```

```
    month                year      large_half_dozen  large_dozen   
 Length:119         Min.   :2004   Min.   :128.5    Min.   :225.0  
 Class :character   1st Qu.:2006   1st Qu.:130.4    1st Qu.:233.5  
 Mode  :character   Median :2009   Median :174.5    Median :267.5  
                    Mean   :2009   Mean   :155.4    Mean   :254.4  
                    3rd Qu.:2011   3rd Qu.:174.5    3rd Qu.:268.0  
                    Max.   :2013   Max.   :178.0    Max.   :277.5  
 extra_large_half_dozen extra_large_dozen
 Min.   :134.5          Min.   :230.0    
 1st Qu.:136.4          1st Qu.:241.5    
 Median :185.5          Median :285.5    
 Mean   :164.5          Mean   :267.1    
 3rd Qu.:185.5          3rd Qu.:285.5    
 Max.   :188.1          Max.   :290.0    
```

Briefly describe the data

The file covers ten years of monthly data, from January 2004 to December 2013 (120 months), and records an average price for each of four carton types: large half-dozen, large dozen, extra-large half-dozen, and extra-large dozen. Because the values are averages, many of them are not whole numbers; for instance, large half-dozen cartons are recorded at 128.5 in February 2004.
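The ten-year span can be checked in code by turning month and year into dates. This is a sketch under stated assumptions: `check_coverage()` is a hypothetical helper, `toy_ids` is a made-up stand-in holding only the month and year structure (no prices), and `make_date()` comes from lubridate. Calling `check_coverage(egg_tidy_data)` on the loaded data should report the period actually present; since the first data row is dropped after reading, it would start in February 2004.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(lubridate)

# Made-up stand-in for the month/year columns of eggs_tidy.csv:
# one row per month, January 2004 to December 2013 (structure only)
toy_ids <- expand_grid(year = 2004:2013, month = month.name)

# Hypothetical helper: turn month name + year into a Date and summarise
# the first month, last month, and number of distinct months present
check_coverage <- function(df) {
  dates <- make_date(df$year, match(df$month, month.name), 1L)
  tibble(first = min(dates), last = max(dates), n_months = n_distinct(dates))
}

check_coverage(toy_ids)        # 2004-01-01 to 2013-12-01, 120 months
check_coverage(toy_ids[-1, ])  # first row dropped: starts 2004-02-01, 119 months
```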

Anticipate the End Result

In the pivoted data, month and year stay as identifier columns, a “size” column holds “large” and “xlarge”, a “quantity” column holds “halfdozen” and “dozen”, and a new “price” column holds the value for each combination of size and quantity in each month.
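The column names only map cleanly onto a size and a quantity if each name splits into exactly two pieces. A minimal sketch with stringr; the target labels “xlarge” and “halfdozen” are the ones anticipated above, while the replacement patterns are an assumption about how to reach them:

```r
library(stringr)

carton_cols <- c("large_half_dozen", "large_dozen",
                 "extra_large_half_dozen", "extra_large_dozen")

# Splitting on every "_" gives 3, 2, 4 and 3 pieces, so the raw names
# do not map onto exactly two variables (size, quantity)
lengths(str_split(carton_cols, "_"))

# Collapse the multi-word labels so each name has exactly one "_"
short_cols <- carton_cols |>
  str_replace("^extra_large", "xlarge") |>
  str_replace("half_dozen$", "halfdozen")
short_cols  # "large_halfdozen" "large_dozen" "xlarge_halfdozen" "xlarge_dozen"

# Each shortened name now splits into a size (column 1) and a quantity (column 2)
str_split_fixed(short_cols, "_", 2)
```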

Challenge: Describe the final dimensions

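Rows: each of the 120 months contributes one row per pivoted column, so 120 × 4 = 480. Columns: month and year are kept, and the 4 carton columns become 3 (size, quantity, price), so 2 + 3 = 5.

The final dimensions of the dataset would be 480 × 5. Because the code above drops the first data row, the loaded tibble has 119 rows, so pivoting that copy should give 119 × 4 = 476 rows.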
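The same arithmetic as a small R sketch; `expected_long_dim()` is a hypothetical helper, and it assumes two `names_to` columns plus one `values_to` column, as in the anticipated result:

```r
# Hypothetical helper: expected shape after pivot_longer().
# Each wide row becomes one row per pivoted column; the pivoted columns are
# replaced by the names_to columns plus one values_to column.
expected_long_dim <- function(n_rows, n_cols, n_pivoted, n_names_to = 2) {
  c(rows = n_rows * n_pivoted,
    cols = n_cols - n_pivoted + n_names_to + 1)
}

expected_long_dim(120, 6, 4)  # full file: 480 rows, 5 columns
expected_long_dim(119, 6, 4)  # first data row dropped: 476 rows, 5 columns
```

Comparing this against `dim()` of the pivoted tibble gives the “sanity” check described below.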

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Example

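```r
#df <- pivot_longer(df, cols = c(outgoing, incoming),
#                   names_to = "trade_direction",
#                   values_to = "trade_value")
#df

# Pivot the four carton columns into size / quantity / price.
# Caution: names_sep = "_" splits at every underscore. Names that yield more
# than two pieces keep only the first two (any warning is hidden by
# warning = FALSE), so large_half_dozen becomes ("large", "half") and both
# extra_large_* columns become ("extra", "large"), as rows 3-4 below show.
egg_pivot_data <- pivot_longer(egg_tidy_data, cols = contains("dozen"),
                               names_to = c("size", "quantity"),
                               names_sep = "_",
                               values_to = "price")
head(egg_pivot_data)
```

```
# A tibble: 6 × 5
  month     year size  quantity price
  <chr>    <dbl> <chr> <chr>    <dbl>
1 February  2004 large half      128.
2 February  2004 large dozen     226.
3 February  2004 extra large     134.
4 February  2004 extra large     230 
5 March     2004 large half      131 
6 March     2004 large dozen     225 
```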

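Once pivoted long, the data have the expected 5 columns, but the result is \(476 \times 5\) rather than \(480 \times 5\): the first data row (January 2004) was dropped after reading, so there are 119 × 4 = 476 rows. The size and quantity labels also differ from the anticipated result. Splitting on every “_” leaves “half” instead of “halfdozen”, and both extra-large columns end up with the same labels (“extra”, “large”), so their prices can no longer be told apart. Shortening the names so that each has exactly one “_” (for example, extra_large_half_dozen to xlarge_halfdozen) before pivoting would give the intended labels.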
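One way to get the anticipated labels is to rename the carton columns before pivoting. This is a suggested sketch, not the author's code: `toy_eggs` is a made-up two-row tibble with the same columns as eggs_tidy.csv (its prices are invented), and `shorten_carton_name()` is a hypothetical helper. Applying the same `rename_with()` and `pivot_longer()` steps to `egg_tidy_data` should give size values “large”/“xlarge” and quantity values “halfdozen”/“dozen”.

```r
library(tibble)
library(dplyr)
library(stringr)
library(tidyr)

# Made-up two-row stand-in with the same columns as eggs_tidy.csv
# (prices are invented for illustration, not taken from the file)
toy_eggs <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(100, 101),
  large_dozen = c(200, 201),
  extra_large_half_dozen = c(110, 111),
  extra_large_dozen = c(210, 211)
)

# Hypothetical helper: collapse multi-word labels so that every carton
# column name has exactly one "_" between size and quantity
shorten_carton_name <- function(x) {
  x <- str_replace(x, "^extra_large", "xlarge")
  str_replace(x, "half_dozen$", "halfdozen")
}

toy_long <- toy_eggs |>
  rename_with(shorten_carton_name, .cols = contains("dozen")) |>
  pivot_longer(cols = contains("dozen"),
               names_to = c("size", "quantity"),
               names_sep = "_",
               values_to = "price")

toy_long
dim(toy_long)  # 8 5: 2 rows x 4 carton columns; month, year, size, quantity, price
```

A `names_pattern` regular expression in `pivot_longer()` is another way to split the original names, but renaming first keeps the labels identical to the anticipated ones.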

Challenge: Pivot the Chosen Data

Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data?

Any additional comments?

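Each case (row) of the pivoted data is the price of one carton type, identified by its size and quantity, in one month of one year. This meets the requirements for tidy data: each variable (month, year, size, quantity, price) has its own column, each observation has its own row, and each value has its own cell, which also makes the data easy to read and filter. The long format is easier to extend as well: a new carton type, such as “medium” for size with “halfdozen” for quantity, can be added as new rows without adding any new columns.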
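A minimal sketch of that extension, assuming the pivoted layout described above; all rows and prices here are invented for illustration, and the medium half-dozen carton is hypothetical:

```r
library(tibble)
library(dplyr)

# Invented long-format rows in the pivoted layout (not from the file)
long_eggs <- tibble(
  month = "January", year = 2004,
  size = c("large", "xlarge"), quantity = "dozen", price = c(200, 210)
)

# Hypothetical new carton type: in long format it is just another row
# with the same five columns
medium_rows <- tibble(month = "January", year = 2004,
                      size = "medium", quantity = "halfdozen", price = 90)

combined <- bind_rows(long_eggs, medium_rows)
combined
ncol(combined) == ncol(long_eggs)  # TRUE: no new columns needed
```

In the wide layout, the same addition would require a new medium_half_dozen column.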