Challenge 3

challenge_3

animal_weights

eggs

australian_marriage

usa_households

sce_labor

Tidy Data: Pivoting

Author

Prachiti Parkar

Published

March 22, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

animal_weights.csv ⭐
eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
australian_marriage*.xls ⭐⭐⭐
USA Households*.xlsx ⭐⭐⭐⭐
sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟

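library(readr)
# Supply readable column names. Because col_names is given, the file's own
# header line is read as a data row (check that the names match its order).
eggs_tidy_data <- read_csv("_data/eggs_tidy.csv",
                           col_names = c("month", "year", "xlarge_dozen",
                                         "xlarge_halfdozen", "large_dozen",
                                         "large_halfdozen"))
# Drop the first row, which holds the original header text
eggs_tidy_data <- eggs_tidy_data[-1, ]
view(eggs_tidy_data)  # opens the interactive data viewer
head(eggs_tidy_data)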

# A tibble: 6 × 6
  month    year  xlarge_dozen xlarge_halfdozen large_dozen large_halfdozen
  <chr>    <chr> <chr>        <chr>            <chr>       <chr>          
1 January  2004  126          230              132         230            
2 February 2004  128.5        226.25           134.5       230            
3 March    2004  131          225              137         230            
4 April    2004  131          225              137         234.5          
5 May      2004  131          225              137         236            
6 June     2004  133.5        231.375          137         241

dim(eggs_tidy_data)

[1] 120   6

# Summary of the dataset
summary(eggs_tidy_data)

    month               year           xlarge_dozen       xlarge_halfdozen  
 Length:120         Length:120         Length:120         Length:120        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 large_dozen        large_halfdozen   
 Length:120         Length:120        
 Class :character   Class :character  
 Mode  :character   Mode  :character
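The summary reports every column as character because the original header line is read in as a data row and only dropped afterwards. As a minimal sketch of one alternative, `skip = 1` tells `read_csv` to discard that line before parsing, so the price columns are guessed as numeric. The two data rows are copied from the `head()` output above; the header line in `csv_text` is a made-up stand-in, since the file's real header text is not shown here.

```r
library(readr)

# Inline stand-in for the file: a placeholder header line (hypothetical)
# followed by two rows copied from the head() output above.
csv_text <- I("col1,col2,col3,col4,col5,col6
January,2004,126,230,132,230
February,2004,128.5,226.25,134.5,230
")

eggs_sketch <- read_csv(
  csv_text,
  skip = 1,  # discard the original header line before parsing
  col_names = c("month", "year", "xlarge_dozen", "xlarge_halfdozen",
                "large_dozen", "large_halfdozen"),
  show_col_types = FALSE
)

# month stays character; year and the four price columns are parsed as numbers
sapply(eggs_sketch, class)
```

With numeric columns, `summary()` would report quartiles and means rather than only length and class.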

Briefly describe the data

Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy”

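The dataset contains 120 rows and 6 columns: month, year, and four price columns, one for each combination of egg size (extra large, large) and quantity (dozen, half dozen). Because size and quantity are stored in the column names rather than in columns of their own, the data are not tidy. Pivoting the four price columns into size, quantity, and price columns makes each row a single price observation, which is easier to read, filter, and compare by size or quantity.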

Anticipate the End Result

After pivoting, large and xlarge should appear under a size column, dozen and halfdozen under a quantity column, and the corresponding values under a new column called price.

Example: find current and future data dimensions

rows = 120 × 4 (4 columns pivoted) = 480

columns = 6 - 4 + 3 = 5 (the 4 pivoted columns are replaced by 3: size, quantity, and price)

Challenge: Describe the final dimensions

Document your work here.

The final dimensions would be 480 rows × 5 columns.
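The same arithmetic can be written out in R so the expected shape is available for a later comparison. This is only a sketch; the variable names are illustrative and not part of the original analysis.

```r
# Current shape of the data, taken from dim(eggs_tidy_data) above
n_rows <- 120
n_cols <- 6

n_pivoted <- 4  # the four price columns being pivoted
n_new     <- 3  # size, quantity, and price replace them

expected_rows <- n_rows * n_pivoted          # each row becomes 4 rows
expected_cols <- n_cols - n_pivoted + n_new  # 6 - 4 + 3

c(rows = expected_rows, cols = expected_cols)
```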

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Example

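# Pivot the four price columns; a name such as "xlarge_dozen" is split at "_"
# into size ("xlarge") and quantity ("dozen")
eggs_pivot_data <- pivot_longer(eggs_tidy_data,
                                cols = contains("dozen"),
                                names_to = c("size", "quantity"),
                                names_sep = "_",
                                values_to = "price")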


head(eggs_pivot_data)

# A tibble: 6 × 5
  month    year  size   quantity  price 
  <chr>    <chr> <chr>  <chr>     <chr> 
1 January  2004  xlarge dozen     126   
2 January  2004  xlarge halfdozen 230   
3 January  2004  large  dozen     132   
4 January  2004  large  halfdozen 230   
5 February 2004  xlarge dozen     128.5 
6 February 2004  xlarge halfdozen 226.25

Yes, once it is pivoted long, our resulting data are \(480 \times 5\), exactly what we expected!
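A check like this can also be made explicit in code instead of being read off the printed tibble. The sketch below repeats the pivot on two rows copied from the `head()` output earlier, so it runs without the data file; `eggs_small` and `eggs_small_long` are illustrative names. The `values_transform` argument is an addition not used in the original call: it converts price to numeric during the pivot.

```r
library(tidyr)
library(tibble)

# Two rows copied from the head() output, stored as character like the original
eggs_small <- tribble(
  ~month,     ~year,  ~xlarge_dozen, ~xlarge_halfdozen, ~large_dozen, ~large_halfdozen,
  "January",  "2004", "126",         "230",             "132",        "230",
  "February", "2004", "128.5",       "226.25",          "134.5",      "230"
)

eggs_small_long <- pivot_longer(
  eggs_small,
  cols = contains("dozen"),
  names_to = c("size", "quantity"),
  names_sep = "_",
  values_to = "price",
  values_transform = list(price = as.numeric)  # character -> numeric
)

# Sanity check: 4 pivoted columns give 4 rows per input row, and 5 columns
stopifnot(
  nrow(eggs_small_long) == nrow(eggs_small) * 4,
  ncol(eggs_small_long) == 5
)
dim(eggs_small_long)
```

Applied to the full data, the same `stopifnot()` pattern would compare `dim(eggs_pivot_data)` against the expected 480 rows and 5 columns.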

Challenge: Pivot the Chosen Data

Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data?

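After pivoting, each case (row) is the price of one egg size and quantity combination in a given month and year. This meets the requirements for tidy data: every variable (month, year, size, quantity, price) has its own column, every observation has its own row, and every cell holds a single value. The long format is also easier to read and to extend: a new size such as medium, or a new quantity, would be added as rows, without adding any columns.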
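The point about extending the data without new columns can be sketched as follows. The first four rows are copied from the pivoted output above, with price written as numbers. The medium row is purely hypothetical, and its price is left as `NA` because the source contains no medium prices.

```r
library(dplyr)
library(tibble)

# Four rows copied from the pivoted output (January 2004)
eggs_long <- tribble(
  ~month,    ~year,  ~size,    ~quantity,   ~price,
  "January", "2004", "xlarge", "dozen",     126,
  "January", "2004", "xlarge", "halfdozen", 230,
  "January", "2004", "large",  "dozen",     132,
  "January", "2004", "large",  "halfdozen", 230
)

# Hypothetical new size: added as rows, with an unknown (NA) price
medium_rows <- tribble(
  ~month,    ~year,  ~size,    ~quantity,   ~price,
  "January", "2004", "medium", "dozen",     NA_real_,
  "January", "2004", "medium", "halfdozen", NA_real_
)

eggs_extended <- bind_rows(eggs_long, medium_rows)

# The number of columns is unchanged; only the number of rows grows
dim(eggs_extended)
```

In the original wide layout, the same change would have required two new columns.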

Any additional comments?