Challenge 3 by Jinxia Niu

challenge_3
Jinxia Niu
eggs_tidy.csv
Tidy Data: Pivoting
Author

Jinxia Niu

Published

March 17, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • animal_weights.csv ⭐
  • eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
  • australian_marriage*.xls ⭐⭐⭐
  • USA Households*.xlsx ⭐⭐⭐⭐
  • sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
Code
library(tidyverse)
 eggs <- read_csv("_data/eggs_tidy.csv")

Briefly describe the data

The data has 120 rows with 6 variables (2 grouping and 4 values) .The data reports the average price per carton paid to the farmer or producer for organic eggs (and organic chicken), reported monthly from 2004 to 2013. Average price is reported by carton type, which can vary in both size (x-large or large) and quantity (half-dozen or dozen.)

Anticipate the End Result

we can pivot longer - changing our data from 120 rows with 6 variables (2 grouping and 4 values) to 480 rows of 4 variables

The first step in pivoting the data is to try to come up with a concrete vision of what the end product should look like - that way you will know whether or not your pivoting was successful.

`## Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Code
eggs_new <- eggs%>%
  pivot_longer(cols=contains("large"),
               names_to = c("size","quantity"),
               names_sep = "_",
               values_to = "Price"
               )

print(eggs_new)
# A tibble: 480 × 5
   month     year size  quantity Price
   <chr>    <dbl> <chr> <chr>    <dbl>
 1 January   2004 large half      126 
 2 January   2004 large dozen     230 
 3 January   2004 extra large     132 
 4 January   2004 extra large     230 
 5 February  2004 large half      128.
 6 February  2004 large dozen     226.
 7 February  2004 extra large     134.
 8 February  2004 extra large     230 
 9 March     2004 large half      131 
10 March     2004 large dozen     225 
# … with 470 more rows

Challenge: Pivot the Chosen Data

Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data?

#Any additional comments? How to tell R that names_sep = “” is the second ”” such as “extra_large_dozen” instead of the first”_“?