Challenge 3 Instructions

challenge_3

eggs_tidy.csv

Author

Roy Yoon

Published

August 17, 2022

Code

library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Code

eggs<-read_csv("_data/eggs_tidy.csv")

eggs

# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
# ℹ Use `print(n = ...)` to see more rows

About eggs_tidy.csv data set

Code

colnames(eggs)

[1] "month"                  "year"                   "large_half_dozen"      
[4] "large_dozen"            "extra_large_half_dozen" "extra_large_dozen"

Code

dim(eggs)

[1] 120   6

The eggs data set has 120 rows and 6 columns.

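The data record the monthly price of eggs from 2004 to 2013 for two sizes (large and extra large) and two amounts (half dozen and dozen), with one price column for each size and amount combination.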
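As a quick sanity check on that description, ten years of monthly observations should give exactly the 120 rows reported by `dim(eggs)`. The sketch below builds a hypothetical month-by-year grid (it does not read the real file) and counts its rows; `expected_grid` is an illustrative name, not part of the original analysis.

```r
library(tidyverse)

# Hypothetical grid of every month-year combination from 2004 to 2013.
# month.name is base R's built-in vector of English month names.
expected_grid <- expand_grid(year = 2004:2013, month = month.name)

# One row per month per year: 10 years * 12 months
nrow(expected_grid)
```

If the real data had missing or duplicated months, its row count would differ from this grid.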

Anticipate the End Result

Example: find current and future data dimensions

Let's check the current dimensions of the data and work out what the dimensions should be after pivoting.

Code

df<-eggs

Code

#existing rows/cases
nrow(df)

[1] 120

Code

#existing columns/variables
ncol(df)

[1] 6

Code

#expected rows/cases: every column except month and year (4 price columns) becomes one row per original row
nrow(df) * (ncol(df)-2)

[1] 480

Code

# expected columns: month and year (kept) + size and amount (from names_to) + price (from values_to)
2 + 2 + 1

[1] 5
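The same arithmetic can be wrapped in a small helper so it does not depend on hard-coded numbers. This is only a sketch: `expected_dims` and its arguments are hypothetical names introduced here, and the toy data frame stands in for `eggs`, using the same six column names and two rows copied from the printed output.

```r
library(tidyverse)

# Hypothetical helper: predict the shape of a pivot_longer() result.
# value_cols: names of the columns being pivoted
# n_names:    number of columns created by names_to
expected_dims <- function(data, value_cols, n_names = 1) {
  n_id <- ncol(data) - length(value_cols)   # columns left untouched
  c(rows = nrow(data) * length(value_cols),
    cols = n_id + n_names + 1)              # + 1 for the values_to column
}

# Toy stand-in for eggs (assumption: two rows are enough to show the idea)
toy <- tribble(
  ~month,    ~year, ~large_half_dozen, ~large_dozen, ~extra_large_half_dozen, ~extra_large_dozen,
  "January",  2004,               126,          230,                     132,                230,
  "March",    2004,               131,          225,                     137,                230
)

price_cols <- setdiff(names(toy), c("month", "year"))
expected_dims(toy, price_cols, n_names = 2)
```

For the full 120-row data set, the same call should give 120 * 4 = 480 rows and 5 columns, provided no rows are dropped during the pivot.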

Pivot the eggs data set longer (month, year, size, amount, price)

Code

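# Rename so that each price column name contains exactly one underscore;
# names_sep = "_" can then split every name into size and amount.
eggs <- rename(eggs,
               large_halfdozen = large_half_dozen,
               xlarge_halfdozen = extra_large_half_dozen,
               xlarge_dozen = extra_large_dozen)

eggs_tidy_longer <- eggs %>%
  pivot_longer(cols = contains("large"),
               names_to = c("size", "amount"),
               names_sep = "_",
               values_to = "price"
  )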
eggs_tidy_longer

# A tibble: 480 × 5
   month     year size   amount    price
   <chr>    <dbl> <chr>  <chr>     <dbl>
 1 January   2004 large  halfdozen  126 
 2 January   2004 large  dozen      230 
 3 January   2004 xlarge halfdozen  132 
 4 January   2004 xlarge dozen      230 
 5 February  2004 large  halfdozen  128.
 6 February  2004 large  dozen      226.
 7 February  2004 xlarge halfdozen  134.
 8 February  2004 xlarge dozen      230 
 9 March     2004 large  halfdozen  131 
10 March     2004 large  dozen      225 
# … with 470 more rows
# ℹ Use `print(n = ...)` to see more rows
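Renaming is one way to make each column name split into exactly two pieces. A possible alternative, sketched here on a toy copy of the original column names, is to leave the names alone and split them with a regular expression through the `names_pattern` argument of `pivot_longer`. The pattern and the `toy` data are assumptions made for illustration, and the resulting labels differ from the ones above (`extra_large` and `half_dozen` instead of `xlarge` and `halfdozen`).

```r
library(tidyverse)

# Toy stand-in with the original column names (two rows copied from the printed output)
toy <- tribble(
  ~month,    ~year, ~large_half_dozen, ~large_dozen, ~extra_large_half_dozen, ~extra_large_dozen,
  "January",  2004,               126,          230,                     132,                230,
  "March",    2004,               131,          225,                     137,                230
)

toy_longer <- toy %>%
  pivot_longer(
    cols = contains("large"),
    names_to = c("size", "amount"),
    # group 1: everything up to and including "large"; group 2: the rest
    names_pattern = "(.*large)_(.*)",
    values_to = "price"
  )

toy_longer
```

Which approach reads better is a matter of taste: renaming keeps the pivot call simple, while the pattern keeps the original column names intact.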

Size and amount are now separate variables, and price is available as its own variable.

Describing Pivoted Data

Code

dim(eggs_tidy_longer)

[1] 480   5

Code

colnames(eggs_tidy_longer)

[1] "month"  "year"   "size"   "amount" "price"

The eggs_tidy_longer data set contains 480 rows and 5 columns.

The columns are month, year, size, amount, and price, each stored as its own variable.

Organizing the variables this way makes it easier to analyze size, amount, and price independently of one another, for example by comparing prices across sizes for a fixed amount.
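To illustrate what the tidy layout makes easy, the sketch below averages price for each combination of size and amount. It uses a tiny hand-built data frame in the same five-column layout (prices copied from the January and March 2004 rows), so the numbers are not results for the full data set; `toy_longer`, `price_summary`, and `mean_price` are illustrative names.

```r
library(tidyverse)

# Toy data in the pivoted layout: one row per month, year, size, and amount
toy_longer <- tribble(
  ~month,    ~year, ~size,    ~amount,     ~price,
  "January",  2004, "large",  "halfdozen",    126,
  "January",  2004, "large",  "dozen",        230,
  "January",  2004, "xlarge", "halfdozen",    132,
  "January",  2004, "xlarge", "dozen",        230,
  "March",    2004, "large",  "halfdozen",    131,
  "March",    2004, "large",  "dozen",        225,
  "March",    2004, "xlarge", "halfdozen",    137,
  "March",    2004, "xlarge", "dozen",        230
)

# Because size and amount are ordinary columns, grouping by them is direct
price_summary <- toy_longer %>%
  group_by(size, amount) %>%
  summarise(mean_price = mean(price), .groups = "drop")

price_summary
```

The same grouping on the wide data would need a separate calculation for each of the four price columns.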