challenge_3
eggs
Tidy Data: Pivoting
Author

XinyangMao

Published

March 17, 2023

Code
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command. - aniceggpoultry.xls

Code
excel_sheets("_data/organiceggpoultry.xls")
[1] "Data"                            "Organic egg prices, 2004-13"    
[3] "Organic poultry prices, 2004-13"

Import data

Use the date col in eggs form

Code
date<-read_excel("_data/organiceggpoultry.xls",
                      sheet="Data",
                      range =cell_limits(c(6,2),c(NA,2)),
                      col_names = c("date")
                 )
date
# A tibble: 120 × 1
   date     
   <chr>    
 1 Jan 2004 
 2 February 
 3 March    
 4 April    
 5 May      
 6 June     
 7 July     
 8 August   
 9 September
10 October  
# … with 110 more rows

Import chicken data

Code
chicken_orig<-read_excel("_data/organiceggpoultry.xls",
                      sheet="Data",
                      range =cell_limits(c(6,8),c(NA,12)),
                      col_names = c("Whole",
                               "B/S_Breast", "Bone-in_Breast",
                               "Whole_Legs","Thighs")
                 )
head(chicken_orig)
# A tibble: 6 × 5
  Whole `B/S_Breast` `Bone-in_Breast` Whole_Legs Thighs 
  <dbl>        <dbl> <chr>                 <dbl> <chr>  
1  198.         646. too few                194. too few
2  198.         642. too few                194. 203    
3  209          642. too few                194. 203    
4  212          642. too few                194. 203    
5  214.         642. too few                194. 203    
6  216.         641  too few                202. 200.375

Merge date column and chicken data

Code
chicken_merged <- cbind(date,chicken_orig)
head(chicken_merged)
      date   Whole B/S_Breast Bone-in_Breast Whole_Legs  Thighs
1 Jan 2004 197.500      645.5        too few    193.500 too few
2 February 197.500      642.5        too few    193.500     203
3    March 209.000      642.5        too few    193.500     203
4    April 212.000      642.5        too few    193.500     203
5      May 214.500      642.5        too few    193.500     203
6     June 216.375      641.0        too few    201.875 200.375
Code
chicken_merged%>%
  count(date)
          date  n
1        April 10
2       August 10
3     December 10
4     February  8
5  February /1  2
6     Jan 2004  1
7     Jan 2005  1
8     Jan 2006  1
9     Jan 2007  1
10    Jan 2008  1
11    Jan 2009  1
12    Jan 2010  1
13    Jan 2011  1
14    Jan 2012  1
15    Jan 2013  1
16        July 10
17        June 10
18       March 10
19         May 10
20    November 10
21     October 10
22   September 10

Same,we need to remove the “/1”

Code
chicken<-chicken_merged%>%
  mutate(date = str_remove(date, " /1"))
head(chicken)
      date   Whole B/S_Breast Bone-in_Breast Whole_Legs  Thighs
1 Jan 2004 197.500      645.5        too few    193.500 too few
2 February 197.500      642.5        too few    193.500     203
3    March 209.000      642.5        too few    193.500     203
4    April 212.000      642.5        too few    193.500     203
5      May 214.500      642.5        too few    193.500     203
6     June 216.375      641.0        too few    201.875 200.375

Then use separate() to split the date to month and year columns.We can use fill() to fill the blank year cell.

Code
chicken<-chicken%>%
  separate(date, into=c("month", "year"), sep=" ")%>%
  fill(year)
head(chicken)
     month year   Whole B/S_Breast Bone-in_Breast Whole_Legs  Thighs
1      Jan 2004 197.500      645.5        too few    193.500 too few
2 February 2004 197.500      642.5        too few    193.500     203
3    March 2004 209.000      642.5        too few    193.500     203
4    April 2004 212.000      642.5        too few    193.500     203
5      May 2004 214.500      642.5        too few    193.500     203
6     June 2004 216.375      641.0        too few    201.875 200.375

find current and future data dimensions

We can see there are 120 rows and 7 columns in this dataset,I’ll use 2 of variables to identify a case,so I’ll pivoting 7-2 variables into a longer format.Therefore,we would expect 120*(7-2)=600 rows in the pivoted dataframe.

Code
#row number
nrow(chicken)
[1] 120
Code
#column number
ncol(chicken)
[1] 7
Code
#expect row number after pivoted
nrow(chicken) * (ncol(chicken)-2)
[1] 600
Code
#expect column number after pivoted
2 + 2
[1] 4

Pivot the Chosen Data

Converting data types

As we can see,in the chicken data sheet,there are some string “too few” exist in 5 columns,so before we pivoting the data we should use a numeric type value replace them and convert all of the character type of number into numeric type value.

Code
chicken_db<-chicken
chicken_db <- chicken_db%>%
  mutate(across(`Whole`:`Thighs`,~ifelse(. == "too few",0.0,as.numeric(.))))
head(chicken_db)
     month year   Whole B/S_Breast Bone-in_Breast Whole_Legs  Thighs
1      Jan 2004 197.500      645.5              0    193.500   0.000
2 February 2004 197.500      642.5              0    193.500 203.000
3    March 2004 209.000      642.5              0    193.500 203.000
4    April 2004 212.000      642.5              0    193.500 203.000
5      May 2004 214.500      642.5              0    193.500 203.000
6     June 2004 216.375      641.0              0    201.875 200.375

Now all of the data are avaiable,we can use pivot_longer() function to make the dataset be longer.There are 7 variables in this dataset,but we need to use other 2 variavles to instead 5 of them.

Code
chicken_long<-chicken_db%>%
  pivot_longer(cols=c(`Whole`:`Thighs`), 
               names_to = "ChickenType",
               values_to = "Price"
  )
head(chicken_long)
# A tibble: 6 × 4
  month    year  ChickenType    Price
  <chr>    <chr> <chr>          <dbl>
1 Jan      2004  Whole           198.
2 Jan      2004  B/S_Breast      646.
3 Jan      2004  Bone-in_Breast    0 
4 Jan      2004  Whole_Legs      194.
5 Jan      2004  Thighs            0 
6 February 2004  Whole           198.

Yes, once it is pivoted long, our resulting data are \(600x4\) - exactly what we expected!