I’m sharing some example code for pivot_longer using the eggs data. Enjoy!
This is example code for using the pivot functions in R. Several of the government data sources include tabular data that really need to be pivoted into a dataset in which a “case” is some combination of the grouping variables (the rows and columns in the table) alongside the appropriate statistical value(s) in the table (e.g., counts or average costs). Lets start with the easy to read in eggs_tidy.csv just so we can focus on the pivoting function.
eggs<-read_csv("../../_data/eggs_tidy.csv", show_col_types = FALSE)
eggs
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# … with 110 more rows, and 1 more variable: extra_large_dozen <dbl>
Looking at the data, we can see that each case consists of a year-month combination (e.g., January 2004), while the values are the average price (in cents) of four different types of eggs (e.g., large_half_dozen, large_dozen, etc) But really, wouldn’t it possibly make more sense to consider the case as a year-month-type combination, with a single price value for each case?
To do this (and make our data easier to graph and analyze), we can pivot longer - changing our data from 120 rows with 6 variables (2 grouping and 4 values) to 480 rows of 4 variables (with 3 grouping variables and a single price value).
eggs%>%
pivot_longer(cols=large_half_dozen:extra_large_dozen,
names_to = "eggType",
values_to = "avgPrice"
)
# A tibble: 480 × 4
month year eggType avgPrice
<chr> <dbl> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 January 2004 large_dozen 230
3 January 2004 extra_large_half_dozen 132
4 January 2004 extra_large_dozen 230
5 February 2004 large_half_dozen 128.
6 February 2004 large_dozen 226.
7 February 2004 extra_large_half_dozen 134.
8 February 2004 extra_large_dozen 230
9 March 2004 large_half_dozen 131
10 March 2004 large_dozen 225
# … with 470 more rows
Well, that was super easy. But wait, what if you are interested in egg size - you want to know how much more expensive extra-large eggs are compared to large eggs. Right now, that will be annoying, as you will have to keep sorting out the egg quantity - whether the price is for a half_dozen or a dozen eggs. Wouldn’t it be nice if we didn’t have a long egg type column with both size and quantity squashed into a single categorical variable? It would be so useful to have a new dataset with 4 grouping variables (year, month, size, and quantity) and the same value (price).
So, once again we want to use pivot longer, but we will be adding two new category variables (for a total of 4) and this will cut the number of rows in half (to 240). But how in the world can we let R know what we want it to do?? Thankfully, someone named the egg types (column-names) pretty systematically, but how can use this to our advantage? Working with patterns in the names_sep option of the pivot functions makes it pretty easy (well, except our variable names have more than one underscore, so we have to sort of hack this part by also using mutate on the resulting category labels.)
eggs%>%
pivot_longer(cols=large_half_dozen:extra_large_dozen,
names_to = c("size", "quantity"),
names_sep="arge_",
values_to = "price"
) %>%
mutate(size = case_when(
size == "l" ~ "Large",
size == "extra_l" ~ "Extra Large"
))
# A tibble: 480 × 5
month year size quantity price
<chr> <dbl> <chr> <chr> <dbl>
1 January 2004 Large half_dozen 126
2 January 2004 Large dozen 230
3 January 2004 Extra Large half_dozen 132
4 January 2004 Extra Large dozen 230
5 February 2004 Large half_dozen 128.
6 February 2004 Large dozen 226.
7 February 2004 Extra Large half_dozen 134.
8 February 2004 Extra Large dozen 230
9 March 2004 Large half_dozen 131
10 March 2004 Large dozen 225
# … with 470 more rows
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Rolfe (2021, Aug. 17). DACSS 601 August 2021: Example Code for Pivot Longer. Retrieved from https://mrolfe.github.io/DACSS601August2021/posts/2021-08-17-example-code-for-pivot-longer/
BibTeX citation
@misc{rolfe2021example, author = {Rolfe, Meredith}, title = {DACSS 601 August 2021: Example Code for Pivot Longer}, url = {https://mrolfe.github.io/DACSS601August2021/posts/2021-08-17-example-code-for-pivot-longer/}, year = {2021} }