Challenge 7

challenge_7
eggs
Visualizing Multiple Dimensions
Author

Paarth Tandon

Published

January 16, 2023

library(tidyverse)
library(ggplot2)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

set.seed(42)
eggs_raw <- read_csv('_data/eggs_tidy.csv')
head(eggs_raw)
month year large_half_dozen large_dozen extra_large_half_dozen extra_large_dozen
January 2004 126.0 230.000 132.0 230.0
February 2004 128.5 226.250 134.5 230.0
March 2004 131.0 225.000 137.0 230.0
April 2004 131.0 225.000 137.0 234.5
May 2004 131.0 225.000 137.0 236.0
June 2004 133.5 231.375 137.0 241.0

Briefly describe the data

This dataset describes the average price for a carton of eggs given the month and year. There are four types of cartons: large half dozen, large dozen, extra large half dozen, extra large dozen.

Tidy Data (as needed)

I’m going to pivot this dataset into a long format to make visualization easier. The carton type then needs to be split into two columns: the egg size, and whether the carton holds a half or a full dozen. This lets us plot each attribute separately. To do this, I replace the ‘_’ inside extra_large and half_dozen with a space (and capitalize the labels), so that only one ‘_’ remains in each name, between the size and the quantity. The name can then be separated on that ‘_’ into two columns. Finally, I convert the year and month into a proper date using lubridate.

eggs <- eggs_raw %>%
    # one row per month, year, and carton type; the type lands in the default `name` column
    pivot_longer(cols = c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen), values_to = "Price ($)") %>%
    # str_replace is case-sensitive, so "dozen" and "large" only match parts not already relabelled
    mutate(name = str_replace(name, "extra_large", "Extra Large"),
           name = str_replace(name, "half_dozen", "Half Dozen"),
           name = str_replace(name, "dozen", "Dozen"),
           name = str_replace(name, "large", "Large")) %>%
    separate(name, into = c("Size", "Quantity"), sep = "_") %>%
    mutate(Date = ym(paste(year, month, sep = " ")))
head(eggs)
month year Size Quantity Price ($) Date
January 2004 Large Half Dozen 126.00 2004-01-01
January 2004 Large Dozen 230.00 2004-01-01
January 2004 Extra Large Half Dozen 132.00 2004-01-01
January 2004 Extra Large Dozen 230.00 2004-01-01
February 2004 Large Half Dozen 128.50 2004-02-01
February 2004 Large Dozen 226.25 2004-02-01
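
As an aside, the same reshaping can probably be done in a single pivot step by letting `pivot_longer()` split the column names with a regular expression. The sketch below is an alternative, not the approach used in this post. It runs on a toy tibble built from the first two rows of the raw data shown earlier; `eggs_toy` and `eggs_alt` are names made up for this illustration.

```r
library(tidyverse)
library(lubridate)

# Toy input: the first two rows of eggs_raw, typed in by hand
eggs_toy <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(126.0, 128.5),
  large_dozen = c(230.0, 226.25),
  extra_large_half_dozen = c(132.0, 134.5),
  extra_large_dozen = c(230.0, 230.0)
)

eggs_alt <- eggs_toy %>%
  pivot_longer(
    cols = ends_with("dozen"),
    # first capture group is the size, second is the quantity
    names_to = c("Size", "Quantity"),
    names_pattern = "(.*large)_(.*dozen)",
    values_to = "Price ($)"
  ) %>%
  mutate(
    # "extra_large" -> "Extra Large", "half_dozen" -> "Half Dozen"
    across(c(Size, Quantity), ~ str_to_title(str_replace_all(.x, "_", " "))),
    Date = ym(paste(year, month))
  )

head(eggs_alt)
```

The regular expression relies on every size ending in "large" and every quantity ending in "dozen", which holds for these four columns but would need adjusting for other carton names.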

Visualization with Multiple Dimensions

ggplot(eggs, aes(Date, `Price ($)`, col = Size)) +
    ggtitle('Price of Half/Full Dozen Cartons of Eggs') +
    geom_line() +
    facet_wrap(vars(Quantity), scales = 'free_y')

I chose a line graph because I wanted to compare the price of extra large and large eggs over time. Since prices differ by carton quantity (half dozen versus dozen), I faceted the graph on that variable and gave each panel its own y-axis. As we can see, the price of large eggs decreased when sold by the dozen, while it increased when sold by the half dozen.
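
A trend read off a plot can be cross-checked numerically by comparing the first and last price in each Size/Quantity group. The sketch below is one way to do that. `price_change()` is a hypothetical helper, and the toy data contains only the January and June 2004 rows shown earlier, so it illustrates the mechanics rather than confirming the trend over the full series.

```r
library(tidyverse)

# Hypothetical helper: first price, last price, and their difference per group
price_change <- function(df) {
  df %>%
    group_by(Size, Quantity) %>%
    arrange(Date, .by_group = TRUE) %>%
    summarise(
      first_price = first(`Price ($)`),
      last_price = last(`Price ($)`),
      change = last_price - first_price,
      .groups = "drop"
    )
}

# Toy data: January and June 2004 only, in the tidy layout used above
eggs_toy <- tribble(
  ~Size,         ~Quantity,    ~`Price ($)`, ~Date,
  "Large",       "Half Dozen", 126.0,        "2004-01-01",
  "Large",       "Dozen",      230.0,        "2004-01-01",
  "Extra Large", "Half Dozen", 132.0,        "2004-01-01",
  "Extra Large", "Dozen",      230.0,        "2004-01-01",
  "Large",       "Half Dozen", 133.5,        "2004-06-01",
  "Large",       "Dozen",      231.375,      "2004-06-01",
  "Extra Large", "Half Dozen", 137.0,        "2004-06-01",
  "Extra Large", "Dozen",      241.0,        "2004-06-01"
) %>%
  mutate(Date = as.Date(Date))

changes <- price_change(eggs_toy)
changes
```

Calling `price_change(eggs)` on the full tidy data frame would give the change across the whole date range for each of the four carton types.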

ggplot(eggs, aes(Date, `Price ($)`, fill = Quantity)) +
    ggtitle('Price of Large/Extra Large Cartons of Eggs') +
    geom_bar(position = "stack", stat = "identity") +
    facet_wrap(vars(Size))

In this graph, I faceted by egg size instead and stacked the prices of the two quantities within each bar. This allows us to see the overall change in the price of eggs depending on their size. From this graph we can see that large eggs are always cheaper than extra large eggs. We can also see that both sizes follow a similar trend over time.
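
The height of each stacked bar is the sum of the half-dozen and dozen prices for that month, so the comparison between sizes can also be expressed as a small table. The sketch below assumes the tidy layout used above and runs on toy data made from the January and February 2004 rows only. `eggs_toy` and `totals` are names made up for this illustration.

```r
library(tidyverse)

# Toy data: January and February 2004 only, in the tidy layout used above
eggs_toy <- tribble(
  ~Size,         ~Quantity,    ~`Price ($)`, ~Date,
  "Large",       "Half Dozen", 126.0,        "2004-01-01",
  "Large",       "Dozen",      230.0,        "2004-01-01",
  "Extra Large", "Half Dozen", 132.0,        "2004-01-01",
  "Extra Large", "Dozen",      230.0,        "2004-01-01",
  "Large",       "Half Dozen", 128.5,        "2004-02-01",
  "Large",       "Dozen",      226.25,       "2004-02-01",
  "Extra Large", "Half Dozen", 134.5,        "2004-02-01",
  "Extra Large", "Dozen",      230.0,        "2004-02-01"
) %>%
  mutate(Date = as.Date(Date))

# Stacked bar height per month and size, with one column per size
totals <- eggs_toy %>%
  group_by(Date, Size) %>%
  summarise(total = sum(`Price ($)`), .groups = "drop") %>%
  pivot_wider(names_from = Size, values_from = total)

# TRUE when the Large total is below the Extra Large total in every month
all(totals$Large < totals$`Extra Large`)
```

Running the same pipeline on the full `eggs` data frame would check the comparison for every month rather than just the two shown here.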