HW5

Homework 5

Adam Wheeler
2022-01-17

Research questions

  1. How has the price of chicken and egg products changed over time?
  2. How do the yearly price ranges compare between product categories?

Read the data set

knitr::opts_chunk$set(echo = TRUE)

library(dplyr)
library(tidyverse)
library(readxl)

organic <- read_excel(
  path = "../Downloads/organiceggpoultry.xlsx",
  skip = 4
)

Tidy the data set

# remove line breaks in column names
names(organic) <- gsub("\n", " ", names(organic))
# abbreviate dozen
names(organic) <- gsub("Dozen", "Doz.", names(organic))
# name the first column so it can be referenced in mutate()
colnames(organic)[1] <- "Date"
# remove typo from column name
colnames(organic)[3] <- "Extra Large 1/2 Doz."

organic <- organic %>%
  # drop empty column
  select(-6) %>%
  # remove extra characters from Date
  mutate(Date = ifelse(grepl("/", Date), gsub(".{3}$", "", Date), Date)) %>%
  # separate Date into month and year variables
  separate(Date, sep = " ", into = c("Month", "Year")) %>%
  # fix abbreviated month name
  mutate(Month = str_replace(Month, "Jan", "January")) %>%
  # collapse repeated spaces in column names
  rename_with(str_squish) %>%
  # fill in missing year values
  fill(Year) %>%
  # cast mutated variables as numeric values
  # replace strings with na
  mutate(Year = as.numeric(Year)) %>%
  mutate(across(3:11, as.numeric)) %>%
  # pivot columns into unique rows
  pivot_longer(3:11) %>%
  # rename default column names
  rename(Product = name) %>%
  rename(Price = value) %>%
  # add category variable
  mutate(
    Category = if_else(
      grepl("Doz", Product), "Eggs", "Chicken"
    )
  )

head(organic)
# A tibble: 6 x 5
  Month    Year Product              Price Category
  <chr>   <dbl> <chr>                <dbl> <chr>   
1 January  2004 Extra Large Doz.      230  Eggs    
2 January  2004 Extra Large 1/2 Doz.  132  Eggs    
3 January  2004 Large Doz.            230  Eggs    
4 January  2004 Large 1/2 Doz.        126  Eggs    
5 January  2004 Whole                 198. Chicken 
6 January  2004 B/S Breast            646. Chicken 
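
The reshaping step can be easier to follow on a small table. The sketch below is only an illustration: `toy_wide` and its prices are made up and are not taken from the data set. It pivots two product columns into rows and derives the category the same way the pipeline does.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per month, one column per product.
# The prices are invented for illustration only.
toy_wide <- tibble(
  Month = c("January", "February"),
  Year = c(2004, 2004),
  `Large Doz.` = c(230, 226),
  Whole = c(198, 199)
)

toy_long <- toy_wide %>%
  # one row per Month/Year/Product combination
  pivot_longer(3:4, names_to = "Product", values_to = "Price") %>%
  # products sold by the dozen are eggs; everything else is chicken
  mutate(Category = if_else(grepl("Doz", Product), "Eggs", "Chicken"))

toy_long
```

Naming the new columns through `names_to` and `values_to` avoids the two separate `rename()` calls, which is one possible simplification.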

1. How has the price of chicken and egg products changed over time?

product_stats <- organic %>%
  group_by(Category, Year) %>%
  summarize(
    `Mean Price` = mean(Price, na.rm = TRUE),
    `SD` = sd(Price, na.rm = TRUE),
    N = n(),
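    # standard error of the mean: SD divided by the square root of the number of observed prices
    `Error` = sd(Price, na.rm = TRUE) / sqrt(sum(!is.na(Price)))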
  )

ggplot(product_stats, aes(Year, `Mean Price`, color = Category)) +
  labs(
    title = "Egg and Chicken prices may have risen over time",
    subtitle = "Covering the years 2004 to 2013",
    x = "",
    y = "Avg Price"
  ) +
  geom_smooth() +
  geom_errorbar(aes(ymin = `Mean Price` - `SD`, ymax = `Mean Price` + `SD`), width = 0.2) +
  theme_minimal() +
  facet_wrap(~ Category)

This question asks how the prices of chicken and egg products have changed over time. Based on a reading of this data, the average price of both eggs and chicken may have risen over time, but by different amounts. It also appears that the average price of eggs rose sharply from 2008 to 2009. The large error bars make me want to investigate the price of chicken more closely. I should also add a label explaining what the error bars represent for naive readers.
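
One way to label the error bars is a plot caption. The sketch below uses made-up prices (the `toy` table is hypothetical) to show the difference between the standard deviation, which the plot draws, and the standard error of the mean, and then states in a caption which one the bars show. It is a rough illustration, not the final plot.

```r
library(dplyr)
library(ggplot2)

# Hypothetical prices for a single category; the numbers are invented.
toy <- tibble(
  Year = rep(c(2004, 2005), each = 4),
  Price = c(100, 110, 120, NA, 130, 140, 150, 160)
)

toy_stats <- toy %>%
  group_by(Year) %>%
  summarize(
    Mean = mean(Price, na.rm = TRUE),
    SD = sd(Price, na.rm = TRUE),
    N = sum(!is.na(Price)),  # count only observed prices
    SE = SD / sqrt(N),       # standard error of the mean
    .groups = "drop"
  )

p <- ggplot(toy_stats, aes(Year, Mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), width = 0.2) +
  # the caption tells readers what the bars represent
  labs(caption = "Error bars show mean +/- 1 standard deviation") +
  theme_minimal()
```

Bars based on `SE` would be narrower than bars based on `SD`, so the caption matters for how readers judge the spread.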

2. How do the yearly price ranges compare between product categories?

# compute the yearly price range (max - min) for each category
year_diff <- organic %>%
  group_by(Year, Category) %>%
  summarize(type = c("min", "max"), value = range(Price), .groups = "drop") %>%
  group_by(Year, Category) %>%
  summarize("Difference" = diff(range(value))) %>%
  drop_na()

ggplot(year_diff, aes(x = `Year`, y = `Difference`, fill = `Category`)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Price ranges differ significantly between chicken and eggs each year",
    subtitle = "Covering the years 2004 to 2013",
    x = "",
    y = "Price Range"
  ) +
  theme_minimal()

This question asks how the price ranges (the difference between the maximum and minimum price) for chicken and eggs compare from year to year. Based on a reading of the data, the price range of chicken is much wider than that of egg products. We also see that the price range for eggs peaked in 2008, whereas the range for chicken peaked in 2012 and 2013. The missing value for chicken in 2004 makes me want to recheck my wrangling.
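
One possible cause of a missing range is worth checking first: `range()` returns `NA` when any value in the group is missing, and `drop_na()` then removes that row. Whether this is what happened for chicken in 2004 is an assumption until the data are rechecked. The sketch below uses a made-up `toy` table to compare the two behaviors.

```r
library(dplyr)

# Hypothetical prices; NA stands in for a missing entry.
toy <- tibble(
  Year = c(2004, 2004, 2004, 2005, 2005, 2005),
  Price = c(198, NA, 222, 200, 210, 230)
)

toy_range <- toy %>%
  group_by(Year) %>%
  summarize(
    # NA whenever the group contains a missing price
    Strict = diff(range(Price)),
    # ignores missing prices within the group
    Lenient = diff(range(Price, na.rm = TRUE)),
    .groups = "drop"
  )

toy_range
```

With `na.rm = TRUE` the 2004 group keeps a range based on its observed prices, at the cost of hiding the fact that some prices were missing.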

From my research up to this point, I can conclude that the price of chicken and eggs rose slightly over the years, and that the price of chicken varied more widely than the price of eggs.

Having said that, for my final research project, I would like to home in on just one statistical research question. (I would like to know if the month of the year affects the price of chicken and eggs.) I will need to reorganize my report to follow a more logical order (introducing the question, stating assumptions, references, tidying the data, visualizing and interpreting results, etc.). My code blocks can be refactored and made more legible. While my labels and plot titles are helpful for the naive reader, I do not explain my variables or units of measurement. In my final project, I must devote more space and attention to these details.
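
A first descriptive look at the month-of-year question could be a mean price per calendar month. The sketch below is only a starting point under stated assumptions: the `toy` table and its prices are invented, and a real analysis would use the tidied data and a proper statistical test.

```r
library(dplyr)

# Hypothetical long-format prices; the values are invented.
toy <- tibble(
  Month = c("March", "January", "February", "January", "March", "February"),
  Price = c(205, 198, 200, 202, 207, 204)
)

monthly <- toy %>%
  # order months by the calendar rather than alphabetically
  mutate(Month = factor(Month, levels = month.name)) %>%
  group_by(Month) %>%
  summarize(`Mean Price` = mean(Price, na.rm = TRUE), .groups = "drop")

monthly
```

Converting `Month` to a factor with `month.name` as its levels keeps later tables and plots in calendar order.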

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Wheeler (2022, Jan. 20). Data Analytics and Computational Social Science: HW5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomabwheel855110/

BibTeX citation

@misc{wheeler2022hw5,
  author = {Wheeler, Adam},
  title = {Data Analytics and Computational Social Science: HW5},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomabwheel855110/},
  year = {2022}
}