Homework 5
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(tidyverse)
library(readxl)
organic <- read_excel(
path = "../Downloads/organiceggpoultry.xlsx",
skip = 4
)
# remove line breaks in column names
names(organic) <- gsub("\n", " ", names(organic))
# abbreviate dozen
names(organic) <- gsub("Dozen", "Doz.", names(organic))
# name the first column so later steps can refer to it
colnames(organic)[1] <- "Date"
# remove typo from column name
colnames(organic)[3] <- "Extra Large 1/2 Doz."
organic <- organic %>%
# drop empty column
select(-6) %>%
# remove extra characters from Date
mutate(Date = ifelse(grepl("/", Date), gsub(".{3}$", "", Date), Date)) %>%
# separate Date into month and year variables
separate(Date, sep = " ", into = c("Month", "Year")) %>%
# fix abbreviated month name
mutate(Month = str_replace(Month, "Jan", "January")) %>%
# collapse double spaces in column names
# (base replace() would instead add a column named " ")
rename_with(str_squish) %>%
# fill in missing year values
fill(Year) %>%
# cast mutated variables as numeric values
# replace strings with na
mutate(Year = as.numeric(Year)) %>%
mutate(across(3:11, as.numeric)) %>%
# pivot columns into unique rows
pivot_longer(3:11) %>%
# rename default column names
rename(Product = name) %>%
rename(Price = value) %>%
# add category variable
mutate(
Category = if_else(
grepl("Doz", Product), "Eggs", "Chicken"
)
)
head(organic)
# A tibble: 6 x 5
Month Year Product Price Category
<chr> <dbl> <chr> <dbl> <chr>
1 January 2004 Extra Large Doz. 230 Eggs
2 January 2004 Extra Large 1/2 Doz. 132 Eggs
3 January 2004 Large Doz. 230 Eggs
4 January 2004 Large 1/2 Doz. 126 Eggs
5 January 2004 Whole 198. Chicken
6 January 2004 B/S Breast 646. Chicken
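The date handling in the pipeline above is easier to follow on a few made-up values. The sketch below is only an illustration: the `toy` tibble and its `Date` strings are hypothetical stand-ins for the spreadsheet's first column, and `fill = "right"` is added to `separate()` so that rows without a year produce `NA` quietly.

```r
library(dplyr)
library(tidyr)

# Hypothetical Date values; the real spreadsheet may differ
toy <- tibble(Date = c("Jan 2004", "February", "Apr /1", "Jan 2005", "March"))

toy_clean <- toy %>%
  # drop the last three characters only where a "/" marker appears
  mutate(Date = ifelse(grepl("/", Date), gsub(".{3}$", "", Date), Date)) %>%
  # split on the space; rows with no year get NA in Year
  separate(Date, sep = " ", into = c("Month", "Year"), fill = "right") %>%
  # carry the most recent year down into the rows that lacked one
  fill(Year)

toy_clean
```

Here "Apr /1" becomes "Apr", and every row after "Jan 2004" inherits 2004 until "Jan 2005" appears.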
product_stats <- organic %>%
group_by(Category, Year) %>%
summarize(
`Mean Price` = mean(Price, na.rm = TRUE),
`SD` = sd(Price, na.rm = TRUE),
N = n(),
# standard error of the mean: SD divided by the square root of N
`Error` = sd(Price, na.rm = TRUE) / sqrt(n())
)
ggplot(product_stats, aes(Year, `Mean Price`, color = Category)) +
labs(
title = "Egg and Chicken prices may have risen over time",
subtitle = "Covering the years 2004 to 2013",
x = "",
y = "Avg Price"
) +
geom_smooth() +
geom_errorbar(aes(ymin = `Mean Price` - `SD`, ymax = `Mean Price` + `SD`), width = 0.2) +
theme_minimal() +
facet_wrap(~ Category)
This question asks how the prices of chicken and egg products have changed over time. Based on a reading of this data, the average price of both eggs and chicken may have risen over time, but by different amounts. It also appears that the average price of eggs rose sharply from 2008 to 2009. The large error bars, which show one standard deviation on either side of the mean, make me want to investigate the price of chicken more closely. I should also add a label identifying the error bars for readers who are unfamiliar with them.
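One way to label the error bars is a plot caption. The sketch below uses made-up prices (the `toy` data and the caption wording are assumptions, not part of the homework data) and also computes a standard error as SD divided by the square root of the number of non-missing prices, for comparison with the standard deviation.

```r
library(dplyr)
library(ggplot2)

# Made-up prices, for illustration only
toy <- tibble(
  Year = rep(c(2004, 2005), each = 4),
  Price = c(100, 110, 120, 130, 140, 150, 160, 170)
)

toy_stats <- toy %>%
  group_by(Year) %>%
  summarize(
    Mean = mean(Price, na.rm = TRUE),
    SD = sd(Price, na.rm = TRUE),
    N = sum(!is.na(Price)),   # count only non-missing prices
    SE = SD / sqrt(N)         # standard error of the mean
  )

# Error bars span one SD either side of the mean; the caption says so
p <- ggplot(toy_stats, aes(Year, Mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = Mean - SD, ymax = Mean + SD), width = 0.2) +
  labs(caption = "Error bars show the mean +/- 1 standard deviation") +
  theme_minimal()
```

Swapping `SD` for `SE` in `geom_errorbar()` would give narrower bars that describe uncertainty in the mean rather than spread in the prices; the caption should change to match.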
#
year_diff <- organic %>%
group_by(Year, Category) %>%
summarize(type = c("min", "max"), value = range(Price), .groups = "drop") %>%
group_by(Year, Category) %>%
summarize("Difference" = diff(range(value))) %>%
drop_na()
ggplot(year_diff, aes(x = `Year`, y = `Difference`, fill = `Category`)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Price ranges differ significantly between chicken and eggs each year",
subtitle = "Covering the years 2004 to 2013",
x = "",
y = "Price Range"
) +
theme_minimal()
This question asks how the price ranges (the difference between the maximum and minimum price) for chicken and eggs compare from year to year. Based on a reading of the data, the price range of chicken is much wider than that of egg products. We also see that the price range peaked for eggs in 2008, whereas it peaked for chicken in 2012 and 2013. The missing value for chicken in 2004 makes me want to recheck my wrangling.
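One possible cause of the missing 2004 value, offered as a guess rather than a diagnosis: if any chicken price in 2004 became `NA` during the `as.numeric()` cast, `range()` returns `NA` for that whole group and `drop_na()` then removes it. The prices below are made up to show the behaviour.

```r
# Made-up prices for one Year/Category group; NA stands in for a value
# that as.numeric() could not parse
prices <- c(198, NA, 646)

range_with_na <- range(prices)                   # NA NA
diff_with_na  <- diff(range_with_na)             # NA, so the group would be dropped
diff_no_na    <- diff(range(prices, na.rm = TRUE))  # 646 - 198 = 448

range_with_na
diff_with_na
diff_no_na
```

If this is the cause, passing `na.rm = TRUE` to `range()` would keep the group, though a group whose prices are all missing would then yield an infinite range with a warning and still need handling.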
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Wheeler (2022, Jan. 14). Data Analytics and Computational Social Science: HW5. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomabwheelhw5/
BibTeX citation
@misc{wheeler2022hw5,
  author = {Wheeler, Adam},
  title = {Data Analytics and Computational Social Science: HW5},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomabwheelhw5/},
  year = {2022}
}