Homework 4
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(tidyverse)
library(readxl)
organic <- read_excel(
  path = "../Downloads/organiceggpoultry.xlsx",
  skip = 4
)
# remove line breaks in column names
names(organic) <- gsub("\n", " ", names(organic))
# name the first column so it can be referenced in mutate()
colnames(organic)[1] <- "Date"
# remove typo from column name
colnames(organic)[3] <- "Extra Large 1/2 Doz."
organic <- organic %>%
  # drop empty column
  select(-6) %>%
  # remove the trailing three characters from Date entries that contain "/"
  mutate(Date = ifelse(grepl("/", Date), gsub(".{3}$", "", Date), Date)) %>%
  # separate Date into month and year variables
  separate(Date, sep = " ", into = c("Month", "Year")) %>%
  # expand the abbreviated month name (anchored so a full "January" is left alone)
  mutate(Month = str_replace(Month, "^Jan$", "January")) %>%
  # fill in missing year values
  fill(Year) %>%
  # convert the year and price columns to numeric (non-numeric strings become NA)
  mutate(Year = as.numeric(Year)) %>%
  mutate(across(3:11, as.numeric))
head(organic)
# A tibble: 6 x 11
Month Year `Extra Large Dozen` `Extra Large 1/2~ `Large Dozen`
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 230 132 230
2 February 2004 230 134. 226.
3 March 2004 230 137 225
4 April 2004 234. 137 225
5 May 2004 236 137 225
6 June 2004 241 137 231.
# ... with 6 more variables: Large 1/2 Doz. <dbl>, Whole <dbl>,
# B/S Breast <dbl>, Bone-in Breast <dbl>, Whole Legs <dbl>,
# Thighs <dbl>
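To check that the cleaning steps behave as intended, they can be sketched on a small made-up sample. The `toy` tibble and its values are hypothetical and only mimic the shape I assume the raw `Date` column has (the year appears on January rows only, and one entry carries a trailing marker). The sketch also anchors the month pattern as `^Jan$`, a small variation that keeps an already complete "January" from being altered.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical sample: not taken from the spreadsheet
toy <- tibble(
  Date  = c("Jan 2004", "February", "March /1", "Jan 2005", "February"),
  Price = c("230", "230", "too few", "241", "241")
)

toy_clean <- toy %>%
  # drop the trailing three characters where the entry contains "/"
  mutate(Date = if_else(str_detect(Date, "/"), str_remove(Date, ".{3}$"), Date)) %>%
  # split into Month and Year; rows without a year get NA in Year
  separate(Date, into = c("Month", "Year"), sep = " ", fill = "right") %>%
  # expand the abbreviation only when the whole value is "Jan"
  mutate(Month = str_replace(Month, "^Jan$", "January")) %>%
  # carry the last seen year downward
  fill(Year) %>%
  # non-numeric strings such as "too few" become NA
  mutate(
    Year  = as.numeric(Year),
    Price = suppressWarnings(as.numeric(Price))
  )

print(toy_clean)
```

The `fill = "right"` argument to `separate()` only silences the warning about rows that have no year; the missing values are still filled in by `fill(Year)` afterwards.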
# mean, median, and standard deviation of every price column
all_stats <- organic %>%
  summarize(across(
    -c(Month, Year),
    list(
      mean = ~ mean(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE),
      sd = ~ sd(.x, na.rm = TRUE)
    )
  ))
# mean price of each product by year, in long format
yearly_stats <- organic %>%
  select(-Month) %>%
  group_by(Year) %>%
  summarize(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = -Year, names_to = "Product", values_to = "Mean Price") %>%
  mutate(Category = if_else(grepl("Doz", Product), "Eggs", "Chicken"))
# mean price of each product by month, in long format
monthly_stats <- organic %>%
  select(-Year) %>%
  group_by(Month) %>%
  summarize(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = -Month, names_to = "Product", values_to = "Mean Price") %>%
  mutate(Category = if_else(grepl("Doz", Product), "Eggs", "Chicken"))
head(yearly_stats)
# A tibble: 6 x 4
Year Product `Mean Price` Category
<dbl> <chr> <dbl> <chr>
1 2004 Extra Large Dozen 237. Eggs
2 2004 Extra Large 1/2 Doz. 136. Eggs
3 2004 Large Dozen 230. Eggs
4 2004 Large 1/2 Doz. 130. Eggs
5 2004 Whole 212. Chicken
6 2004 B/S Breast 643. Chicken
ggplot(data = yearly_stats, mapping = aes(
  x = `Year`,
  y = `Mean Price`,
  color = Category
)) +
  ggtitle("Mean Product Price by Year") +
  # geom_point() +
  geom_smooth()
In this visualization, I plot the mean price of chicken and egg products for the years 2004 to 2013, in response to the question: How has the price of chicken and eggs changed over time? From this visualization, I conclude that the average price of both chicken and egg products has increased over time.
This visualization does not account for individual product prices, since all products in a category are pooled. The range of values shown is helpful but may clutter the plot or confuse a naive viewer. To improve it, I could remove the plot background and simplify the design to reduce clutter.
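One way the simplification described above might look is sketched here. The `toy_yearly` tibble is placeholder data with the same columns as `yearly_stats`; its values are made up and are not results. Using `se = FALSE` to hide the shaded band and `theme_minimal()` to drop the grey background are my assumptions about what "simplify" could mean, not the only options.

```r
library(ggplot2)
library(tibble)

# Placeholder data shaped like yearly_stats; the prices are invented
toy_yearly <- tibble(
  Year = rep(2004:2013, times = 2),
  Category = rep(c("Eggs", "Chicken"), each = 10),
  `Mean Price` = c(200 + 5 * (0:9), 400 + 8 * (0:9))
)

p <- ggplot(toy_yearly, aes(x = Year, y = `Mean Price`, color = Category)) +
  # se = FALSE hides the shaded band around each smoothed line
  geom_smooth(se = FALSE) +
  # label every year instead of the default fractional breaks
  scale_x_continuous(breaks = 2004:2013) +
  ggtitle("Mean Product Price by Year") +
  # theme_minimal() removes the grey panel background
  theme_minimal()

# print(p) draws the plot
```

To show individual products instead of pooled categories, the same sketch could map `group = Product` inside `aes()`, at the cost of more lines on the plot.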
eggs <- filter(monthly_stats, Category == "Eggs")
chicken <- filter(monthly_stats, Category == "Chicken")
ggplot(data = eggs, mapping = aes(x = Month, y = `Mean Price`)) +
  geom_boxplot() +
  ggtitle("Monthly Egg Product Prices") +
  coord_flip()
ggplot(data = chicken, mapping = aes(x = Month, y = `Mean Price`)) +
  geom_boxplot() +
  ggtitle("Monthly Chicken Product Prices") +
  coord_flip()
In this visualization, I am plotting the mean price of chicken and egg products for each month of the year between 2004 and 2013. I am plotting this data in response to the question: How do product prices compare with each other per month? From this visualization, I can conclude that product prices were highest in July. I can also see large outlier values in the mean price of chicken products.
This visualization is admittedly cumbersome. Egg and chicken products should appear either on one plot or side by side. The outliers make it difficult to interpret the actual or relative values of the chicken products. To improve it, I may reconsider whether the boxplot is the best tool for exploring my research question.
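A side-by-side layout could be sketched with facets rather than two separate plots. The `toy_monthly` tibble below is placeholder data with the same columns as `monthly_stats`; the product subset and all prices are made up. Ordering the months with the built-in `month.name` vector is an addition of mine, so that the axis runs in calendar order instead of alphabetical order.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Placeholder data shaped like monthly_stats; the prices are invented
toy_monthly <- tibble(
  Month = rep(month.name, times = 4),
  Product = rep(c("Large Dozen", "Extra Large Dozen", "Whole", "Thighs"), each = 12),
  `Mean Price` = rep(c(250, 270, 220, 210), each = 12) + rep(1:12, times = 4)
) %>%
  mutate(
    Category = if_else(grepl("Doz", Product), "Eggs", "Chicken"),
    # reversed calendar order so January ends up at the top of the y axis
    Month = factor(Month, levels = rev(month.name))
  )

p <- ggplot(toy_monthly, aes(x = `Mean Price`, y = Month)) +
  # horizontal boxplots: one box per month, spread across products
  geom_boxplot() +
  # one panel per category, each with its own price axis
  facet_wrap(~ Category, scales = "free_x") +
  ggtitle("Monthly Product Prices by Category")

# print(p) draws the plot
```

Mapping `Month` to the y axis directly stands in for `coord_flip()` here (this relies on ggplot2 3.3.0 or later), which keeps `scales = "free_x"` referring to the price axis as drawn. Giving each panel its own price scale is one way to keep chicken outliers from compressing the egg prices.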
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Wheeler (2022, Jan. 11). Data Analytics and Computational Social Science: HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomabwheelhw4/
BibTeX citation
@misc{wheeler2022hw4,
  author = {Wheeler, Adam},
  title = {Data Analytics and Computational Social Science: HW4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomabwheelhw4/},
  year = {2022}
}