Data Visualization
library(readxl)
eggs <- read_excel("~/eggs_tidy.xlsx")
head(eggs)
# A tibble: 6 x 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
# ... with 1 more variable: extra_large_dozen <dbl>
summary(eggs)
month year large_half_dozen large_dozen
Length:120 Min. :2004 Min. :126.0 Min. :225.0
Class :character 1st Qu.:2006 1st Qu.:129.4 1st Qu.:233.5
Mode :character Median :2008 Median :174.5 Median :267.5
Mean :2008 Mean :155.2 Mean :254.2
3rd Qu.:2011 3rd Qu.:174.5 3rd Qu.:268.0
Max. :2013 Max. :178.0 Max. :277.5
extra_large_half_dozen extra_large_dozen
Min. :132.0 Min. :230.0
1st Qu.:135.8 1st Qu.:241.5
Median :185.5 Median :285.5
Mean :164.2 Mean :266.8
3rd Qu.:185.5 3rd Qu.:285.5
Max. :188.1 Max. :290.0
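Let's look at how egg prices changed during 2004 alone. We keep only the 2004 rows and reshape the four price columns into long format, so that each row holds one month, one egg type, and its cost.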
library(dplyr)
library(tidyr)
eggs_2004 <- eggs %>%
  filter(year == 2004)
eggs_2004 <- pivot_longer(eggs_2004, "large_half_dozen":"extra_large_dozen", names_to = "type", values_to = "cost")
eggs_2004
# A tibble: 48 x 4
month year type cost
<chr> <dbl> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 January 2004 large_dozen 230
3 January 2004 extra_large_half_dozen 132
4 January 2004 extra_large_dozen 230
5 February 2004 large_half_dozen 128.
6 February 2004 large_dozen 226.
7 February 2004 extra_large_half_dozen 134.
8 February 2004 extra_large_dozen 230
9 March 2004 large_half_dozen 131
10 March 2004 large_dozen 225
# ... with 38 more rows
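The wide-to-long reshaping above can be sketched on a tiny table. This is only an illustration: the values in toy_wide are made up for the example and are not read from eggs_tidy.xlsx, and the names toy_wide and toy_long are hypothetical.

```r
library(tibble)
library(tidyr)

# Toy data with illustrative values (not taken from eggs_tidy.xlsx)
toy_wide <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 128),
  large_dozen = c(230, 226),
  extra_large_half_dozen = c(132, 134),
  extra_large_dozen = c(230, 230)
)

# Each of the four price columns becomes a (type, cost) pair,
# so 2 months x 4 price columns gives 8 rows
toy_long <- pivot_longer(
  toy_wide,
  cols = large_half_dozen:extra_large_dozen,
  names_to = "type",
  values_to = "cost"
)

nrow(toy_long)
```

The same arithmetic explains the 48 rows in eggs_2004: 12 months times 4 price columns.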
Plot the 2004 egg prices as a side-by-side (dodged) bar chart, with one bar per egg type for each month.
library(ggplot2)
ggplot(data = eggs_2004, aes(x = month, y = cost, fill = type)) +
  geom_bar(stat = "identity", position = "dodge")
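Because month is stored as a character column, ggplot2 orders the bars alphabetically (April, August, ...) rather than by calendar. One possible fix, sketched here on made-up data, is to convert month to a factor whose levels follow base R's month.name. The toy_long data frame and its values are hypothetical, and the sketch assumes every month is spelled as a full English name.

```r
library(ggplot2)

# Toy long-format data with illustrative values (not from eggs_tidy.xlsx)
toy_long <- data.frame(
  month = rep(c("January", "February", "March", "April"), each = 2),
  type = rep(c("large_dozen", "extra_large_dozen"), times = 4),
  cost = c(230, 230, 226, 230, 225, 230, 225, 230)
)

# month.name is base R's built-in vector of English month names in calendar order
toy_long$month <- factor(toy_long$month, levels = month.name)

p <- ggplot(toy_long, aes(x = month, y = cost, fill = type)) +
  geom_col(position = "dodge")  # geom_col() is shorthand for geom_bar(stat = "identity")
p
```

Applied to eggs_2004, the same factor conversion before plotting should put January first and December last on the x-axis.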
To see how the average price of each egg category changed over the years, group the eggs data by year with group_by and compute the mean of each price column with summarize_at.
library(dplyr)
eggs_year <- eggs %>%
  group_by(year) %>%
  summarize_at(c("large_half_dozen", "large_dozen", "extra_large_half_dozen", "extra_large_dozen"), mean)
head(eggs_year)
# A tibble: 6 x 5
year large_half_dozen large_dozen extra_large_hal~ extra_large_doz~
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2004 130. 230. 136. 237.
2 2005 128. 234. 136. 241
3 2006 128. 234. 136. 241.
4 2007 132. 237. 139. 245.
5 2008 157. 261. 166. 269.
6 2009 174. 275 186. 286.
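In recent dplyr versions, summarize_at is marked as superseded in favor of summarize combined with across. A minimal sketch of the equivalent yearly mean follows, using a made-up table; the names toy and toy_year and all values are illustrative and not taken from the eggs data.

```r
library(dplyr)

# Toy data with illustrative values (not from eggs_tidy.xlsx)
toy <- tibble(
  year = c(2004, 2004, 2005, 2005),
  large_dozen = c(230, 226, 234, 234),
  extra_large_dozen = c(230, 240, 241, 241)
)

toy_year <- toy %>%
  group_by(year) %>%
  # across() applies mean to every selected column within each year
  summarize(across(c(large_dozen, extra_large_dozen), mean), .groups = "drop")

toy_year
```

For the full data set, the column selection could list all four price columns, or use a range such as large_half_dozen:extra_large_dozen.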
Let's visualize these changes with bar charts: first the yearly average prices per dozen, then the yearly average prices per half dozen.
eggs_year_dozen <- pivot_longer(eggs_year, c("large_dozen","extra_large_dozen"), names_to = "type", values_to="cost")
head(eggs_year_dozen)
# A tibble: 6 x 5
year large_half_dozen extra_large_half_dozen type cost
<dbl> <dbl> <dbl> <chr> <dbl>
1 2004 130. 136. large_dozen 230.
2 2004 130. 136. extra_large_doz~ 237.
3 2005 128. 136. large_dozen 234.
4 2005 128. 136. extra_large_doz~ 241
5 2006 128. 136. large_dozen 234.
6 2006 128. 136. extra_large_doz~ 241.
ggplot(data = eggs_year_dozen, aes(x = year, y = cost, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_bw()
eggs_year_halfdozen <- pivot_longer(eggs_year, c("large_half_dozen","extra_large_half_dozen"), names_to = "type", values_to="cost")
head(eggs_year_halfdozen)
# A tibble: 6 x 5
year large_dozen extra_large_dozen type cost
<dbl> <dbl> <dbl> <chr> <dbl>
1 2004 230. 237. large_half_dozen 130.
2 2004 230. 237. extra_large_half_dozen 136.
3 2005 234. 241 large_half_dozen 128.
4 2005 234. 241 extra_large_half_dozen 136.
5 2006 234. 241. large_half_dozen 128.
6 2006 234. 241. extra_large_half_dozen 136.
ggplot(data = eggs_year_halfdozen, aes(x = year, y = cost, fill = type)) +
  geom_bar(stat = "identity", position = "dodge")
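The two charts above could also be drawn from a single long table, with one panel per carton size. The sketch below is one way to do that, not the approach used in this post: it splits each column name into two parts with the names_pattern argument of pivot_longer and then facets the plot. The column names size and carton, the data frame toy_year, and its values are all hypothetical.

```r
library(tidyr)
library(ggplot2)

# Toy yearly means with illustrative values (not from eggs_tidy.xlsx)
toy_year <- data.frame(
  year = c(2004, 2005),
  large_half_dozen = c(130, 128),
  large_dozen = c(230, 234),
  extra_large_half_dozen = c(136, 136),
  extra_large_dozen = c(237, 241)
)

# Split each column name into an egg size and a carton size
toy_long <- pivot_longer(
  toy_year,
  cols = -year,
  names_to = c("size", "carton"),
  names_pattern = "^(large|extra_large)_(half_dozen|dozen)$",
  values_to = "cost"
)

# factor(year) gives one discrete tick per year instead of a continuous axis
p <- ggplot(toy_long, aes(x = factor(year), y = cost, fill = size)) +
  geom_col(position = "dodge") +
  facet_wrap(~ carton, scales = "free_y") +
  labs(x = "year")
p
```

Faceting keeps the dozen and half-dozen prices on separate y-axes, so the smaller half-dozen values are not compressed by the larger dozen values.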
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Prabhu (2022, Feb. 23). Data Analytics and Computational Social Science: DACSS 601 HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomsnehalhw4/
BibTeX citation
@misc{prabhu2022dacss,
  author = {Prabhu, Snehal},
  title = {Data Analytics and Computational Social Science: DACSS 601 HW4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomsnehalhw4/},
  year = {2022}
}