DACSS 601 HW4

Data Visualization

Snehal Prabhu
2/20/2022

Reading the Data

library(readxl)  # provides read_excel()
eggs <- read_excel("~/eggs_tidy.xlsx")  # one row per month, one price column per carton type
head(eggs)
# A tibble: 6 x 6
  month     year large_half_dozen large_dozen extra_large_half_dozen
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>
1 January   2004             126         230                    132 
2 February  2004             128.        226.                   134.
3 March     2004             131         225                    137 
4 April     2004             131         225                    137 
5 May       2004             131         225                    137 
6 June      2004             134.        231.                   137 
# ... with 1 more variable: extra_large_dozen <dbl>
summary(eggs)
    month                year      large_half_dozen  large_dozen   
 Length:120         Min.   :2004   Min.   :126.0    Min.   :225.0  
 Class :character   1st Qu.:2006   1st Qu.:129.4    1st Qu.:233.5  
 Mode  :character   Median :2008   Median :174.5    Median :267.5  
                    Mean   :2008   Mean   :155.2    Mean   :254.2  
                    3rd Qu.:2011   3rd Qu.:174.5    3rd Qu.:268.0  
                    Max.   :2013   Max.   :178.0    Max.   :277.5  
 extra_large_half_dozen extra_large_dozen
 Min.   :132.0          Min.   :230.0    
 1st Qu.:135.8          1st Qu.:241.5    
 Median :185.5          Median :285.5    
 Mean   :164.2          Mean   :266.8    
 3rd Qu.:185.5          3rd Qu.:285.5    
 Max.   :188.1          Max.   :290.0    
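The summary reports 120 rows with years from 2004 to 2013, which is consistent with one row per month for ten years. A minimal base-R sketch of that check is below. It assumes the month column holds full English month names, as in the printed head(). `expected` and `complete_years()` are hypothetical names used only for illustration.

```r
# Hypothetical skeleton: every (month, year) pair for 2004-2013
expected <- expand.grid(month = month.name, year = 2004:2013,
                        stringsAsFactors = FALSE)
nrow(expected)  # 12 months x 10 years = 120, matching Length:120 in summary(eggs)

# TRUE when every year in the data frame has exactly 12 monthly rows
complete_years <- function(df) all(table(df$year) == 12)
complete_years(expected)
```

Calling `complete_years(eggs)` on the real data would report whether any year is missing months before prices are averaged by year.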

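Let’s look at how egg prices changed within 2004 alone. First, keep only the 2004 rows, then reshape the four price columns into long format so that each row holds one month, one carton type, and its cost.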

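library(dplyr)  # filter() and the %>% pipe
library(tidyr)  # pivot_longer()
# Keep only 2004, then stack the four price columns into (type, cost) pairs
eggs_2004 <- eggs %>%
  filter(year == 2004) %>%
  pivot_longer(large_half_dozen:extra_large_dozen, names_to = "type", values_to = "cost")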

eggs_2004
# A tibble: 48 x 4
   month     year type                    cost
   <chr>    <dbl> <chr>                  <dbl>
 1 January   2004 large_half_dozen        126 
 2 January   2004 large_dozen             230 
 3 January   2004 extra_large_half_dozen  132 
 4 January   2004 extra_large_dozen       230 
 5 February  2004 large_half_dozen        128.
 6 February  2004 large_dozen             226.
 7 February  2004 extra_large_half_dozen  134.
 8 February  2004 extra_large_dozen       230 
 9 March     2004 large_half_dozen        131 
10 March     2004 large_dozen             225 
# ... with 38 more rows
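To make the reshape concrete, here is a base-R sketch of what pivot_longer() does to a single row, using the January 2004 prices printed above. It illustrates the wide-to-long idea only and is not how tidyr implements it; `jan_2004`, `price_cols`, and `jan_long` are hypothetical names.

```r
# January 2004 row from head(eggs): one price column per carton type (wide)
jan_2004 <- data.frame(month = "January", year = 2004,
                       large_half_dozen = 126, large_dozen = 230,
                       extra_large_half_dozen = 132, extra_large_dozen = 230)

price_cols <- c("large_half_dozen", "large_dozen",
                "extra_large_half_dozen", "extra_large_dozen")

# Long form: repeat the id columns once per price column and stack the prices
jan_long <- data.frame(month = jan_2004$month,
                       year  = jan_2004$year,
                       type  = price_cols,
                       cost  = unlist(jan_2004[1, price_cols], use.names = FALSE),
                       stringsAsFactors = FALSE)
jan_long
```

Each of the 12 monthly rows of 2004 expands into four rows in the same way, which is why eggs_2004 has 12 x 4 = 48 rows.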

Plot the 2004 prices as a side-by-side (dodged) bar chart, with one bar per carton type for each month.

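library(ggplot2)
# Calendar order for months; a character column would be plotted alphabetically
eggs_2004$month <- factor(eggs_2004$month, levels = month.name)
ggplot(data = eggs_2004, aes(x = month, y = cost, fill = type)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # keep 12 labels legible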
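Because month is read in as a character column, ggplot2 orders a discrete axis alphabetically unless the column is a factor with explicit levels. A minimal sketch of the difference, using base R's built-in `month.name` constant; `months` and `months_f` are hypothetical names:

```r
months <- c("March", "January", "February", "April")

# Character vectors sort alphabetically, which is what a discrete axis falls back to
sort(months)  # returns "April" "February" "January" "March"

# A factor whose levels follow month.name keeps calendar order instead
months_f <- factor(months, levels = month.name)
sort(months_f)
levels(droplevels(months_f))
```

Any month string that does not exactly match an entry of `month.name` (for example an abbreviation) would become `NA`, so it is worth checking for missing values after the conversion.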

To see how the average price of each egg category changed across the years, compute each category's mean price per year with group_by() and summarize_at().

library(dplyr)
# Mean price of each carton type within each year: one row per year
eggs_year <- eggs %>%
  group_by(year) %>%
  summarize_at(c("large_half_dozen", "large_dozen", "extra_large_half_dozen", "extra_large_dozen"), mean)
head(eggs_year)
# A tibble: 6 x 5
   year large_half_dozen large_dozen extra_large_hal~ extra_large_doz~
  <dbl>            <dbl>       <dbl>            <dbl>            <dbl>
1  2004             130.        230.             136.             237.
2  2005             128.        234.             136.             241 
3  2006             128.        234.             136.             241.
4  2007             132.        237.             139.             245.
5  2008             157.        261.             166.             269.
6  2009             174.        275              186.             286.
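For intuition about what summarize_at(..., mean) computes within each year group, here is a base-R equivalent using aggregate(). The input is only the four 2004 months whose exact prices appear in the printed head(eggs) output (January, March, April, May), so the means differ from the full-year values in eggs_year; `eggs_subset` and `means` are hypothetical names.

```r
# Subset of 2004 rows whose exact prices are printed in head(eggs)
eggs_subset <- data.frame(
  month = c("January", "March", "April", "May"),
  year  = 2004,
  large_half_dozen       = c(126, 131, 131, 131),
  large_dozen            = c(230, 225, 225, 225),
  extra_large_half_dozen = c(132, 137, 137, 137)
)

# One row per year; each price column is replaced by its mean within that year
means <- aggregate(cbind(large_half_dozen, large_dozen, extra_large_half_dozen) ~ year,
                   data = eggs_subset, FUN = mean)
means
```

With the full data set, every year from 2004 to 2013 would get its own row, just as in eggs_year.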

Let’s visualize these changes with bar charts, first for the dozen cartons and then for the half-dozen cartons.

# Reshape only the two dozen columns; the half-dozen columns are carried along unchanged
eggs_year_dozen <- pivot_longer(eggs_year, c("large_dozen", "extra_large_dozen"), names_to = "type", values_to = "cost")
head(eggs_year_dozen)
# A tibble: 6 x 5
   year large_half_dozen extra_large_half_dozen type              cost
  <dbl>            <dbl>                  <dbl> <chr>            <dbl>
1  2004             130.                   136. large_dozen       230.
2  2004             130.                   136. extra_large_doz~  237.
3  2005             128.                   136. large_dozen       234.
4  2005             128.                   136. extra_large_doz~  241 
5  2006             128.                   136. large_dozen       234.
6  2006             128.                   136. extra_large_doz~  241.
ggplot(data = eggs_year_dozen, aes(x = year, y = cost, fill = type)) +
  geom_col(position = "dodge") +
  scale_x_continuous(breaks = 2004:2013) +  # label every year in the data
  theme_bw()

# Reshape only the two half-dozen columns; the dozen columns are carried along unchanged
eggs_year_halfdozen <- pivot_longer(eggs_year, c("large_half_dozen", "extra_large_half_dozen"), names_to = "type", values_to = "cost")
head(eggs_year_halfdozen)
# A tibble: 6 x 5
   year large_dozen extra_large_dozen type                    cost
  <dbl>       <dbl>             <dbl> <chr>                  <dbl>
1  2004        230.              237. large_half_dozen        130.
2  2004        230.              237. extra_large_half_dozen  136.
3  2005        234.              241  large_half_dozen        128.
4  2005        234.              241  extra_large_half_dozen  136.
5  2006        234.              241. large_half_dozen        128.
6  2006        234.              241. extra_large_half_dozen  136.
ggplot(data = eggs_year_halfdozen, aes(x = year, y = cost, fill = type)) +
  geom_col(position = "dodge") +
  scale_x_continuous(breaks = 2004:2013)  # label every year in the data
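The two charts above pivot the dozen and half-dozen columns separately. An alternative, sketched here under the assumption that every price column name ends in either `_half_dozen` or `_dozen`, is to pivot all four columns at once and derive egg-size and carton-size columns from the type name. The helper `split_type()` is hypothetical.

```r
# Split names such as "extra_large_half_dozen" into egg size and carton size
split_type <- function(type) {
  pack <- ifelse(grepl("_half_dozen$", type), "half dozen", "dozen")
  size <- sub("_(half_)?dozen$", "", type)  # leaves "large" or "extra_large"
  data.frame(type = type, size = size, pack = pack, stringsAsFactors = FALSE)
}

types <- c("large_half_dozen", "large_dozen",
           "extra_large_half_dozen", "extra_large_dozen")
split_type(types)
```

With a single long table carrying a `pack` column, one ggplot2 call with `facet_wrap(~ pack)` could show both carton sizes side by side. This is a design alternative, not what the analysis above does.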

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Prabhu (2022, Feb. 23). Data Analytics and Computational Social Science: DACSS 601 HW4. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomsnehalhw4/

BibTeX citation

@misc{prabhu2022dacss,
  author = {Prabhu, Snehal},
  title = {Data Analytics and Computational Social Science: DACSS 601 HW4},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomsnehalhw4/},
  year = {2022}
}