HW2 Merleaux

Reading data into RStudio.

April Merleaux
2022-02-10
knitr::opts_chunk$set(echo = TRUE)
library(readr)
library(tidyverse)
library(dplyr)

For HW 2 I analyzed poultry data, which were already tidy. I loaded the libraries in the setup chunk. Next, I read in the CSV, assign it to a shorter name, and view the data and its column names.

poultry_tidy <- read_csv("C:/Users/am37/Documents/DATA/class work/poultry_tidy - poultry_tidy.csv")
view(poultry_tidy)
colnames(poultry_tidy)
[1] "Product"      "Year"         "Month"        "Price_Dollar"

Describe the data

poultry_tidy has four columns:
Product and Month are character vectors.
Year and Price_Dollar are doubles (real numbers).
The data include the price in dollars by month and year for 5 different chicken cuts.
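
The column types described above can be checked directly rather than read off the printed tibble. This is a minimal sketch, assuming a small stand-in tibble (`toy_poultry`, a hypothetical name) with the same four column names; its prices are placeholders, not values from the real file.

```r
library(dplyr)

# Made-up rows that only mimic the structure of poultry_tidy;
# the prices here are placeholders, not values from the real file.
toy_poultry <- tibble(
  Product      = c("Whole", "Whole", "Thighs", "Thighs"),
  Year         = c(2004, 2005, 2004, 2005),
  Month        = c("January", "January", "January", "January"),
  Price_Dollar = c(2.10, 2.20, NA, 2.25)
)

# One line per column: name, type, and the first few values
glimpse(toy_poultry)

# The same type information as a named character vector
col_types <- sapply(toy_poultry, typeof)
col_types
```

Running the same two calls on `poultry_tidy` should confirm which columns are character and which are double.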

Describe the data wrangling

I grouped by Year and summarised to show the mean price in dollars across all types of chicken, omitting the NAs.

group_by(poultry_tidy, Year) %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 10 x 2
    Year `mean(Price_Dollar, na.rm = TRUE)`
   <dbl>                              <dbl>
 1  2004                               3.25
 2  2005                               3.36
 3  2006                               3.36
 4  2007                               3.36
 5  2008                               3.40
 6  2009                               3.42
 7  2010                               3.39
 8  2011                               3.36
 9  2012                               3.49
10  2013                               3.50
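
The summary column above takes its name from the whole expression, which is awkward to reuse later. A possible variation, sketched on a hypothetical `toy_poultry` tibble with made-up prices, gives the column a short name (`mean_price` is my choice, not a name from the original) and shows why `na.rm = TRUE` matters.

```r
library(dplyr)

# Made-up data: one missing price in 2004
toy_poultry <- tibble(
  Year         = c(2004, 2004, 2005, 2005),
  Price_Dollar = c(2.00, NA, 3.00, 4.00)
)

# Without na.rm = TRUE, a single NA makes the whole mean NA
mean(c(2.00, NA))

# Naming the summary column keeps the output header short
yearly_mean <- toy_poultry %>%
  group_by(Year) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE))

yearly_mean
```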

Next, I grouped by Year and Product and summarised the mean of Price_Dollar to see how the price varies across Products within a single year.

group_by(poultry_tidy, Year, Product) %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 50 x 3
# Groups:   Year [10]
    Year Product        `mean(Price_Dollar, na.rm = TRUE)`
   <dbl> <chr>                                       <dbl>
 1  2004 B/S Breast                                   6.43
 2  2004 Bone-in Breast                               3.90
 3  2004 Thighs                                       2.01
 4  2004 Whole                                        2.12
 5  2004 Whole Legs                                   1.99
 6  2005 B/S Breast                                   6.45
 7  2005 Bone-in Breast                               3.90
 8  2005 Thighs                                       2.21
 9  2005 Whole                                        2.17
10  2005 Whole Legs                                   2.04
# ... with 40 more rows
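
The long output above is still grouped by Year, and comparing products within a year means scanning five rows at a time. One way to make that comparison easier, sketched here on made-up data, is to drop the grouping with `.groups = "drop"` and then spread the products into columns with `pivot_wider()`. The tibble and the `mean_price` column name are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up data with two products over two years
toy_poultry <- tibble(
  Year         = c(2004, 2004, 2004, 2005, 2005, 2005),
  Product      = c("Whole", "Whole", "Thighs", "Whole", "Thighs", "Thighs"),
  Price_Dollar = c(2.0, 2.2, 1.9, 2.3, 2.0, NA)
)

# .groups = "drop" returns an ungrouped tibble
by_year_product <- toy_poultry %>%
  group_by(Year, Product) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE), .groups = "drop")

# One row per year, one column per product
wide <- by_year_product %>%
  pivot_wider(names_from = Product, values_from = mean_price)

wide
```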

I’m curious how Whole chicken prices changed between 2004 and 2013. I grouped by Year, filtered for Whole, and summarised for the mean price, omitting the NAs.

group_by(poultry_tidy, Year) %>%
filter(Product == "Whole") %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 10 x 2
    Year `mean(Price_Dollar, na.rm = TRUE)`
   <dbl>                              <dbl>
 1  2004                               2.12
 2  2005                               2.17
 3  2006                               2.20
 4  2007                               2.20
 5  2008                               2.37
 6  2009                               2.48
 7  2010                               2.39
 8  2011                               2.35
 9  2012                               2.38
10  2013                               2.38
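
To put a number on the change, the first and last yearly means printed above can be compared directly. This is a rough sketch that copies the two rounded values from the output, so the result is only as precise as those two-decimal means.

```r
# Yearly means for Whole chicken, copied from the printed output
# (already rounded to cents, so the result is approximate)
whole_2004 <- 2.12
whole_2013 <- 2.38

# Absolute change in dollars and relative change in percent
abs_change <- whole_2013 - whole_2004
pct_change <- 100 * abs_change / whole_2004

round(abs_change, 2)
round(pct_change, 1)
```

On these rounded figures the mean price of Whole chicken rose by about 26 cents, or roughly 12 percent, over the ten years.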

Next, I wanted to know how price varies across months, so I grouped by Month and summarised the mean price.

group_by(poultry_tidy, Month) %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 12 x 2
   Month     `mean(Price_Dollar, na.rm = TRUE)`
   <chr>                                  <dbl>
 1 April                                   3.38
 2 August                                  3.40
 3 December                                3.40
 4 February                                3.38
 5 January                                 3.39
 6 July                                    3.40
 7 June                                    3.39
 8 March                                   3.38
 9 May                                     3.38
10 November                                3.40
11 October                                 3.40
12 September                               3.40
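
Because Month is a character column, the rows above come out in alphabetical order (April, August, December, ...) rather than calendar order. A possible fix, sketched on made-up data, is to convert Month to a factor whose levels follow base R's built-in `month.name` vector before grouping.

```r
library(dplyr)

# Made-up data covering three months, deliberately out of order
toy_poultry <- tibble(
  Month        = c("March", "January", "February", "January"),
  Price_Dollar = c(3.0, 1.0, 2.0, 2.0)
)

# month.name is c("January", "February", ..., "December"),
# so the factor levels carry calendar order into the summary
monthly_mean <- toy_poultry %>%
  mutate(Month = factor(Month, levels = month.name)) %>%
  group_by(Month) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE))

monthly_mean
```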

I wonder which cut of chicken is cheapest?

group_by(poultry_tidy, Product) %>%
  summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 5 x 2
  Product        `mean(Price_Dollar, na.rm = TRUE)`
  <chr>                                       <dbl>
1 B/S Breast                                   6.55
2 Bone-in Breast                               3.90
3 Thighs                                       2.18
4 Whole                                        2.31
5 Whole Legs                                   2.03

It looks like Whole Legs are the cheapest. That’s not surprising. I am a little surprised at how much cheaper they are than the most expensive cut.
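
The ranking and the size of the gap can be computed instead of read by eye. The sketch below re-enters the five rounded means from the printed summary into a small tibble (`product_means` is a hypothetical name), sorts them, and compares the cheapest and most expensive cuts.

```r
library(dplyr)

# Overall means copied from the printed summary (rounded to cents)
product_means <- tibble(
  Product    = c("B/S Breast", "Bone-in Breast", "Thighs", "Whole", "Whole Legs"),
  mean_price = c(6.55, 3.90, 2.18, 2.31, 2.03)
)

# Cheapest cut first
ranked <- product_means %>% arrange(mean_price)
ranked

# Gap between the most and least expensive cuts
cheapest <- slice_min(product_means, mean_price, n = 1)
priciest <- slice_max(product_means, mean_price, n = 1)

price_gap   <- priciest$mean_price - cheapest$mean_price
price_ratio <- priciest$mean_price / cheapest$mean_price
```

On these rounded means, B/S Breast costs about 4.52 dollars more than Whole Legs, a bit over three times the price.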

# Group by Month, filter to just Whole Legs, and pivot wider to compare months (rows) against years (columns)
group_by(poultry_tidy, Month) %>%
filter(Product == "Whole Legs") %>%
pivot_wider(names_from = Year, values_from = Price_Dollar)
# A tibble: 12 x 12
# Groups:   Month [12]
   Product    Month   `2013` `2012` `2011` `2010` `2009` `2008` `2007`
   <chr>      <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Whole Legs January   2.04   2.04   2.04   2.04   2.04   2.04   2.04
 2 Whole Legs Februa~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
 3 Whole Legs March     2.04   2.04   2.04   2.04   2.04   2.04   2.04
 4 Whole Legs April     2.04   2.04   2.04   2.04   2.04   2.04   2.04
 5 Whole Legs May       2.04   2.04   2.04   2.04   2.04   2.04   2.04
 6 Whole Legs June      2.04   2.04   2.04   2.04   2.04   2.04   2.04
 7 Whole Legs July      2.04   2.04   2.04   2.04   2.04   2.04   2.04
 8 Whole Legs August    2.04   2.04   2.04   2.04   2.04   2.04   2.04
 9 Whole Legs Septem~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
10 Whole Legs October   2.04   2.04   2.04   2.04   2.04   2.04   2.04
11 Whole Legs Novemb~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
12 Whole Legs Decemb~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
# ... with 3 more variables: 2006 <dbl>, 2005 <dbl>, 2004 <dbl>

It is very strange to me that there is so little variation in the price of Whole Legs: every month shown for 2007 through 2013 has the same price, 2.04. It makes me wonder if there’s something funky with the data.
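
One way to follow up on that suspicion is to count how many distinct prices each product has and how widely they spread. This is a sketch on made-up data (the `toy_poultry` tibble and the summary column names are assumptions), not a result from the real file.

```r
library(dplyr)

# Made-up data: Whole Legs never changes, Whole does
toy_poultry <- tibble(
  Product      = c("Whole Legs", "Whole Legs", "Whole Legs", "Whole", "Whole", "Whole"),
  Year         = c(2007, 2007, 2008, 2007, 2007, 2008),
  Price_Dollar = c(2.04, 2.04, 2.04, 2.20, 2.25, 2.37)
)

# A product with one distinct price and zero spread is worth a closer look
variation <- toy_poultry %>%
  group_by(Product) %>%
  summarise(
    n_obs             = sum(!is.na(Price_Dollar)),
    n_distinct_prices = n_distinct(Price_Dollar, na.rm = TRUE),
    min_price         = min(Price_Dollar, na.rm = TRUE),
    max_price         = max(Price_Dollar, na.rm = TRUE),
    sd_price          = sd(Price_Dollar, na.rm = TRUE)
  )

variation
```

Applied to `poultry_tidy`, a very low count of distinct prices for Whole Legs compared with the other cuts would support the idea that those values were filled in or carried forward rather than observed each month.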

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Merleaux (2022, Feb. 13). Data Analytics and Computational Social Science: HW2 Merleaux. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httprpubscomamerleaux864543/

BibTeX citation

@misc{merleaux2022hw2,
  author = {Merleaux, April},
  title = {Data Analytics and Computational Social Science: HW2 Merleaux},
  url = {https://github.com/DACSS/dacss_course_website/posts/httprpubscomamerleaux864543/},
  year = {2022}
}