April Merleaux
For HW 2 I analyzed the poultry data, which were already tidy. I loaded the libraries in the setup chunk. Here I read in the csv, assign it to a shorter name, and view the column names.

poultry_tidy <- read_csv("C:/Users/am37/Documents/DATA/class work/poultry_tidy - poultry_tidy.csv")
[1] "Product"      "Year"         "Month"        "Price_Dollar"

Describe the data

poultry_tidy has four columns:
Product and Month are character vectors.
Year and Price_Dollar are doubles, or real numbers. The data include the price in dollars by month and year for 5 different chicken cuts.
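
A minimal sketch of how the column types described above could be checked in code. The data frame `toy` is a hypothetical stand-in with the same column names as poultry_tidy, and its values are made up for illustration.

```r
# Hypothetical stand-in for poultry_tidy; values are invented.
toy <- data.frame(
  Product      = c("Whole", "Thighs"),
  Year         = c(2004, 2004),
  Month        = c("January", "January"),
  Price_Dollar = c(2.10, 2.00),
  stringsAsFactors = FALSE
)

# typeof() reports the storage type of each column.
col_types <- sapply(toy, typeof)
print(col_types)

# Count how many different cuts appear in the Product column.
n_products <- length(unique(toy$Product))
print(n_products)
```

Running the same two checks on poultry_tidy itself should confirm the types and the number of cuts listed above.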

Describe the data wrangling

I grouped by Year and summarised to show the mean price in dollars across all types of chicken. I omitted the NAs.

group_by(poultry_tidy, Year) %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 10 x 2
    Year `mean(Price_Dollar, na.rm = TRUE)`
   <dbl>                              <dbl>
 1  2004                               3.25
 2  2005                               3.36
 3  2006                               3.36
 4  2007                               3.36
 5  2008                               3.40
 6  2009                               3.42
 7  2010                               3.39
 8  2011                               3.36
 9  2012                               3.49
10  2013                               3.50
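
The column header in the output above is the full expression passed to summarise. A small sketch of one way to get a shorter header, assuming a hypothetical column name `mean_price` and a made-up toy table in place of poultry_tidy:

```r
library(dplyr)

# Hypothetical toy data with the same column names as poultry_tidy;
# the prices are invented for illustration.
toy <- tibble(
  Product      = c("Whole", "Thighs", "Whole", "Thighs"),
  Year         = c(2004, 2004, 2005, 2005),
  Month        = c("January", "January", "January", "January"),
  Price_Dollar = c(2.00, 2.50, 3.00, NA)
)

# Naming the summary gives a readable header instead of
# `mean(Price_Dollar, na.rm = TRUE)`.
yearly_means <- toy %>%
  group_by(Year) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE))

print(yearly_means)
```

The grouping and the NA handling are unchanged; only the name of the result column differs.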

Next, I grouped by Year and Product and summarised the mean Price_Dollar to see how the price varies across Products within a single year.

group_by(poultry_tidy, Year, Product) %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 50 x 3
# Groups:   Year [10]
    Year Product        `mean(Price_Dollar, na.rm = TRUE)`
   <dbl> <chr>                                       <dbl>
 1  2004 B/S Breast                                   6.43
 2  2004 Bone-in Breast                               3.90
 3  2004 Thighs                                       2.01
 4  2004 Whole                                        2.12
 5  2004 Whole Legs                                   1.99
 6  2005 B/S Breast                                   6.45
 7  2005 Bone-in Breast                               3.90
 8  2005 Thighs                                       2.21
 9  2005 Whole                                        2.17
10  2005 Whole Legs                                   2.04
# ... with 40 more rows

I’m curious how Whole chicken prices changed between 2004 and 2013. I grouped by Year, filtered for Whole, and summarised the mean price, omitting the NAs.

group_by(poultry_tidy, Year) %>%
filter(Product == "Whole") %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 10 x 2
    Year `mean(Price_Dollar, na.rm = TRUE)`
   <dbl>                              <dbl>
 1  2004                               2.12
 2  2005                               2.17
 3  2006                               2.20
 4  2007                               2.20
 5  2008                               2.37
 6  2009                               2.48
 7  2010                               2.39
 8  2011                               2.35
 9  2012                               2.38
10  2013                               2.38

Next, I wanted to know how price varies across months, so I grouped by Month and summarised the mean price.

group_by(poultry_tidy, Month) %>%
summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 12 x 2
   Month     `mean(Price_Dollar, na.rm = TRUE)`
   <chr>                                  <dbl>
 1 April                                   3.38
 2 August                                  3.40
 3 December                                3.40
 4 February                                3.38
 5 January                                 3.39
 6 July                                    3.40
 7 June                                    3.39
 8 March                                   3.38
 9 May                                     3.38
10 November                                3.40
11 October                                 3.40
12 September                               3.40
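
The months in the output above come out alphabetically because Month is a character column. A sketch of one way to get calendar order instead, assuming the month names are spelled out in full as they are here. The toy table and the `mean_price` name are hypothetical; `month.name` is a constant built into base R.

```r
library(dplyr)

# Hypothetical toy data; prices are invented for illustration.
toy <- tibble(
  Month        = c("March", "January", "February", "January"),
  Price_Dollar = c(3.0, 1.0, 2.0, 2.0)
)

# Converting Month to a factor whose levels follow month.name makes
# the grouped summary sort January, February, March, ...
monthly_means <- toy %>%
  mutate(Month = factor(Month, levels = month.name)) %>%
  group_by(Month) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE))

print(monthly_means)
```

Any month name that does not match an entry in `month.name` exactly would become NA, so this only works if the spelling is consistent.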

I wonder which cut of chicken is cheapest?

group_by(poultry_tidy, Product) %>%
  summarise(mean(Price_Dollar, na.rm = TRUE))
# A tibble: 5 x 2
  Product        `mean(Price_Dollar, na.rm = TRUE)`
  <chr>                                       <dbl>
1 B/S Breast                                   6.55
2 Bone-in Breast                               3.90
3 Thighs                                       2.18
4 Whole                                        2.31
5 Whole Legs                                   2.03

It looks like Whole Legs are the cheapest. That’s not surprising. I am a little surprised at how much cheaper they are than the most expensive cut.
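
Instead of reading the cheapest cut off the table by eye, the summary could be sorted by price. A minimal sketch on made-up data, where `toy` and `mean_price` are hypothetical names:

```r
library(dplyr)

# Hypothetical toy data; prices are invented for illustration.
toy <- tibble(
  Product      = c("B/S Breast", "Whole Legs", "Whole",
                   "B/S Breast", "Whole Legs", "Whole"),
  Price_Dollar = c(6.50, 2.00, 2.30, 6.60, NA, 2.40)
)

# arrange() sorts ascending by default, so the cheapest cut is row 1.
ranked <- toy %>%
  group_by(Product) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE)) %>%
  arrange(mean_price)

print(ranked)
```

Wrapping the column in `desc()` inside `arrange()` would put the most expensive cut first instead.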

# group by Month, filter to just Whole Legs, and pivot wider to compare months against years
group_by(poultry_tidy, Month) %>%
filter(Product == "Whole Legs") %>%
pivot_wider(names_from = Year, values_from = Price_Dollar)
# A tibble: 12 x 12
# Groups:   Month [12]
   Product    Month   `2013` `2012` `2011` `2010` `2009` `2008` `2007`
   <chr>      <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Whole Legs January   2.04   2.04   2.04   2.04   2.04   2.04   2.04
 2 Whole Legs Februa~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
 3 Whole Legs March     2.04   2.04   2.04   2.04   2.04   2.04   2.04
 4 Whole Legs April     2.04   2.04   2.04   2.04   2.04   2.04   2.04
 5 Whole Legs May       2.04   2.04   2.04   2.04   2.04   2.04   2.04
 6 Whole Legs June      2.04   2.04   2.04   2.04   2.04   2.04   2.04
 7 Whole Legs July      2.04   2.04   2.04   2.04   2.04   2.04   2.04
 8 Whole Legs August    2.04   2.04   2.04   2.04   2.04   2.04   2.04
 9 Whole Legs Septem~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
10 Whole Legs October   2.04   2.04   2.04   2.04   2.04   2.04   2.04
11 Whole Legs Novemb~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
12 Whole Legs Decemb~   2.04   2.04   2.04   2.04   2.04   2.04   2.04
# ... with 3 more variables: 2006 <dbl>, 2005 <dbl>, 2004 <dbl>

It is very strange to me that there is so little variation in the price of Whole Legs. It makes me wonder if there’s something funky with the data.
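
One way to follow up on that suspicion would be to count how many distinct prices each cut actually has. This is only a sketch on invented data: `toy` and the summary column names are hypothetical, and the flat Whole Legs prices are made up to mimic the pattern in the table above.

```r
library(dplyr)

# Hypothetical toy data; the constant Whole Legs price imitates the
# pattern seen in the pivoted table, the other prices are invented.
toy <- tibble(
  Product      = c(rep("Whole Legs", 4), rep("Whole", 4)),
  Year         = rep(c(2012, 2012, 2013, 2013), 2),
  Price_Dollar = c(2.04, 2.04, 2.04, 2.04, 2.30, 2.40, 2.35, 2.45)
)

# A cut with a single distinct price and min equal to max
# has no variation at all.
variation <- toy %>%
  group_by(Product) %>%
  summarise(
    n_prices          = n(),
    n_distinct_prices = n_distinct(Price_Dollar),
    min_price         = min(Price_Dollar, na.rm = TRUE),
    max_price         = max(Price_Dollar, na.rm = TRUE),
    sd_price          = sd(Price_Dollar, na.rm = TRUE)
  )

print(variation)
```

Adding Year to the `group_by()` call would show whether the flat prices are confined to particular years.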


