Challenge 4

eggs

Author

Guanhua Tan

Published

October 1, 2022

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
tidy data (as needed, including sanity checks)
identify variables that need to be mutated
mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

abc_poll.csv ⭐
poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
FedFundsRate.csv⭐⭐⭐
hotel_bookings.csv⭐⭐⭐⭐
debt_in_trillions.xlsx ⭐⭐⭐⭐⭐

Code

library(dplyr)
library(readxl) 
poultry<-read_excel("_data/poultry_tidy.xlsx")
poultry

# A tibble: 600 × 4
   Product  Year Month     Price_Dollar
   <chr>   <dbl> <chr>            <dbl>
 1 Whole    2013 January           2.38
 2 Whole    2013 February          2.38
 3 Whole    2013 March             2.38
 4 Whole    2013 April             2.38
 5 Whole    2013 May               2.38
 6 Whole    2013 June              2.38
 7 Whole    2013 July              2.38
 8 Whole    2013 August            2.38
 9 Whole    2013 September         2.38
10 Whole    2013 October           2.38
# … with 590 more rows

Briefly describe the data

The dataset records the prices of Chicken and its different parts from Jan. 2004 to Jan 2013. The column names includes Product, Year, Month, Price_Dollar. There are a few values (price) missing.

Tidy Data (as needed)

First, I want data have a clearer order so I move Year and Month column names to the first two columns. Additionally, I’d like to recreate column names as Chicken’s different parts. So I use pivot_wider function to rearrange the data.

Code

# tidy data, use Chicken's different parts as column names
df <-poultry %>% select(Year, Month, Product, Price_Dollar) %>% pivot_wider(names_from = Product, values_from = Price_Dollar)
df

# A tibble: 120 × 7
    Year Month     Whole `B/S Breast` `Bone-in Breast` `Whole Legs` Thighs
   <dbl> <chr>     <dbl>        <dbl>            <dbl>        <dbl>  <dbl>
 1  2013 January    2.38         7.04             3.90         2.04   2.16
 2  2013 February   2.38         7.04             3.90         2.04   2.16
 3  2013 March      2.38         7.04             3.90         2.04   2.16
 4  2013 April      2.38         7.04             3.90         2.04   2.16
 5  2013 May        2.38         7.04             3.90         2.04   2.16
 6  2013 June       2.38         7.04             3.90         2.04   2.16
 7  2013 July       2.38         7.04             3.90         2.04   2.16
 8  2013 August     2.38         7.04             3.90         2.04   2.16
 9  2013 September  2.38         7.04             3.90         2.04   2.16
10  2013 October    2.38         7.04             3.90         2.04   2.16
# … with 110 more rows

Identify variables that need to be mutated

I found I may mutate the average price of every part of Chicken per year.

I grouped the data by year and mutate 5 columns to calculate the mean of every part of Chiken every year.

Code

# mutate data
df %>% 
  group_by(Year) %>% 
  mutate(whole_ave = mean(Whole),
         breast_ave= mean(`B/S Breast`),
         bone_in_ave = mean (`Bone-in Breast`),
         whole_legs_ave = mean (`Whole Legs`),
         thighs_ave = mean(`Thighs`)
         )%>%
  select(Year, Month, Whole, whole_ave, `B/S Breast`, breast_ave, `Bone-in Breast`, bone_in_ave, `Whole Legs`, whole_legs_ave, Thighs, thighs_ave)

# A tibble: 120 × 12
# Groups:   Year [10]
    Year Month     Whole whole…¹ B/S B…² breas…³ Bone-…⁴ bone_…⁵ Whole…⁶ whole…⁷
   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  2013 January    2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 2  2013 February   2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 3  2013 March      2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 4  2013 April      2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 5  2013 May        2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 6  2013 June       2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 7  2013 July       2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 8  2013 August     2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 9  2013 September  2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
10  2013 October    2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
# … with 110 more rows, 2 more variables: Thighs <dbl>, thighs_ave <dbl>, and
#   abbreviated variable names ¹whole_ave, ²`B/S Breast`, ³breast_ave,
#   ⁴`Bone-in Breast`, ⁵bone_in_ave, ⁶`Whole Legs`, ⁷whole_legs_ave