challenge_4
Daniel Hannon
poultry_tidy
Author

Daniel Hannon

Published

March 29, 2023

Code
library(tidyverse)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE)

Read in the Data

Code
poultry_data <- readxl::read_excel("_data/poultry_tidy.xlsx")

poultry_data
# A tibble: 600 × 4
   Product  Year Month     Price_Dollar
   <chr>   <dbl> <chr>            <dbl>
 1 Whole    2013 January           2.38
 2 Whole    2013 February          2.38
 3 Whole    2013 March             2.38
 4 Whole    2013 April             2.38
 5 Whole    2013 May               2.38
 6 Whole    2013 June              2.38
 7 Whole    2013 July              2.38
 8 Whole    2013 August            2.38
 9 Whole    2013 September         2.38
10 Whole    2013 October           2.38
# … with 590 more rows
Code
summarytools::dfSummary(poultry_data)
Data Frame Summary  
poultry_data  
Dimensions: 600 x 4  
Duplicates: 0  

--------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values             Freqs (% of Valid)   Graph                 Valid      Missing  
---- -------------- -------------------------- -------------------- --------------------- ---------- ---------
1    Product        1. B/S Breast              120 (20.0%)          IIII                  600        0        
     [character]    2. Bone-in Breast          120 (20.0%)          IIII                  (100.0%)   (0.0%)   
                    3. Thighs                  120 (20.0%)          IIII                                      
                    4. Whole                   120 (20.0%)          IIII                                      
                    5. Whole Legs              120 (20.0%)          IIII                                      

2    Year           Mean (sd) : 2008.5 (2.9)   2004 : 60 (10.0%)    II                    600        0        
     [numeric]      min < med < max:           2005 : 60 (10.0%)    II                    (100.0%)   (0.0%)   
                    2004 < 2008.5 < 2013       2006 : 60 (10.0%)    II                                        
                    IQR (CV) : 5 (0)           2007 : 60 (10.0%)    II                                        
                                               2008 : 60 (10.0%)    II                                        
                                               2009 : 60 (10.0%)    II                                        
                                               2010 : 60 (10.0%)    II                                        
                                               2011 : 60 (10.0%)    II                                        
                                               2012 : 60 (10.0%)    II                                        
                                               2013 : 60 (10.0%)    II                                        

3    Month          1. April                    50 ( 8.3%)          I                     600        0        
     [character]    2. August                   50 ( 8.3%)          I                     (100.0%)   (0.0%)   
                    3. December                 50 ( 8.3%)          I                                         
                    4. February                 50 ( 8.3%)          I                                         
                    5. January                  50 ( 8.3%)          I                                         
                    6. July                     50 ( 8.3%)          I                                         
                    7. June                     50 ( 8.3%)          I                                         
                    8. March                    50 ( 8.3%)          I                                         
                    9. May                      50 ( 8.3%)          I                                         
                    10. November                50 ( 8.3%)          I                                         
                    [ 2 others ]               100 (16.7%)          III                                       

4    Price_Dollar   Mean (sd) : 3.4 (1.7)      32 distinct values   :                     593        7        
     [numeric]      min < med < max:                                :                     (98.8%)    (1.2%)   
                    1.9 < 2.4 < 7                                   :                                         
                    IQR (CV) : 1.8 (0.5)                            :     .         .                         
                                                                    : .   :         : .                       
--------------------------------------------------------------------------------------------------------------
Code
missing_data <- filter(poultry_data, is.na(Price_Dollar))

missing_data
# A tibble: 7 × 4
  Product         Year Month    Price_Dollar
  <chr>          <dbl> <chr>           <dbl>
1 Bone-in Breast  2004 January            NA
2 Bone-in Breast  2004 February           NA
3 Bone-in Breast  2004 March              NA
4 Bone-in Breast  2004 April              NA
5 Bone-in Breast  2004 May                NA
6 Bone-in Breast  2004 June               NA
7 Thighs          2004 January            NA

Describe the Data

This data set describes the price in dollars of five poultry cuts (Boneless Skinless Breast, Bone-in Breast, Thighs, Whole Legs, and Whole) for each month from January 2004 to December 2013. Seven prices from 2004 are missing: Thighs for January, and Bone-in Breast for January through June.

Tidy the Data

The data is already in a tidy format: each row is a single observation of the price of one cut of poultry in a specific month and year.
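
One way to sanity-check the "one row per observation" claim is to count rows per Product/Year/Month key and look for any key that appears more than once. The sketch below uses a small stand-in tibble named toy (hypothetical; the Thighs prices are made up for illustration) instead of the real poultry_data, so it runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for poultry_data; prices here are illustrative only
toy <- tibble::tibble(
  Product      = c("Whole", "Whole", "Thighs", "Thighs"),
  Year         = c(2013, 2013, 2013, 2013),
  Month        = c("January", "February", "January", "February"),
  Price_Dollar = c(2.38, 2.38, 2.00, 2.10)
)

# Count rows for each Product/Year/Month combination;
# any key with n > 1 would be a duplicated observation
dup_keys <- toy %>%
  count(Product, Year, Month) %>%
  filter(n > 1)

# Zero rows here means every key is unique
nrow(dup_keys)
```

Running the same count() and filter() steps on poultry_data should return an empty tibble if the data is tidy in this sense, which would agree with the "Duplicates: 0" line in the summary above.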

Mutate the Date

Right now the data has separate Month and Year columns, so we need to add a Date column in order to sort observations chronologically.

Code
poultry_data <- poultry_data %>%
        mutate(Date = ym(paste(Year, Month)))

head(poultry_data)
# A tibble: 6 × 5
  Product  Year Month    Price_Dollar Date      
  <chr>   <dbl> <chr>           <dbl> <date>    
1 Whole    2013 January          2.38 2013-01-01
2 Whole    2013 February         2.38 2013-02-01
3 Whole    2013 March            2.38 2013-03-01
4 Whole    2013 April            2.38 2013-04-01
5 Whole    2013 May              2.38 2013-05-01
6 Whole    2013 June             2.38 2013-06-01

Now we have a Date column, although every date has its day set to the first of the month. We don’t know the actual day the data was collected, but because the convention is consistent throughout the data set, it won’t affect the ordering.
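
As a minimal sketch of why the Date column helps, the example below builds dates the same way on a tiny hypothetical tibble (toy, not the real data) and sorts by them. Sorting on the character Month column would order months alphabetically, while sorting on Date orders them chronologically. Note that parsing month names is assumed to run in an English locale.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows, deliberately out of chronological order
toy <- tibble::tibble(
  Year  = c(2013, 2012, 2013),
  Month = c("February", "December", "January")
)

# Build a Date from Year and Month (day defaults to the 1st), then sort by it
toy_sorted <- toy %>%
  mutate(Date = ym(paste(Year, Month))) %>%
  arrange(Date)

toy_sorted
```

Because every date falls on the first of its month, arrange(Date) depends only on the year and month, so the placeholder day has no effect on the order.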