Challenge 4 submission

challenge_4
eggs
poultry
More data wrangling: pivoting
Author

Cam Needels

Published

March 27, 2023

Code
library(tidyverse)
library(summarytools)
library(readr)
library(readxl)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Code
#reading in the dataset and displaying it
eggs_data<- read_csv("B:/Needels/Documents/DACCS 601/DACSS_601_New/posts/_data/poultry_tidy.csv")
eggs_data
# A tibble: 600 × 4
   Product  Year Month     Price_Dollar
   <chr>   <dbl> <chr>            <dbl>
 1 Whole    2013 January           2.38
 2 Whole    2013 February          2.38
 3 Whole    2013 March             2.38
 4 Whole    2013 April             2.38
 5 Whole    2013 May               2.38
 6 Whole    2013 June              2.38
 7 Whole    2013 July              2.38
 8 Whole    2013 August            2.38
 9 Whole    2013 September         2.38
10 Whole    2013 October           2.38
# … with 590 more rows
Code
#list the unique values in the Product column
categories <- unique(eggs_data$Product)
categories
[1] "Whole"          "B/S Breast"     "Bone-in Breast" "Whole Legs"    
[5] "Thighs"        
Code
#list the years covered by this data set
yearcat <- unique(eggs_data$Year)
yearcat
 [1] 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004

Briefly describe the data

The data set records the price in dollars (Price_Dollar) of five chicken products: Whole, B/S Breast, Bone-in Breast, Whole Legs, and Thighs. Each row gives the price of one product in one month of one year, and the data range from 2004 to 2013.

Tidy Data (as needed)

The data is already tidy, so I don’t need to reshape it: each row is a single observation and each column is a single variable. However, I do still need to check which values appear in the Product column.
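
One way to back up the claim that the data is tidy is to confirm that each Product, Year, and Month combination appears exactly once. The sketch below is only an illustration: `toy_poultry` is a small made-up tibble with the same column names as the real file, and its prices are placeholders rather than values from the data set.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for poultry_tidy.csv: same columns, made-up rows
toy_poultry <- tribble(
  ~Product, ~Year, ~Month,     ~Price_Dollar,
  "Whole",  2013,  "January",  2.38,
  "Whole",  2013,  "February", 2.38,
  "Thighs", 2013,  "January",  2.00,
  "Thighs", 2013,  "February", 2.00
)

# Any combination that shows up more than once would mean
# the data is not "one row per observation"
duplicate_keys <- toy_poultry %>%
  count(Product, Year, Month) %>%
  filter(n > 1)

# Zero rows here means every observation has its own row
nrow(duplicate_keys)
```

On the real data, 5 products times 10 years times 12 months gives 600 combinations, which matches the 600 rows reported for the tibble.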

Identify variables that need to be mutated

Product and Month are stored as character columns rather than numbers, so I will convert them to numeric codes to make the data easier to analyze in more depth. Month is recoded to its month number (1 to 12), which together with Year can be used to build a date column. Product is recoded to a numeric ID for each chicken type. Afterwards I run dfSummary() on the result; the output is shown below.

Code
#convert Month -> numbers
eggs_mutate <- eggs_data %>%
  mutate(Month_num = recode(Month, "January" = 1, "February" = 2, "March" = 3,
                            "April" = 4, "May" = 5, "June" = 6,
                            "July" = 7, "August" = 8, "September" = 9,
                            "October" = 10, "November" = 11, "December" = 12))

#assigning IDs to chicken types
eggs_mutate <- eggs_mutate %>%
  mutate(Chicken_ID = recode(Product, "B/S Breast" = 1, 
                             "Bone-in Breast" = 2, 
                             "Thighs" = 3, 
                             "Whole" = 4, 
                             "Whole Legs" = 5))
eggs_mutate
# A tibble: 600 × 6
   Product  Year Month     Price_Dollar Month_num Chicken_ID
   <chr>   <dbl> <chr>            <dbl>     <dbl>      <dbl>
 1 Whole    2013 January           2.38         1          4
 2 Whole    2013 February          2.38         2          4
 3 Whole    2013 March             2.38         3          4
 4 Whole    2013 April             2.38         4          4
 5 Whole    2013 May               2.38         5          4
 6 Whole    2013 June              2.38         6          4
 7 Whole    2013 July              2.38         7          4
 8 Whole    2013 August            2.38         8          4
 9 Whole    2013 September         2.38         9          4
10 Whole    2013 October           2.38        10          4
# … with 590 more rows
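
The text mentions combining the year and month into a date column, which the code above stops short of. A minimal sketch of that step follows, assuming `lubridate::make_date()` and base R's built-in `month.name` vector. The tibble `toy_poultry` and the column name `Date` are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical rows with the same Product/Year/Month columns as the real data
toy_poultry <- tribble(
  ~Product, ~Year, ~Month,
  "Whole",  2013,  "January",
  "Whole",  2013,  "October",
  "Thighs", 2004,  "December"
)

toy_dates <- toy_poultry %>%
  mutate(
    # month.name is c("January", ..., "December"), so match() returns 1 to 12
    Month_num = match(Month, month.name),
    # The day is fixed at 1 because the data has one value per month
    Date = make_date(year = Year, month = Month_num, day = 1)
  )

toy_dates
```

Using `match()` against `month.name` is an alternative to spelling out all twelve months in `recode()`; it returns `NA` for any month name that is misspelled, which makes typos in the data easier to spot.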
Code
dfSummary(eggs_mutate)
Data Frame Summary  
eggs_mutate  
Dimensions: 600 x 6  
Duplicates: 0  

--------------------------------------------------------------------------------------------------------------
No   Variable       Stats / Values             Freqs (% of Valid)   Graph                 Valid      Missing  
---- -------------- -------------------------- -------------------- --------------------- ---------- ---------
1    Product        1. B/S Breast              120 (20.0%)          IIII                  600        0        
     [character]    2. Bone-in Breast          120 (20.0%)          IIII                  (100.0%)   (0.0%)   
                    3. Thighs                  120 (20.0%)          IIII                                      
                    4. Whole                   120 (20.0%)          IIII                                      
                    5. Whole Legs              120 (20.0%)          IIII                                      

2    Year           Mean (sd) : 2008.5 (2.9)   2004 : 60 (10.0%)    II                    600        0        
     [numeric]      min < med < max:           2005 : 60 (10.0%)    II                    (100.0%)   (0.0%)   
                    2004 < 2008.5 < 2013       2006 : 60 (10.0%)    II                                        
                    IQR (CV) : 5 (0)           2007 : 60 (10.0%)    II                                        
                                               2008 : 60 (10.0%)    II                                        
                                               2009 : 60 (10.0%)    II                                        
                                               2010 : 60 (10.0%)    II                                        
                                               2011 : 60 (10.0%)    II                                        
                                               2012 : 60 (10.0%)    II                                        
                                               2013 : 60 (10.0%)    II                                        

3    Month          1. April                    50 ( 8.3%)          I                     600        0        
     [character]    2. August                   50 ( 8.3%)          I                     (100.0%)   (0.0%)   
                    3. December                 50 ( 8.3%)          I                                         
                    4. February                 50 ( 8.3%)          I                                         
                    5. January                  50 ( 8.3%)          I                                         
                    6. July                     50 ( 8.3%)          I                                         
                    7. June                     50 ( 8.3%)          I                                         
                    8. March                    50 ( 8.3%)          I                                         
                    9. May                      50 ( 8.3%)          I                                         
                    10. November                50 ( 8.3%)          I                                         
                    [ 2 others ]               100 (16.7%)          III                                       

4    Price_Dollar   Mean (sd) : 3.4 (1.7)      32 distinct values   :                     593        7        
     [numeric]      min < med < max:                                :                     (98.8%)    (1.2%)   
                    1.9 < 2.4 < 7                                   :                                         
                    IQR (CV) : 1.8 (0.5)                            :     .         .                         
                                                                    : .   :         : .                       

5    Month_num      Mean (sd) : 6.5 (3.5)      12 distinct values   :                 :   600        0        
     [numeric]      min < med < max:                                :                 :   (100.0%)   (0.0%)   
                    1 < 6.5 < 12                                    : . . . . . . . . :                       
                    IQR (CV) : 5.5 (0.5)                            : : : : : : : : : :                       
                                                                    : : : : : : : : : :                       

6    Chicken_ID     Mean (sd) : 3 (1.4)        1 : 120 (20.0%)      IIII                  600        0        
     [numeric]      min < med < max:           2 : 120 (20.0%)      IIII                  (100.0%)   (0.0%)   
                    1 < 3 < 5                  3 : 120 (20.0%)      IIII                                      
                    IQR (CV) : 2 (0.5)         4 : 120 (20.0%)      IIII                                      
                                               5 : 120 (20.0%)      IIII                                      
--------------------------------------------------------------------------------------------------------------
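
The summary reports 7 missing values in Price_Dollar (593 valid out of 600). A short sketch of how one might find which products those gaps belong to is shown here. It runs on a made-up tibble (`toy_poultry`), so the rows, prices, and counts in it are placeholders and say nothing about where the real missing values are.

```r
library(dplyr)
library(tibble)

# Hypothetical rows; the NA positions are invented for illustration
toy_poultry <- tribble(
  ~Product,         ~Year, ~Month,     ~Price_Dollar,
  "Whole",          2004,  "January",  1.98,
  "Bone-in Breast", 2004,  "January",  NA,
  "Bone-in Breast", 2004,  "February", NA,
  "Thighs",         2004,  "January",  NA
)

# Count missing prices per product and keep only products with gaps
missing_by_product <- toy_poultry %>%
  group_by(Product) %>%
  summarise(n_missing = sum(is.na(Price_Dollar)), .groups = "drop") %>%
  filter(n_missing > 0)

missing_by_product
```

Applied to `eggs_mutate`, the same pipeline would show whether the 7 missing prices are spread across products or concentrated in one or two of them.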