Hw #3

Importing/Tidying Organic Egg and Poultry Data

Tim Lennon
1/21/2022
knitr::opts_chunk$set(echo = TRUE, Message = FALSE, warning = FALSE)

Loading Packages into R Environment

Choosing my data

My Data is one of the uncleaned date sets provided by the course. It is call organic egg poultry and it shows price changes for different egg and organic chicken products from January 2004 to December 2013.

I chose this data set because it wasn’t perfectly clean and required some alteration to make it useful in statistical analysis. I Also like the idea of working with data across time because I know time-series analysis will be a very useful skill for my future career. Also, poultry products and prices are can be related to environmental issues.

organiceggpoultry <- read_excel("organiceggpoultry.xlsx", 
                                na = "too few", range = "A5:L125")
#View(organiceggpoultry)

The data is quite messy for the following reasons:

  1. ‘Too few’ needed to be changed to NA
  2. Column 7 is a blank column that needs to be selected out.
  3. The column headings are also a little long and can be shortened
  4. The date in the original excel sheet was listed as January 2004, Febuary… December, January 2005, February…December, January 2006 etc. I knew this would be difficult to pull the year out of the every 12th Month rows to apply the year to the proceeding month values, so in the excel document I altered the data by adding a column “Year”, and changed the Month so that it only had the month spelled out without a year included.
  5. With the Month and Year in different columns I then had to unite the columns to create a Date column that was imported as a character vector.
  6. This Date column would then have to be coerced into a vector classed as a date.
  7. Because there are 4 Egg Products (i.e Extra Large Dozen) but 5 Young Chicken Products (i.e. Whole Legs), I could not gather the data to combine the Egg products and Chicken Products into a single column called i.e. All_Chicken_Products. It listed the Egg Products twice to compensate for the listing of Chicken Products.
    * As result, I had to split the data into two data frames.
  8. To further tidy the data, I pivoted the data length-wise and placed the values into a new column “Egg_Price” and "Chicken_Price and placed the type of products into a new column “Egg_Product” and “Chicken_Product” respectively.
  9. The prices need to be changed to a vector classed as a double and to the second decimal place!

Merging Month and Year into a Date Column

Organic_Egg_Poultry <-  unite(organiceggpoultry, Date, 
                              Month, Year, sep = " ", 
                              remove = TRUE
                              )

Renaming Column Headings and Selecting out the blank Column 7

Organic_Egg_Poultry <- rename(Organic_Egg_Poultry, c(
                      Dozen_XL = "Extra Large \r\nDozen",
                      Half_Dozen_XL = "Extra Large 1/2 Doz.\r\n1/2 Dozen",
                      Dozen_Lg =  "Large \r\nDozen",
                      Half_Dozen_Lg = "Large \r\n1/2 Doz.",
                      BI_Breast = "Bone-in Breast",
                      BS_Breast =   "B/S Breast",
                      Whole_Legs = "Whole Legs"
                      ))
                      

Organic_Egg_Poultry_NEW <- select(Organic_Egg_Poultry,
                                  "Date",
                                  "Dozen_XL",
                                  "Half_Dozen_XL",
                                  "Dozen_Lg",
                                  "Half_Dozen_Lg", 
                                  "Whole",      
                                  "BS_Breast",    
                                  "BI_Breast",
                                  "Whole_Legs",  
                                  "Thighs")

Creating two new Data Frames and Pivoting to Tidy Data

Egg_Products <- data.frame(Date = Organic_Egg_Poultry_NEW$Date,
                  Dozen_XL = Organic_Egg_Poultry_NEW$Dozen_XL,
                  Half_Dozen_XL = Organic_Egg_Poultry_NEW$Half_Dozen_XL,
                  Dozen_Lg = Organic_Egg_Poultry_NEW$Dozen_Lg,
                  Half_Dozen_Lg = Organic_Egg_Poultry_NEW$Half_Dozen_Lg
)

#View(Egg_Products)

Egg_Products_New <- Egg_Products %>%
  pivot_longer(c("Dozen_XL", "Half_Dozen_XL", 
                 "Dozen_Lg", "Half_Dozen_Lg"),
               names_to = "Egg_Product",
               values_to = "Egg_Price"
)

#view(Egg_Products_New)

Chicken_Products <- data.frame(Date = Organic_Egg_Poultry_NEW$Date,
                               Whole = Organic_Egg_Poultry_NEW$Whole,
                               BS_Breast = Organic_Egg_Poultry_NEW$BS_Breast,
                               BI_Breast = Organic_Egg_Poultry_NEW$BI_Breast,
                               Whole_Legs = Organic_Egg_Poultry_NEW$Whole_Legs,
                               Thighs = Organic_Egg_Poultry_NEW$Thighs
)

#view(Chicken_Products)



Chicken_Products_New <- Chicken_Products %>%
  pivot_longer(c("Whole", "BS_Breast", 
                 "BI_Breast", "Whole_Legs", "Thighs"),
               names_to = "Chicken_Product",
               values_to = "Chicken_Price"
)

#view(Chicken_Products_New)

Changing the decimal place in price columns to show two decimal places

as_tibble(Egg_Products_New)
# A tibble: 480 x 3
   Date          Egg_Product   Egg_Price
   <chr>         <chr>             <dbl>
 1 January 2004  Dozen_XL           230 
 2 January 2004  Half_Dozen_XL      132 
 3 January 2004  Dozen_Lg           230 
 4 January 2004  Half_Dozen_Lg      126 
 5 February 2004 Dozen_XL           230 
 6 February 2004 Half_Dozen_XL      134.
 7 February 2004 Dozen_Lg           226.
 8 February 2004 Half_Dozen_Lg      128.
 9 March 2004    Dozen_XL           230 
10 March 2004    Half_Dozen_XL      137 
# ... with 470 more rows
Egg_Products_New$Egg_Price <- sprintf(Egg_Products_New$Egg_Price,
                                      fmt = '%#.2f'
                                      )
#view(Egg_Products_New)

as_tibble(Chicken_Products_New)
# A tibble: 600 x 3
   Date          Chicken_Product Chicken_Price
   <chr>         <chr>                   <dbl>
 1 January 2004  Whole                    198.
 2 January 2004  BS_Breast                646.
 3 January 2004  BI_Breast                 NA 
 4 January 2004  Whole_Legs               194.
 5 January 2004  Thighs                    NA 
 6 February 2004 Whole                    198.
 7 February 2004 BS_Breast                642.
 8 February 2004 BI_Breast                 NA 
 9 February 2004 Whole_Legs               194.
10 February 2004 Thighs                   203 
# ... with 590 more rows
Chicken_Products_New$Chicken_Price <- sprintf(Chicken_Products_New$Chicken_Price, 
        fmt = '%#.2f')

#view(Chicken_Products_New)

Parsing the Date vector from a character string into a date format

Chicken_Products_New$Date <- parse_date(Chicken_Products_New$Date, format = "%B %Y")
Egg_Products_New$Date <- parse_date(Egg_Products_New$Date, format = "%B %Y")

as_tibble(Chicken_Products_New)
# A tibble: 600 x 3
   Date       Chicken_Product Chicken_Price
   <date>     <chr>           <chr>        
 1 2004-01-01 Whole           197.50       
 2 2004-01-01 BS_Breast       645.50       
 3 2004-01-01 BI_Breast       NA           
 4 2004-01-01 Whole_Legs      193.50       
 5 2004-01-01 Thighs          NA           
 6 2004-02-01 Whole           197.50       
 7 2004-02-01 BS_Breast       642.50       
 8 2004-02-01 BI_Breast       NA           
 9 2004-02-01 Whole_Legs      193.50       
10 2004-02-01 Thighs          203.00       
# ... with 590 more rows
as_tibble(Egg_Products_New)
# A tibble: 480 x 3
   Date       Egg_Product   Egg_Price
   <date>     <chr>         <chr>    
 1 2004-01-01 Dozen_XL      230.00   
 2 2004-01-01 Half_Dozen_XL 132.00   
 3 2004-01-01 Dozen_Lg      230.00   
 4 2004-01-01 Half_Dozen_Lg 126.00   
 5 2004-02-01 Dozen_XL      230.00   
 6 2004-02-01 Half_Dozen_XL 134.50   
 7 2004-02-01 Dozen_Lg      226.25   
 8 2004-02-01 Half_Dozen_Lg 128.50   
 9 2004-03-01 Dozen_XL      230.00   
10 2004-03-01 Half_Dozen_XL 137.00   
# ... with 470 more rows

Convert Egg and Chicken Price Columns to Double

Egg_Products_New$Egg_Price <- as.double(Egg_Products_New$Egg_Price)
Chicken_Products_New$Chicken_Price <- as.double(Chicken_Products_New$Chicken_Price)

as_tibble(Chicken_Products_New)
# A tibble: 600 x 3
   Date       Chicken_Product Chicken_Price
   <date>     <chr>                   <dbl>
 1 2004-01-01 Whole                    198.
 2 2004-01-01 BS_Breast                646.
 3 2004-01-01 BI_Breast                 NA 
 4 2004-01-01 Whole_Legs               194.
 5 2004-01-01 Thighs                    NA 
 6 2004-02-01 Whole                    198.
 7 2004-02-01 BS_Breast                642.
 8 2004-02-01 BI_Breast                 NA 
 9 2004-02-01 Whole_Legs               194.
10 2004-02-01 Thighs                   203 
# ... with 590 more rows
as_tibble(Egg_Products_New)
# A tibble: 480 x 3
   Date       Egg_Product   Egg_Price
   <date>     <chr>             <dbl>
 1 2004-01-01 Dozen_XL           230 
 2 2004-01-01 Half_Dozen_XL      132 
 3 2004-01-01 Dozen_Lg           230 
 4 2004-01-01 Half_Dozen_Lg      126 
 5 2004-02-01 Dozen_XL           230 
 6 2004-02-01 Half_Dozen_XL      134.
 7 2004-02-01 Dozen_Lg           226.
 8 2004-02-01 Half_Dozen_Lg      128.
 9 2004-03-01 Dozen_XL           230 
10 2004-03-01 Half_Dozen_XL      137 
# ... with 470 more rows
#The two decimal places don't show in the tibble but can be seen here: 

#view(Chicken_Products_New)
#view(Egg_Products_New)

#Future data tidying I may need to do: 1. Change the prices from cents/unit to dollars/unit 1. standardize the price of eggs to be measurable between the half dozen and dozen containers by finding a price per egg. 1. Can I change date to remove the ‘day’ and keep it to just month/year. 1. Can I display i.e. Jan 2004 in graphs instead of 2004-01?

Potential Research Questions with Data

  1. How have prices of chicken products and egg products changed over time.
  2. Is there a covariation between the types of products over time?
  3. Using a grouping function or something of the sort, can we see seasonal changes in prices. i.e. Are prices higher in the winter or the summer?
  4. Does a particular product vary more over time? For instance, whole chicken does not require as much handling and employment hours as a product such as boneless, skinless breast might.Therefor, can the price be of whole chicken be more consistent than chicken thighs or boneless, skinless chicken?
  5. Are there any statistical significant ‘shocks’ to the time-series data?

Research Questions Beyond Final’s Scope and Available Data

  1. How have changes in poultry compared to changes in beef?
  2. What are ‘causal’ relationships related to the changes in poultry product prices? For example, would variability in feed prices like corn or seeds lead to changes in products.
  3. Do policies related to agriculture at a federal or state level have more of an impact on the change in poultry product pricing.
  4. Does cost of oil at well-head affect the pricing of agricultural products?
  5. Are there measurable instances of high poultry deaths due to climate change that can affect price changes?
  6. How has negative public opinion of GMOs and Antibiotics affected the demand and therefore price of organic chicken products?

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Lennon (2022, Jan. 25). Data Analytics and Computational Social Science: Hw #3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlennont857314/

BibTeX citation

@misc{lennon2022hw,
  author = {Lennon, Tim},
  title = {Data Analytics and Computational Social Science: Hw #3},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlennont857314/},
  year = {2022}
}