Homework 2

Importing Data and Using dplyr

Timothy Lennon
knitr::opts_chunk$set(echo = TRUE)

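library(tidyverse)  # loads ggplot2, dplyr, tidyr and the other core tidyverse packages
library(readxl)     # provides read_xlsx() for Excel files
library(dplyr)
library(rlang)
poultryTidy <- read_xlsx("poultry_tidy.xlsx")  # read the tidy poultry price sheet
head(poultryTidy)                              # print the first six rows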
# A tibble: 6 x 4
  Product  Year Month    Price_Dollar
  <chr>   <dbl> <chr>           <dbl>
1 Whole    2013 January          2.38
2 Whole    2013 February         2.38
3 Whole    2013 March            2.38
4 Whole    2013 April            2.38
5 Whole    2013 May              2.38
6 Whole    2013 June             2.38

The data set is poultryTidy, a tibble (the tidyverse version of a data frame).
There are 4 columns of variables:
1. Product: the cut of poultry, one of Whole, B/S Breast, Bone-in Breast, Whole Legs, or Thighs. It is a character string.
2. Year: runs in descending order from 2013 to 2004. It is a double.
3. Month: runs in ascending order from January to December within each year. It is a character string holding the full name of the month.
4. Price_Dollar: a double giving the price in dollars per unit. The data does not specify whether that unit is a pound or something else.
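The column count and types listed above can also be checked in code rather than through an inline R expression. This is a minimal sketch: `poultry_sample` is a hypothetical three-row tibble re-typed from the rows printed by `head()`, so it runs without `poultry_tidy.xlsx`; on the real data you would pass `poultryTidy` instead.

```r
library(tibble)

# Hypothetical stand-in for poultryTidy, copied from the head() output above
poultry_sample <- tibble(
  Product      = c("Whole", "Whole", "Whole"),
  Year         = c(2013, 2013, 2013),
  Month        = c("January", "February", "March"),
  Price_Dollar = c(2.38, 2.38, 2.38)
)

# Number of columns (what an inline `r ncol(...)` would print)
n_cols <- ncol(poultry_sample)
n_cols

# Storage type of each column: "character" or "double"
col_types <- vapply(poultry_sample, typeof, character(1))
col_types
```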

ggplot(data = poultryTidy) + 
   geom_point(mapping = aes(x = Year, y = Price_Dollar, color = Product))  

This graphic shows the price of each poultry product from 2004 to 2013. B/S Breast and Bone-in Breast are considerably more expensive than the other parts (Whole Legs and Thighs) or the whole bird.
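The visual comparison can be backed up by averaging the price of each product. This is a sketch on a hypothetical one-month sample: the July 2004 prices are copied from the pivoted table printed later in this post, so each "average" here is a single value; on the full `poultryTidy` the same pipeline would average over every month, with `na.rm = TRUE` skipping the missing Bone-in Breast months.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format sample: July 2004 price for each product
july_2004 <- tibble(
  Product      = c("Whole", "B/S Breast", "Bone-in Breast", "Whole Legs", "Thighs"),
  Year         = 2004,
  Month        = "July",
  Price_Dollar = c(2.17, 6.42, 3.90, 2.04, 2.00)
)

# Mean price per product, most expensive first
price_by_product <- july_2004 %>%
  group_by(Product) %>%
  summarise(mean_price = mean(Price_Dollar, na.rm = TRUE)) %>%
  arrange(desc(mean_price))

price_by_product
```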

poultryTidy <- rename(poultryTidy, costPerPound = Price_Dollar)  

poultryTidy %>%
  filter(costPerPound >= 5) %>%
  ggplot() +
  geom_point(mapping = aes(x = Year, y = costPerPound, color = Product))

Above, we renamed the column Price_Dollar to costPerPound and then restricted the graphic to products priced at $5 or more. Only B/S Breast met that threshold.
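The same two steps can be tried on a small sample to see exactly what they keep. This sketch uses a hypothetical tibble of July 2004 prices taken from the pivoted table below; note that `rename()` takes `new_name = old_name`, and that `>=` keeps a product priced at exactly $5.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: July 2004 prices per product
july_2004 <- tibble(
  Product      = c("Whole", "B/S Breast", "Bone-in Breast", "Whole Legs", "Thighs"),
  Year         = 2004,
  Price_Dollar = c(2.17, 6.42, 3.90, 2.04, 2.00)
)

# Rename the price column (new name on the left)
renamed <- rename(july_2004, costPerPound = Price_Dollar)

# Keep rows priced at $5 or more
expensive <- filter(renamed, costPerPound >= 5)
expensive$Product
```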

poultryTidy <- poultryTidy %>%
  pivot_wider(names_from = Product, values_from = costPerPound)

Above, we used pivot_wider() to spread the Product column into separate columns, one per product type, with each cell holding that product's costPerPound for a given Year and Month. Below is the data in this new arrangement.
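Here is a small sketch of what `pivot_wider()` does, including where `NA` comes from. The sample is hypothetical, using January and February 2004 prices from the table below. Year and Month identify each row because they are the columns not named in `names_from` or `values_from`. A Year/Month/Product combination with no value becomes `NA` in the wide table, which is one way the `NA`s below can arise (the other being a price that was already missing in the long data).

```r
library(tidyr)
library(tibble)

# Hypothetical long-format sample; Thighs has no January 2004 row
long_sample <- tibble(
  Product      = c("Whole", "Whole", "Thighs"),
  Year         = 2004,
  Month        = c("January", "February", "February"),
  costPerPound = c(1.98, 1.98, 2.03)
)

# Each distinct Product value becomes its own column
wide_sample <- pivot_wider(long_sample,
                           names_from  = Product,
                           values_from = costPerPound)
wide_sample
```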

arrange(poultryTidy, Year)  
# A tibble: 120 x 7
    Year Month Whole `B/S Breast` `Bone-in Breast` `Whole Legs` Thighs
   <dbl> <chr> <dbl>        <dbl>            <dbl>        <dbl>  <dbl>
 1  2004 Janu~  1.98         6.46            NA            1.94  NA   
 2  2004 Febr~  1.98         6.42            NA            1.94   2.03
 3  2004 March  2.09         6.42            NA            1.94   2.03
 4  2004 April  2.12         6.42            NA            1.94   2.03
 5  2004 May    2.14         6.42            NA            1.94   2.03
 6  2004 June   2.16         6.41            NA            2.02   2.00
 7  2004 July   2.17         6.42             3.90         2.04   2.00
 8  2004 Augu~  2.17         6.42             3.90         2.04   2.00
 9  2004 Sept~  2.17         6.42             3.90         2.04   2.00
10  2004 Octo~  2.17         6.42             3.90         2.04   2.00
# ... with 110 more rows
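Because Month is stored as a character string, sorting by it (for example `arrange(poultryTidy, Year, Month)`) would put months in alphabetical rather than calendar order. A minimal sketch of one fix, on a hypothetical three-row sample: convert Month to a factor whose levels are R's built-in `month.name` vector, so `arrange()` follows the factor levels.

```r
library(dplyr)
library(tibble)

# Hypothetical sample with Month as character, like poultryTidy
months_sample <- tibble(
  Year  = 2004,
  Month = c("March", "January", "February"),
  Whole = c(2.09, 1.98, 1.98)
)

# Character sort is alphabetical: February, January, March
alphabetical <- arrange(months_sample, Year, Month)

# Factor with calendar levels sorts January, February, March
calendar <- months_sample %>%
  mutate(Month = factor(Month, levels = month.name)) %>%
  arrange(Year, Month)

alphabetical$Month
as.character(calendar$Month)
```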

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Lennon (2022, Jan. 8). Data Analytics and Computational Social Science: Homework 2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlennont853000/

BibTeX citation

@misc{lennon2022homework,
  author = {Lennon, Timothy},
  title = {Data Analytics and Computational Social Science: Homework 2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlennont853000/},
  year = {2022}
}