Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Guanhua Tan
October 1, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 600 × 4
Product Year Month Price_Dollar
<chr> <dbl> <chr> <dbl>
1 Whole 2013 January 2.38
2 Whole 2013 February 2.38
3 Whole 2013 March 2.38
4 Whole 2013 April 2.38
5 Whole 2013 May 2.38
6 Whole 2013 June 2.38
7 Whole 2013 July 2.38
8 Whole 2013 August 2.38
9 Whole 2013 September 2.38
10 Whole 2013 October 2.38
# … with 590 more rows
The dataset records the prices of Chicken and its different parts from Jan. 2004 to Jan 2013. The column names includes Product, Year, Month, Price_Dollar. There are a few values (price) missing.
First, I want data have a clearer order so I move Year and Month column names to the first two columns. Additionally, I’d like to recreate column names as Chicken’s different parts. So I use pivot_wider function to rearrange the data.
# A tibble: 120 × 7
Year Month Whole `B/S Breast` `Bone-in Breast` `Whole Legs` Thighs
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2013 January 2.38 7.04 3.90 2.04 2.16
2 2013 February 2.38 7.04 3.90 2.04 2.16
3 2013 March 2.38 7.04 3.90 2.04 2.16
4 2013 April 2.38 7.04 3.90 2.04 2.16
5 2013 May 2.38 7.04 3.90 2.04 2.16
6 2013 June 2.38 7.04 3.90 2.04 2.16
7 2013 July 2.38 7.04 3.90 2.04 2.16
8 2013 August 2.38 7.04 3.90 2.04 2.16
9 2013 September 2.38 7.04 3.90 2.04 2.16
10 2013 October 2.38 7.04 3.90 2.04 2.16
# … with 110 more rows
I found I may mutate the average price of every part of Chicken per year.
I grouped the data by year and mutate 5 columns to calculate the mean of every part of Chiken every year.
# mutate data
df %>%
group_by(Year) %>%
mutate(whole_ave = mean(Whole),
breast_ave= mean(`B/S Breast`),
bone_in_ave = mean (`Bone-in Breast`),
whole_legs_ave = mean (`Whole Legs`),
thighs_ave = mean(`Thighs`)
)%>%
select(Year, Month, Whole, whole_ave, `B/S Breast`, breast_ave, `Bone-in Breast`, bone_in_ave, `Whole Legs`, whole_legs_ave, Thighs, thighs_ave)
# A tibble: 120 × 12
# Groups: Year [10]
Year Month Whole whole…¹ B/S B…² breas…³ Bone-…⁴ bone_…⁵ Whole…⁶ whole…⁷
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2013 January 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
2 2013 February 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
3 2013 March 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
4 2013 April 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
5 2013 May 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
6 2013 June 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
7 2013 July 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
8 2013 August 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
9 2013 September 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
10 2013 October 2.38 2.38 7.04 7.04 3.90 3.90 2.04 2.04
# … with 110 more rows, 2 more variables: Thighs <dbl>, thighs_ave <dbl>, and
# abbreviated variable names ¹whole_ave, ²`B/S Breast`, ³breast_ave,
# ⁴`Bone-in Breast`, ⁵bone_in_ave, ⁶`Whole Legs`, ⁷whole_legs_ave
---
title: "Challenge 4"
author: "Guanhua Tan"
desription: "More data wrangling: pivoting"
date: "10/1/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- eggs
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
```{r}
library(dplyr)
library(readxl)
poultry<-read_excel("_data/poultry_tidy.xlsx")
poultry
```
### Briefly describe the data
The dataset records the prices of Chicken and its different parts from Jan. 2004 to Jan 2013. The column names includes Product, Year, Month, Price_Dollar. There are a few values (price) missing.
## Tidy Data (as needed)
First, I want data have a clearer order so I move Year and Month column names to the first two columns. Additionally, I'd like to recreate column names as Chicken's different parts. So I use pivot_wider function to rearrange the data.
```{r}
# tidy data, use Chicken's different parts as column names
df <-poultry %>% select(Year, Month, Product, Price_Dollar) %>% pivot_wider(names_from = Product, values_from = Price_Dollar)
df
```
## Identify variables that need to be mutated
I found I may mutate the average price of every part of Chicken per year.
I grouped the data by year and mutate 5 columns to calculate the mean of every part of Chiken every year.
```{r}
# mutate data
df %>%
group_by(Year) %>%
mutate(whole_ave = mean(Whole),
breast_ave= mean(`B/S Breast`),
bone_in_ave = mean (`Bone-in Breast`),
whole_legs_ave = mean (`Whole Legs`),
thighs_ave = mean(`Thighs`)
)%>%
select(Year, Month, Whole, whole_ave, `B/S Breast`, breast_ave, `Bone-in Breast`, bone_in_ave, `Whole Legs`, whole_legs_ave, Thighs, thighs_ave)
```