DACSS 601: Data Science Fundamentals - FALL 2022
  • Fall 2022 Posts
  • Contributors
  • DACSS

Challenge 4

  • Course information
    • Overview
    • Instructional Team
    • Course Schedule
  • Weekly materials
    • Fall 2022 posts
    • final posts

On this page

  • Challenge Overview
  • Read in data
    • Briefly describe the data
  • Tidy Data (as needed)
  • Identify variables that need to be mutated

Challenge 4

  • Show All Code
  • Hide All Code

  • View Source
eggs
Author

Guanhua Tan

Published

October 1, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
library(dplyr)
library(readxl) 
poultry<-read_excel("_data/poultry_tidy.xlsx")
poultry
# A tibble: 600 × 4
   Product  Year Month     Price_Dollar
   <chr>   <dbl> <chr>            <dbl>
 1 Whole    2013 January           2.38
 2 Whole    2013 February          2.38
 3 Whole    2013 March             2.38
 4 Whole    2013 April             2.38
 5 Whole    2013 May               2.38
 6 Whole    2013 June              2.38
 7 Whole    2013 July              2.38
 8 Whole    2013 August            2.38
 9 Whole    2013 September         2.38
10 Whole    2013 October           2.38
# … with 590 more rows

Briefly describe the data

The dataset records the prices of Chicken and its different parts from Jan. 2004 to Jan 2013. The column names includes Product, Year, Month, Price_Dollar. There are a few values (price) missing.

Tidy Data (as needed)

First, I want data have a clearer order so I move Year and Month column names to the first two columns. Additionally, I’d like to recreate column names as Chicken’s different parts. So I use pivot_wider function to rearrange the data.

Code
# tidy data, use Chicken's different parts as column names
df <-poultry %>% select(Year, Month, Product, Price_Dollar) %>% pivot_wider(names_from = Product, values_from = Price_Dollar)
df
# A tibble: 120 × 7
    Year Month     Whole `B/S Breast` `Bone-in Breast` `Whole Legs` Thighs
   <dbl> <chr>     <dbl>        <dbl>            <dbl>        <dbl>  <dbl>
 1  2013 January    2.38         7.04             3.90         2.04   2.16
 2  2013 February   2.38         7.04             3.90         2.04   2.16
 3  2013 March      2.38         7.04             3.90         2.04   2.16
 4  2013 April      2.38         7.04             3.90         2.04   2.16
 5  2013 May        2.38         7.04             3.90         2.04   2.16
 6  2013 June       2.38         7.04             3.90         2.04   2.16
 7  2013 July       2.38         7.04             3.90         2.04   2.16
 8  2013 August     2.38         7.04             3.90         2.04   2.16
 9  2013 September  2.38         7.04             3.90         2.04   2.16
10  2013 October    2.38         7.04             3.90         2.04   2.16
# … with 110 more rows

Identify variables that need to be mutated

I found I may mutate the average price of every part of Chicken per year.

I grouped the data by year and mutate 5 columns to calculate the mean of every part of Chiken every year.

Code
# mutate data
df %>% 
  group_by(Year) %>% 
  mutate(whole_ave = mean(Whole),
         breast_ave= mean(`B/S Breast`),
         bone_in_ave = mean (`Bone-in Breast`),
         whole_legs_ave = mean (`Whole Legs`),
         thighs_ave = mean(`Thighs`)
         )%>%
  select(Year, Month, Whole, whole_ave, `B/S Breast`, breast_ave, `Bone-in Breast`, bone_in_ave, `Whole Legs`, whole_legs_ave, Thighs, thighs_ave)
# A tibble: 120 × 12
# Groups:   Year [10]
    Year Month     Whole whole…¹ B/S B…² breas…³ Bone-…⁴ bone_…⁵ Whole…⁶ whole…⁷
   <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1  2013 January    2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 2  2013 February   2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 3  2013 March      2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 4  2013 April      2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 5  2013 May        2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 6  2013 June       2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 7  2013 July       2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 8  2013 August     2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
 9  2013 September  2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
10  2013 October    2.38    2.38    7.04    7.04    3.90    3.90    2.04    2.04
# … with 110 more rows, 2 more variables: Thighs <dbl>, thighs_ave <dbl>, and
#   abbreviated variable names ¹​whole_ave, ²​`B/S Breast`, ³​breast_ave,
#   ⁴​`Bone-in Breast`, ⁵​bone_in_ave, ⁶​`Whole Legs`, ⁷​whole_legs_ave
Source Code
---
title: "Challenge 4"
author: "Guanhua Tan"
desription: "More data wrangling: pivoting"
date: "10/1/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - eggs
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to:

1)  read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2)  tidy data (as needed, including sanity checks)
3)  identify variables that need to be mutated
4)  mutate variables and sanity check all mutations

## Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

-   abc_poll.csv ⭐
-   poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
-   FedFundsRate.csv⭐⭐⭐
-   hotel_bookings.csv⭐⭐⭐⭐
-   debt_in_trillions.xlsx ⭐⭐⭐⭐⭐

```{r}
library(dplyr)
library(readxl) 
poultry<-read_excel("_data/poultry_tidy.xlsx")
poultry

```
### Briefly describe the data
The dataset records the prices of Chicken and its different parts from Jan. 2004 to Jan 2013. The column names includes Product, Year, Month, Price_Dollar. There are a few values (price) missing.

## Tidy Data (as needed)

First, I want data have a clearer order so I move Year and Month column names to the first two columns. Additionally, I'd like to recreate column names as Chicken's different parts. So I use pivot_wider function to rearrange the data.

```{r}
# tidy data, use Chicken's different parts as column names
df <-poultry %>% select(Year, Month, Product, Price_Dollar) %>% pivot_wider(names_from = Product, values_from = Price_Dollar)
df
```


## Identify variables that need to be mutated

I found I may mutate the average price of every part of Chicken per year.

I grouped the data by year and mutate 5 columns to calculate the mean of every part of Chiken every year.



```{r}
# mutate data
df %>% 
  group_by(Year) %>% 
  mutate(whole_ave = mean(Whole),
         breast_ave= mean(`B/S Breast`),
         bone_in_ave = mean (`Bone-in Breast`),
         whole_legs_ave = mean (`Whole Legs`),
         thighs_ave = mean(`Thighs`)
         )%>%
  select(Year, Month, Whole, whole_ave, `B/S Breast`, breast_ave, `Bone-in Breast`, bone_in_ave, `Whole Legs`, whole_legs_ave, Thighs, thighs_ave)
```