Challenge 4 Instructions

challenge_4
Author

Zhiyuan Zhou

Published

August 18, 2022

Code
library(tidyverse)
library(readxl)
library(lubridate)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.csv⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions ⭐⭐⭐⭐⭐
Code
poultry <- read_csv("_data/poultry_tidy.csv",
                     show_col_types = FALSE)

Briefly describe the data

This is price of poultry in month including whole chicken, B/S Breast, Bone-in Breast, Whole Legs, Thighs from 2004 to 2013.

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

Code
#chck missing
poultry%>%
  summarise_all(funs(sum(is.na(.))))
# A tibble: 1 × 4
  Product  Year Month Price_Dollar
    <int> <int> <int>        <int>
1       0     0     0            7
Code
poultry_tidy<-
  poultry%>%
  transmute(
    Product,
    YM=as.Date(ymd(paste0(Year,"-",Month,"-01"))),
    Price_Dollar=ifelse(is.na(Price_Dollar),0,Price_Dollar))

poultry_wide<-
  poultry_tidy%>%
  pivot_wider(
  names_from = YM, values_from = Price_Dollar ,values_fill = 0
  )
#sanity check
#chck missing
poultry_tidy%>%
  summarise_all(funs(sum(is.na(.))))
# A tibble: 1 × 3
  Product    YM Price_Dollar
    <int> <int>        <int>
1       0     0            0

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

Dates are combined to be YYYY-MM-DD

NANs are filled with 0

Any additional comments?