Challenge 4 Instructions


Author

Meredith Rolfe

Published

August 18, 2022


library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe it using both words and supporting information (e.g., tables)
tidy data (as needed, including sanity checks)
identify variables that need to be mutated
mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

abc_poll.csv ⭐
poultry_tidy.xlsx or organiceggpoultry.xls ⭐⭐
FedFundsRate.csv ⭐⭐⭐
hotel_bookings.csv ⭐⭐⭐⭐
debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
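
Which package is "correct" depends mostly on the file type: CSV files are usually read with readr's read_csv(), and Excel workbooks with readxl's read_excel(). The following is a minimal sketch of that choice, not part of the challenge itself. The helper name read_challenge_data() is hypothetical, and the demo writes a throwaway CSV so that it runs without the course data.

```r
library(readr)
library(readxl)

# Hypothetical helper (an assumption, not from the challenge):
# pick a reader based on the file extension.
read_challenge_data <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = read_csv(path, show_col_types = FALSE, ...),
    xls  = ,  # empty argument falls through to the xlsx branch
    xlsx = read_excel(path, ...),
    stop("Unsupported file type: ", ext)
  )
}

# Demo on a temporary CSV with made-up rows.
tmp <- tempfile(fileext = ".csv")
write_csv(data.frame(month = c("January", "February"), year = 2004), tmp)
demo <- read_challenge_data(tmp)
demo
```

Extra arguments are passed through, so workbook-specific options such as sheet, skip, or range can be supplied to read_excel() when a spreadsheet has header rows or multiple tabs.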


library(readr)
eggs <- read_csv("_data/eggs_tidy.csv")
eggs

# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen


# eggs_long was never created, so build it first: pivot every column except
# month and year into rows (pivot_longer() names the value column "value" by default)
eggs_long <- eggs %>%
  pivot_longer(cols = -c(month, year), names_to = "eggType") %>%
  mutate(size = case_when(startsWith(eggType, "extra") ~ "extra_large",
                          startsWith(eggType, "large") ~ "large"))

eggs_long


# Add yearly averages for the large and extra-large half-dozen columns.
# group_by(year) makes mean() run within each year; mutate() keeps every row.
eggs %>%
  select(month, year, large_half_dozen, extra_large_half_dozen) %>%
  group_by(year) %>%
  mutate(average_extra_large_half_dozen = mean(extra_large_half_dozen)) %>%
  mutate(average_large_half_dozen = mean(large_half_dozen))

# A tibble: 120 × 6
# Groups:   year [10]
   month      year large_half_dozen extra_large_half_dozen average_ext…¹ avera…²
   <chr>     <dbl>            <dbl>                  <dbl>         <dbl>   <dbl>
 1 January    2004             126                    132           136.    130.
 2 February   2004             128.                   134.          136.    130.
 3 March      2004             131                    137           136.    130.
 4 April      2004             131                    137           136.    130.
 5 May        2004             131                    137           136.    130.
 6 June       2004             134.                   137           136.    130.
 7 July       2004             134.                   137           136.    130.
 8 August     2004             134.                   137           136.    130.
 9 September  2004             130.                   136.          136.    130.
10 October    2004             128.                   136.          136.    130.
# … with 110 more rows, and abbreviated variable names
#   ¹average_extra_large_half_dozen, ²average_large_half_dozen
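
Because mutate() keeps every row, each yearly mean in the output above is repeated on all rows of that year. If one row per year is enough, summarise() collapses each group instead. A small sketch on toy data follows; the tibble eggs_toy and its values are made up for illustration and are not taken from the real file.

```r
library(dplyr)
library(tibble)

# Toy stand-in for `eggs` (hypothetical values).
eggs_toy <- tibble(
  year = c(2004, 2004, 2005, 2005),
  large_half_dozen = c(126, 130, 128, 132),
  extra_large_half_dozen = c(132, 136, 135, 137)
)

yearly_means <- eggs_toy %>%
  group_by(year) %>%
  summarise(
    average_large_half_dozen = mean(large_half_dozen),
    average_extra_large_half_dozen = mean(extra_large_half_dozen),
    .groups = "drop"  # return an ungrouped tibble
  )

# Sanity check: exactly one row per distinct year.
stopifnot(nrow(yearly_means) == n_distinct(eggs_toy$year))
yearly_means
```

Which form is preferable depends on the next step: the mutate() version suits comparing each month against its yearly mean, while the summarise() version suits a compact table of yearly values.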

Briefly describe the data

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
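
One way to anticipate the end result is to compute the expected dimensions before pivoting and compare them afterwards. For the 120 × 6 tibble printed above, with two identifier columns (month, year) and four value columns, a long version should have 120 × 4 = 480 rows and 4 columns. The sketch below runs the same check on a toy tibble; eggs_toy, its values, and the new column names eggType and value are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in with the same column layout as `eggs` (illustrative values only).
eggs_toy <- tibble(
  month = c("January", "March"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 131),
  large_dozen = c(230, 225),
  extra_large_half_dozen = c(132, 137),
  extra_large_dozen = c(230, 230)
)

id_cols <- c("month", "year")
value_cols <- setdiff(names(eggs_toy), id_cols)

# Expected shape after pivoting: one row per original row per value column,
# and the identifier columns plus one name column and one value column.
expected_rows <- nrow(eggs_toy) * length(value_cols)
expected_cols <- length(id_cols) + 2

eggs_toy_long <- eggs_toy %>%
  pivot_longer(cols = all_of(value_cols), names_to = "eggType", values_to = "value")

# The pivot passes the sanity check only if both dimensions match.
stopifnot(nrow(eggs_toy_long) == expected_rows,
          ncol(eggs_toy_long) == expected_cols)
dim(eggs_toy_long)
```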

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.
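
As an illustration of the kinds of mutation asked about here, a month stored as text can be turned into an ordered factor (so it sorts in calendar order) and combined with the year into a proper date. This is a sketch on toy data, assuming months are spelled as full English names as in the output above; eggs_toy and the new columns month_fct and date are hypothetical names.

```r
library(dplyr)
library(tibble)

# Toy rows in the same shape as `eggs` (illustrative values only).
eggs_toy <- tibble(
  month = c("March", "January", "February"),
  year = c(2004, 2004, 2005),
  large_half_dozen = c(131, 126, 128)
)

eggs_toy_dated <- eggs_toy %>%
  mutate(
    # Ordered factor so months sort January..December rather than alphabetically.
    month_fct = factor(month, levels = month.name, ordered = TRUE),
    # First day of each month as a Date; match() avoids locale-dependent parsing.
    date = as.Date(sprintf("%d-%02d-01", as.integer(year), match(month, month.name)))
  )

# Sanity checks: every month matched a level, and each date keeps its original year.
stopifnot(!anyNA(eggs_toy_dated$month_fct),
          all(as.integer(format(eggs_toy_dated$date, "%Y")) == eggs_toy_dated$year))

eggs_toy_dated %>% arrange(date)
```

A misspelled or abbreviated month would become NA in month_fct, which is why the check runs immediately after the mutation.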

Any additional comments?