Challenge 4 Solutions

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
More data wrangling: pivoting
Author

Moira Chiong

Published

June 11, 2023

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
getwd()
[1] "C:/Users/chion/OneDrive/Desktop/DACSS 601/DACSS_601_Summer2023_Sec1/posts"
Code
 setwd("C:/Users/chion/OneDrive/Desktop/DACSS 601/DACSS_601_Summer2023_Sec1/posts/_data")
 library(readxl)
Warning: package 'readxl' was built under R version 4.1.3
Code
poultry_tidy <- read_excel("C:/Users/chion/OneDrive/Desktop/DACSS 601/DACSS_601_Summer2023_Sec1/posts/_data/poultry_tidy.xlsx")
poultry_tidy
# A tibble: 600 x 4
   Product  Year Month     Price_Dollar
   <chr>   <dbl> <chr>            <dbl>
 1 Whole    2013 January           2.38
 2 Whole    2013 February          2.38
 3 Whole    2013 March             2.38
 4 Whole    2013 April             2.38
 5 Whole    2013 May               2.38
 6 Whole    2013 June              2.38
 7 Whole    2013 July              2.38
 8 Whole    2013 August            2.38
 9 Whole    2013 September         2.38
10 Whole    2013 October           2.38
# i 590 more rows

Briefly describe the data

The data is for prices of poultry over the 2004-2013 period. There are 4 columns with 600 observations. There are five distinct cuts of poultry: “Whole”, “B/S Breast”, “Bone-in Breast”, “Whole Legs”, “Thighs” . The lowest price of cuts of poultry is 1.94 while the maximum price is 7.04. The average price for a cut of poultry is 3.39.

Code
nrow(poultry_tidy)
[1] 600
Code
ncol(poultry_tidy)
[1] 4
Code
unique(poultry_tidy$product)
Warning: Unknown or uninitialised column: `product`.
NULL
Code
unique(poultry_tidy$Year)
 [1] 2013 2012 2011 2010 2009 2008 2007 2006 2005 2004
Code
min(poultry_tidy$Price_Dollar, na.rm=TRUE)
[1] 1.935
Code
max(poultry_tidy$Price_Dollar, na.rm=TRUE)
[1] 7.0375
Code
mean(poultry_tidy$Price_Dollar, na.rm=TRUE)
[1] 3.390472

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

Code
round(poultry_tidy$Price_Dollar, digits = 2)
  [1] 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 7.04 7.04 7.04
 [16] 7.04 7.04 7.04 7.04 7.04 7.04 7.04 7.04 7.04 3.90 3.90 3.90 3.90 3.90 3.90
 [31] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
 [46] 2.04 2.04 2.04 2.16 2.16 2.16 2.16 2.16 2.16 2.16 2.16 2.16 2.16 2.16 2.16
 [61] 2.35 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 2.38 6.38 7.00 7.00
 [76] 7.00 7.00 7.00 7.00 7.00 7.00 7.00 7.04 7.04 3.90 3.90 3.90 3.90 3.90 3.90
 [91] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[106] 2.04 2.04 2.04 2.15 2.15 2.15 2.15 2.15 2.15 2.15 2.16 2.16 2.16 2.16 2.16
[121] 2.35 2.35 2.35 2.35 2.35 2.35 2.35 2.35 2.35 2.35 2.35 2.35 6.38 6.38 6.38
[136] 6.38 6.38 6.38 6.38 6.38 6.38 6.38 6.38 6.38 3.90 3.90 3.90 3.90 3.90 3.90
[151] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[166] 2.04 2.04 2.04 2.15 2.15 2.15 2.15 2.15 2.15 2.15 2.15 2.15 2.15 2.15 2.15
[181] 2.48 2.48 2.48 2.42 2.35 2.35 2.42 2.35 2.35 2.35 2.35 2.35 6.46 6.46 6.46
[196] 6.46 6.46 6.46 6.46 6.42 6.38 6.38 6.38 6.38 3.90 3.90 3.90 3.90 3.90 3.90
[211] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[226] 2.04 2.04 2.04 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.19 2.15 2.15 2.15 2.15
[241] 2.48 2.48 2.48 2.48 2.48 2.48 2.48 2.48 2.48 2.48 2.48 2.48 6.46 6.46 6.46
[256] 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 3.90 3.90 3.90 3.90 3.90 3.90
[271] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[286] 2.04 2.04 2.04 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22
[301] 2.20 2.20 2.20 2.20 2.20 2.48 2.48 2.48 2.48 2.48 2.48 2.48 6.46 6.46 6.46
[316] 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 3.90 3.90 3.90 3.90 3.90 3.90
[331] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[346] 2.04 2.04 2.04 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22
[361] 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 6.46 6.46 6.46
[376] 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 3.90 3.90 3.90 3.90 3.90 3.90
[391] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[406] 2.04 2.04 2.04 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22
[421] 2.17 2.17 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 2.20 6.46 6.46 6.46
[436] 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 3.90 3.90 3.90 3.90 3.90 3.90
[451] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[466] 2.04 2.04 2.04 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22
[481] 2.17 2.17 2.17 2.17 2.17 2.17 2.17 2.17 2.17 2.17 2.17 2.17 6.44 6.46 6.46
[496] 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 6.46 3.90 3.90 3.90 3.90 3.90 3.90
[511] 3.90 3.90 3.90 3.90 3.90 3.90 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04 2.04
[526] 2.04 2.04 2.04 2.13 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22 2.22
[541] 1.98 1.98 2.09 2.12 2.14 2.16 2.17 2.17 2.17 2.17 2.17 2.17 6.46 6.42 6.42
[556] 6.42 6.42 6.41 6.42 6.42 6.42 6.42 6.42 6.42   NA   NA   NA   NA   NA   NA
[571] 3.90 3.90 3.90 3.90 3.90 3.90 1.94 1.94 1.94 1.94 1.94 2.02 2.04 2.04 2.04
[586] 2.04 2.04 2.04   NA 2.03 2.03 2.03 2.03 2.00 2.00 2.00 2.00 2.00 2.00 2.00

Any additional comments?