challenge_3
Author

Tyler Tewksbury

Published

August 25, 2022

```r
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

Challenge Overview

Read in data

```r
eggs <- read_csv("_data/eggs_tidy.csv",
                 show_col_types = FALSE)
```

Briefly describe the data

The dataset arrives pre-cleaned, with 120 rows and 6 columns. It shows the price of eggs by size and quantity across different months and years: month and year identify each row, and the remaining four columns each hold a price.

Challenge: Describe the final dimensions

```r
nrow(eggs)
#> [1] 120
ncol(eggs)
#> [1] 6
nrow(eggs) * (ncol(eggs) - 2)
#> [1] 480
```

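The dataset has 120 rows and 6 columns. Two of those columns, month and year, identify each observation and stay as columns after pivoting; the other 4 each hold a price. Pivoting longer gives each of those 4 price columns its own row, which is why we subtract 2 from the column count: 120 × (6 − 2) = 480, the number of rows expected in the long dataset.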
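The same arithmetic can be written as a small helper that also predicts the column count. This is a sketch: `expected_long_dims()` is a hypothetical name, not a tidyr function, and it assumes every non-identifier column is pivoted into one shared value column.

```r
# Hypothetical helper: expected shape after pivot_longer(), assuming every
# non-identifier column is a value column that becomes its own row
expected_long_dims <- function(n_rows, n_cols, n_id_cols, n_names_to = 1) {
  n_value_cols <- n_cols - n_id_cols
  c(rows = n_rows * n_value_cols,
    # identifier columns + the new name column(s) + one value column
    cols = n_id_cols + n_names_to + 1)
}

# eggs: 120 rows, 6 columns, 2 identifier columns (month, year),
# with each price column name split into size and quantity
dims <- expected_long_dims(120, 6, 2, n_names_to = 2)
dims  # rows = 480, cols = 5
```

The column count matches the description of the pivoted data: four identifier variables plus one price column.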

Challenge: Pivot the Chosen Data

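```r
long_eggs <- eggs %>%
  pivot_longer(cols = contains("large"),
               names_to = c("size", "quantity"),
               # Assumes names like "extra_large_half_dozen": "(.*large)" keeps
               # "extra_large" whole, where names_sep = "_" would split it apart
               names_pattern = "(.*large)_(.*)",
               values_to = "price")
```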

In the long dataset, each case is the price of one size and quantity of eggs in a given month and year. There are now four identifier (category) variables (month, year, size, and quantity), two more than in the original dataset, and a single value (price) per row. With one value per row, the data is easier to read, and visualizations or summaries by size or quantity no longer need an extra reshaping step each time.
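The sketch below runs the same pivot on a tiny made-up stand-in for `eggs` (two months, invented prices) so the result can be checked by eye. It assumes the price columns are named like `large_half_dozen` and `extra_large_dozen`; `names(eggs)` is the way to confirm that on the real file. With that naming, `names_sep = "_"` would cut `extra_large_half_dozen` at every underscore into four pieces instead of the two names in `names_to`, and the `warning = FALSE` chunk option would hide any warning about it, so this version splits the names with `names_pattern`.

```r
library(tibble)
library(tidyr)

# Hypothetical miniature of eggs: two months, invented prices,
# column names assumed to follow the <size>_<quantity> pattern
eggs_toy <- tibble(
  month                  = c("January", "February"),
  year                   = c(2004, 2004),
  large_half_dozen       = c(1.1, 1.2),
  large_dozen            = c(2.1, 2.2),
  extra_large_half_dozen = c(1.3, 1.4),
  extra_large_dozen      = c(2.3, 2.4)
)

long_toy <- pivot_longer(
  eggs_toy,
  cols = contains("large"),
  names_to = c("size", "quantity"),
  # "(.*large)" greedily captures "large" or "extra_large";
  # everything after the following underscore is the quantity
  names_pattern = "(.*large)_(.*)",
  values_to = "price"
)

long_toy
# 8 rows (2 months x 4 price columns) and 5 columns:
# month, year, size, quantity, price
unique(long_toy$size)      # "large", "extra_large"
unique(long_toy$quantity)  # "half_dozen", "dozen"
```

On the real data the same call should give 120 × 4 = 480 rows, matching the expected count above.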