library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Abhinav Reddy Yadatha
May 4, 2023
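Today’s challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc.)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using pivot_longer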
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 6 × 6
  month     year large_half_dozen large_dozen extra_large_half_dozen
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>
1 February  2004             128.        226.                   134.
2 March     2004             131         225                    137
3 April     2004             131         225                    137
4 May       2004             131         225                    137
5 June      2004             134.        231.                   137
6 July      2004             134.        234.                   137
# ℹ 1 more variable: extra_large_dozen <dbl>
[1] 119 6
    month                year      large_half_dozen  large_dozen
 Length:119         Min.   :2004   Min.   :128.5     Min.   :225.0
 Class :character   1st Qu.:2006   1st Qu.:130.4     1st Qu.:233.5
 Mode  :character   Median :2009   Median :174.5     Median :267.5
                    Mean   :2009   Mean   :155.4     Mean   :254.4
                    3rd Qu.:2011   3rd Qu.:174.5     3rd Qu.:268.0
                    Max.   :2013   Max.   :178.0     Max.   :277.5
 extra_large_half_dozen  extra_large_dozen
 Min.   :134.5           Min.   :230.0
 1st Qu.:136.4           1st Qu.:241.5
 Median :185.5           Median :285.5
 Mean   :164.5           Mean   :267.1
 3rd Qu.:185.5           3rd Qu.:285.5
 Max.   :188.1           Max.   :290.0
The file covers ten years of monthly data, from January 2004 to December 2013, for four types of egg carton: large and extra-large eggs, each sold by the half dozen and by the dozen. Because the first row is dropped after the file is read, the working data starts in February 2004 and has 119 rows and 6 columns. The values are decimals, which suggests they are averages, and the pivot step later stores them in a column named “price”. For instance, the February 2004 entry for large half-dozen cartons is 128.5.
After pivoting, a “size” column should hold “large” and “extra_large”, and a “quantity” column should hold “half_dozen” and “dozen”. The value for each combination of month, year, size, and quantity should sit in a new column titled “price”.
rows = 119 x 4 = 476 (each of the 119 rows yields one row per pivoted column); columns = 2 + 3 = 5 (month and year stay, and the 4 carton columns become size, quantity, and price).
The final dimensions of the dataset would be 476 x 5.
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 6 × 5
  month     year size        quantity   price
  <chr>    <dbl> <chr>       <chr>      <dbl>
1 February  2004 large       half_dozen  128.
2 February  2004 large       dozen       226.
3 February  2004 extra_large half_dozen  134.
4 February  2004 extra_large dozen       230
5 March     2004 large       half_dozen  131
6 March     2004 large       dozen       225
Yes, once it is pivoted long, the 119 rows become 119 x 4 = 476 rows, so the resulting data are 476 x 5, matching the shape worked out from the row count.
Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data?
Any additional comments?
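After pivoting, each case (row) is the price of one carton type, identified by size and quantity, in one month of one year. Every variable has its own column and every observation its own row, which meets the tidy-data requirements and makes the table easier to read. New categories, such as a “medium” size or another quantity, can be added as extra rows without adding any new columns.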
---
title: "Challenge 3"
author: "Abhinav Reddy Yadatha"
description: "Tidy Data: Pivoting"
date: "05/04/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_3
- eggs_tidy.csv
- Abhinav Reddy Yadatha
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
library(readr)
# Read the eggs_tidy CSV data
egg_tidy_data <- read_csv("_data/eggs_tidy.csv", show_col_types = FALSE)
# Drop the first data row (January 2004); everything below uses the remaining 119 rows
egg_tidy_data <- egg_tidy_data[-1, ]
head(egg_tidy_data)
dim(egg_tidy_data)
# Summary of the eggs dataset
summary(egg_tidy_data)
```
### Briefly describe the data
The file covers ten years of monthly data, from January 2004 to December 2013, for four types of egg carton: large and extra-large eggs, each sold by the half dozen and by the dozen. Because the first row is dropped after the file is read, the working data starts in February 2004 and has 119 rows and 6 columns. The values are decimals, which suggests they are averages, and the pivot step later stores them in a column named "price". For instance, the February 2004 entry for large half-dozen cartons is 128.5.
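The read-in chunk removes a row with `[-1, ]`. The sketch below, using a made-up two-row CSV rather than the real file, illustrates why that removes an observation: `read_csv()` already treats the first line as column names, so row 1 of the result is data. The CSV text and the names `csv_text`, `raw`, and `trimmed` are assumptions for illustration only.

```r
library(readr)

# Made-up CSV text; I() marks the string as literal data (readr >= 2.0).
csv_text <- I("month,year,large_dozen\nJanuary,2004,1\nFebruary,2004,2\n")
raw <- read_csv(csv_text, show_col_types = FALSE)

# The header line became the column names, so only the two data rows remain.
nrow(raw)

# Negative indexing drops the first data row (January), not the header.
trimmed <- raw[-1, ]
trimmed$month
```

If the first observation should be kept, the `[-1, ]` step can simply be left out.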
## Anticipate the End Result
After pivoting, a "size" column should hold "large" and "extra_large", and a "quantity" column should hold "half_dozen" and "dozen". The value for each combination of month, year, size, and quantity should sit in a new column titled "price".
### Challenge: Describe the final dimensions
- rows = 119 x 4 = 476 (each of the 119 rows yields one row per pivoted column)
- columns = 2 + 3 = 5 (month and year stay, and the 4 carton columns become size, quantity, and price)

The final dimensions of the dataset would be 476 x 5.
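The shape arithmetic can be written as a small function. This is a minimal sketch that takes the 119 x 6 shape reported by `dim()` for the working data; `expected_long_dim()` is a hypothetical helper written for this post, not a tidyr function.

```r
# Hypothetical helper: predicts the shape of a data frame after a long pivot.
expected_long_dim <- function(n_rows, n_cols, n_pivoted, n_name_cols) {
  c(
    rows = n_rows * n_pivoted,                   # one output row per pivoted cell
    cols = n_cols - n_pivoted + n_name_cols + 1  # +1 for the values column
  )
}

# 119 rows and 6 columns; 4 carton columns split into size and quantity
expected <- expected_long_dim(n_rows = 119, n_cols = 6, n_pivoted = 4, n_name_cols = 2)
expected
```

The same call with a different row count gives the expected shape if the first row is kept.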
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Example
```{r}
#| tbl-cap: Pivoted Example
# names_sep = "_" would split "extra_large_half_dozen" into four pieces and
# discard the extras, so capture the two parts of each name with a regex instead.
egg_pivot_data <- pivot_longer(
  egg_tidy_data,
  cols = contains("dozen"),
  names_to = c("size", "quantity"),
  names_pattern = "^(large|extra_large)_(half_dozen|dozen)$",
  values_to = "price"
)
head(egg_pivot_data)
```
Yes, once it is pivoted long, the 119 rows become 119 x 4 = 476 rows, so the resulting data are $476 \times 5$, matching the shape worked out from the row count.
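A sanity check of this kind can be sketched on toy data. The values below are made up; only the column names follow the egg data. Splitting the names with `names_pattern` is one possible approach for names that contain more than one underscore, and it is an assumption of this sketch.

```r
library(tidyr)
library(tibble)

# Toy data with the egg column names; the numbers are made up.
toy <- tibble(
  month = c("February", "March"),
  year = c(2004, 2004),
  large_half_dozen = c(1, 2),
  large_dozen = c(3, 4),
  extra_large_half_dozen = c(5, 6),
  extra_large_dozen = c(7, 8)
)

toy_long <- pivot_longer(
  toy,
  cols = contains("dozen"),
  names_to = c("size", "quantity"),
  names_pattern = "^(large|extra_large)_(half_dozen|dozen)$",
  values_to = "price"
)

# Shape check: rows multiply by the 4 pivoted columns; 6 columns become 5.
stopifnot(identical(dim(toy_long), c(nrow(toy) * 4L, 5L)))

# Label check: both parts of every column name survived the split.
unique(toy_long$size)
unique(toy_long$quantity)
```

Checking the labels as well as the dimensions matters because a name split that discards pieces still produces the expected number of rows and columns.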
### Challenge: Pivot the Chosen Data
Document your work here. What will a new "case" be once you have pivoted the data? How does it meet requirements for tidy data?
```{r}
```
Any additional comments?
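After pivoting, each case (row) is the price of one carton type, identified by size and quantity, in one month of one year. Every variable has its own column and every observation its own row, which meets the tidy-data requirements and makes the table easier to read. New categories, such as a "medium" size or another quantity, can be added as extra rows without adding any new columns.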
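The point about new categories can be illustrated with a short sketch. The prices are made up, and the "medium" size is hypothetical and does not appear in the egg data.

```r
library(tibble)
library(dplyr)

# A small long-format table; prices are made up for illustration.
long <- tibble(
  month = "February",
  year = 2004,
  size = c("large", "extra_large"),
  quantity = "dozen",
  price = c(1, 2)
)

# A hypothetical new size arrives as an extra row, not as an extra column.
medium <- tibble(
  month = "February", year = 2004,
  size = "medium", quantity = "dozen", price = 0.5
)
extended <- bind_rows(long, medium)

stopifnot(ncol(extended) == ncol(long), nrow(extended) == nrow(long) + 1)
extended
```

In the wide layout the same addition would need a new column for every size and quantity combination.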