Prachiti Parkar
March 22, 2023
# A tibble: 6 × 6
  month    year  xlarge_dozen xlarge_halfdozen large_dozen large_halfdozen
  <chr>    <chr> <chr>        <chr>            <chr>       <chr>          
1 January  2004  126          230              132         230            
2 February 2004  128.5        226.25           134.5       230            
3 March    2004  131          225              137         230            
4 April    2004  131          225              137         234.5          
5 May      2004  131          225              137         236            
6 June     2004  133.5        231.375          137         241            
[1] 120   6
    month               year           xlarge_dozen       xlarge_halfdozen  
 Length:120         Length:120         Length:120         Length:120        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 large_dozen        large_halfdozen   
 Length:120         Length:120        
 Class :character   Class :character  
 Mode  :character   Mode  :character  
# A tibble: 6 × 5
  month    year  size   quantity  price 
  <chr>    <chr> <chr>  <chr>     <chr> 
1 January  2004  xlarge dozen     126   
2 January  2004  xlarge halfdozen 230   
3 January  2004  large  dozen     132   
4 January  2004  large  halfdozen 230   
5 February 2004  xlarge dozen     128.5 
6 February 2004  xlarge halfdozen 226.25
---
title: "Challenge 3"
author: "Prachiti Parkar"
description: "Tidy Data: Pivoting"
date: "03/22/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - animal_weights
  - eggs
  - australian_marriage
  - usa_households
  - sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1.  read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2.  identify what needs to be done to tidy the current data
3.  anticipate the shape of pivoted data
4.  pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
-   animal_weights.csv ⭐
-   eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
-   australian_marriage\*.xls ⭐⭐⭐
-   USA Households\*.xlsx ⭐⭐⭐⭐
-   sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
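```{r}
library(readr)
# Read the eggs data, replacing the file's own column names with shorter ones
eggs_tidy_data <- read_csv(
  "_data/eggs_tidy.csv",
  col_names = c("month", "year", "xlarge_dozen", "xlarge_halfdozen",
                "large_dozen", "large_halfdozen")
)
# Supplying col_names makes read_csv treat the header line as data,
# so drop that first row. As a side effect, every column is read as character.
eggs_tidy_data <- eggs_tidy_data[-1, ]
# Preview the first rows and check the dimensions
head(eggs_tidy_data)
dim(eggs_tidy_data)
# Summary of the dataset
summary(eggs_tidy_data)
```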
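
Because the header line is read as data, all six columns above end up as character, including the prices. One possible alternative, sketched here under the assumption that the file's first line is a header row, is to skip that line with `skip = 1` so that readr can guess numeric types. The inline CSV is only a stand-in for `_data/eggs_tidy.csv`: its header names are placeholders, and the two data rows are the January and February 2004 values of this dataset.

```r
library(readr)

# Stand-in for "_data/eggs_tidy.csv"; the header names are placeholders
csv_text <- I("month,year,col_a,col_b,col_c,col_d
January,2004,126,230,132,230
February,2004,128.5,226.25,134.5,230
")

eggs_numeric <- read_csv(
  csv_text,
  skip = 1,  # skip the original header line instead of reading it as data
  col_names = c("month", "year", "xlarge_dozen", "xlarge_halfdozen",
                "large_dozen", "large_halfdozen"),
  show_col_types = FALSE
)

# Inspect the column classes: the price columns are now numeric, not text
sapply(eggs_numeric, class)
```

With numeric prices, later steps such as averaging by size or quantity would not need an explicit conversion.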
### Briefly describe the data
Describe the data, and be sure to comment on why you are planning to pivot it to make it "tidy"
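The dataset contains 120 rows and 6 columns: `month`, `year`, and four price columns, one for each combination of size (extra large or large) and quantity (dozen or half dozen). Because size and quantity are stored in the column names rather than in columns of their own, the data are not tidy. Pivoting the four price columns into `size`, `quantity`, and `price` columns makes it easier to read, filter, and compare prices by size and quantity.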
## Anticipate the End Result
After pivoting, `large` and `xlarge` will appear in a `size` column, `dozen` and `halfdozen` in a `quantity` column, and the corresponding values in a new `price` column.
### Example: find current and future data dimensions
```{r}
```
-   rows: 120 × 4 = 480 (each of the 4 pivoted columns contributes one row per original row)
-   columns: 6 − 4 + 3 = 5 (the 4 price columns are replaced by `size`, `quantity`, and `price`)
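
The arithmetic above can also be written as a short calculation. This is a minimal sketch that hard-codes the dimensions reported by `dim(eggs_tidy_data)`; the variable names are illustrative only.

```r
n_rows <- 120      # rows in the wide data
n_cols <- 6        # columns in the wide data
n_pivoted <- 4     # price columns that will be pivoted
n_new_cols <- 3    # size, quantity, and price

# Each pivoted column contributes one row per original row
expected_rows <- n_rows * n_pivoted
# The pivoted columns are dropped and replaced by the three new ones
expected_cols <- n_cols - n_pivoted + n_new_cols

c(rows = expected_rows, cols = expected_cols)
```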
### Challenge: Describe the final dimensions
Document your work here.
```{r}
```
The final dimensions will be 480 rows by 5 columns.
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Example
```{r}
#| tbl-cap: Pivoted Example
# Pivot the four price columns into long format. Each column name
# (for example "xlarge_dozen") is split at "_" into a size and a quantity.
eggs_pivot_data <- pivot_longer(
  eggs_tidy_data,
  cols = contains("dozen"),
  names_to = c("size", "quantity"),
  names_sep = "_",
  values_to = "price"
)
head(eggs_pivot_data)
# Compare against the dimensions calculated earlier
dim(eggs_pivot_data)
```
Yes, once it is pivoted long, our resulting data are $480 \times 5$, exactly what we expected!
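
The same sanity check can be made explicit in code. The sketch below does not read the real file; it assumes a two-row sample typed in by hand (the January and February 2004 rows, stored as character like the original), pivots it with the same arguments, and compares the result with the predicted shape.

```r
library(tidyr)
library(tibble)

# Two rows of the wide data, typed in by hand for illustration
eggs_sample <- tribble(
  ~month,     ~year,  ~xlarge_dozen, ~xlarge_halfdozen, ~large_dozen, ~large_halfdozen,
  "January",  "2004", "126",         "230",             "132",        "230",
  "February", "2004", "128.5",       "226.25",          "134.5",      "230"
)

eggs_sample_long <- pivot_longer(
  eggs_sample,
  cols = contains("dozen"),
  names_to = c("size", "quantity"),
  names_sep = "_",
  values_to = "price"
)

# Predicted shape: rows * 4 pivoted columns, and 6 - 4 + 3 columns
expected_dim <- c(nrow(eggs_sample) * 4L, ncol(eggs_sample) - 4L + 3L)

# TRUE when the pivoted shape matches the prediction
identical(dim(eggs_sample_long), expected_dim)
```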
### Challenge: Pivot the Chosen Data
Document your work here. What will a new "case" be once you have pivoted the data? How does it meet requirements for tidy data?
```{r}
```
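After pivoting, each case (row) is the price for one combination of month, year, size, and quantity. This meets the requirements for tidy data: every variable (`month`, `year`, `size`, `quantity`, `price`) has its own column, every observation has its own row, and every value has its own cell. The long format is also easier to read and to extend: a new size (for example, medium) or a new quantity can be added as extra rows, without adding any columns.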
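
To make the idea of a case concrete, here is a small sketch built on a hand-typed sample of the pivoted data (the four January 2004 rows). It checks that month, year, size, and quantity together identify each row, and then adds a hypothetical `medium` size as a new row. The medium row is an assumption for illustration only, so its price is left missing rather than invented.

```r
library(dplyr)
library(tibble)

# The four January 2004 rows of the pivoted data, typed in by hand
eggs_long_sample <- tribble(
  ~month,    ~year,  ~size,    ~quantity,   ~price,
  "January", "2004", "xlarge", "dozen",     "126",
  "January", "2004", "xlarge", "halfdozen", "230",
  "January", "2004", "large",  "dozen",     "132",
  "January", "2004", "large",  "halfdozen", "230"
)

# A case is one month-year-size-quantity combination, so no
# combination should appear more than once
duplicated_cases <- eggs_long_sample %>%
  count(month, year, size, quantity) %>%
  filter(n > 1)
nrow(duplicated_cases)

# A hypothetical "medium" size is just another row; the price is NA
# because the dataset contains no such value
eggs_with_medium <- add_row(
  eggs_long_sample,
  month = "January", year = "2004",
  size = "medium", quantity = "dozen", price = NA_character_
)

# The number of columns is unchanged after adding the new size
ncol(eggs_with_medium) == ncol(eggs_long_sample)
```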
Any additional comments?