Code
library(tidyverse)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Pooja Shah
April 27, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 121 × 5
date xlarge_dozen xlarge_halfdozen large_dozen large_halfdozen
<chr> <chr> <chr> <chr> <chr>
1 <NA> "Extra Large \nDozen" "Extra Large 1/2… "Large \nD… "Large \n1/2 D…
2 Jan 2004 "230" "132" "230" "126"
3 February "230" "134.5" "226.25" "128.5"
4 March "230" "137" "225" "131"
5 April "234.5" "137" "225" "131"
6 May "236" "137" "225" "131"
7 June "241" "137" "231.375" "133.5"
8 July "241" "137" "233.5" "133.5"
9 August "241" "137" "233.5" "133.5"
10 September "241" "135.875" "233.5" "129.75"
# ℹ 111 more rows
Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy” I am working here with the “Data” sheet of the “organiceggpoultry” excel file. The data is in the form of multiple columns for types of eggs. The rows are prices of these eggs for each month.
The first step in pivoting the data is to try to come up with a concrete vision of what the end product should look like - that way you will know whether or not your pivoting was successful.
One easy way to do this is to think about the dimensions of your current data (tibble, dataframe, or matrix), and then calculate what the dimensions of the pivoted data should be.
The tidy data should be of the form of columns for type of poultry, year and then the price for 12 months.
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 22 × 2
date n
<chr> <int>
1 April 10
2 August 10
3 December 10
4 February 8
5 February /1 2
6 Jan 2004 1
7 Jan 2005 1
8 Jan 2006 1
9 Jan 2007 1
10 Jan 2008 1
# ℹ 12 more rows
# A tibble: 21 × 2
date n
<chr> <int>
1 April 10
2 August 10
3 December 10
4 February 10
5 Jan 2004 1
6 Jan 2005 1
7 Jan 2006 1
8 Jan 2007 1
9 Jan 2008 1
10 Jan 2009 1
# ℹ 11 more rows
# A tibble: 120 × 6
month year xlarge_dozen xlarge_halfdozen large_dozen large_halfdozen
<chr> <chr> <chr> <chr> <chr> <chr>
1 Jan 2004 230 132 230 126
2 February 2004 230 134.5 226.25 128.5
3 March 2004 230 137 225 131
4 April 2004 234.5 137 225 131
5 May 2004 236 137 225 131
6 June 2004 241 137 231.375 133.5
7 July 2004 241 137 233.5 133.5
8 August 2004 241 137 233.5 133.5
9 September 2004 241 135.875 233.5 129.75
10 October 2004 241 135.5 233.5 128.5
# ℹ 110 more rows
Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data? The year is a seperate entity now. The data is divided into cols for the month. The row data is year and type of egg.
# A tibble: 40 × 14
year eggType Jan February March April May June July August September
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2004 xlarge_d… 230 230 230 234.5 236 241 241 241 241
2 2004 xlarge_h… 132 134.5 137 137 137 137 137 137 135.875
3 2004 large_do… 230 226.25 225 225 225 231.… 233.5 233.5 233.5
4 2004 large_ha… 126 128.5 131 131 131 133.5 133.5 133.5 129.75
5 2005 xlarge_d… 241 241 241 241 241 241 241 241 241
6 2005 xlarge_h… 135.5 135.5 135.5 135.5 135.5 135.5 135.5 135.5 135.5
7 2005 large_do… 233.5 233.5 233.5 233.5 233.5 233.5 233.5 233.5 233.5
8 2005 large_ha… 128.5 128.5 128.5 128.5 128.5 128.5 128.5 128.5 128.5
9 2006 xlarge_d… 241 241 241.… 241.5 241.5 241.5 241.5 241.5 241.5
10 2006 xlarge_h… 135.5 135.5 135.5 135.5 135.5 135.5 135.5 135.5 135.5
# ℹ 30 more rows
# ℹ 3 more variables: October <chr>, November <chr>, December <chr>
---
title: "Challenge 3"
author: "Pooja Shah"
description: "Tidy Data: Pivoting"
date: "04/27/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- eggs
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
#reading and printing unclean data
eggs = read_excel("_data/organiceggpoultry.xls", sheet = "Data", range = "B5:F125", col_names = c("date", "xlarge_dozen",
"xlarge_halfdozen", "large_dozen",
"large_halfdozen"))
eggs
```
### Briefly describe the data
Describe the data, and be sure to comment on why you are planning to pivot it to make it "tidy"
I am working here with the "Data" sheet of the "organiceggpoultry" excel file.
The data is in the form of multiple columns for types of eggs. The rows are prices of these eggs for each month.
## Anticipate the End Result
The first step in pivoting the data is to try to come up with a concrete vision of what the end product *should* look like - that way you will know whether or not your pivoting was successful.
One easy way to do this is to think about the dimensions of your current data (tibble, dataframe, or matrix), and then calculate what the dimensions of the pivoted data should be.
The tidy data should be of the form of columns for type of poultry, year and then the price for 12 months.
### Find current and future data dimensions
```{r}
#existing rows
nrow(eggs)
#existing columns
ncol(eggs)
#expected rows
row = ((nrow(eggs) - 1)/12) * (ncol(eggs) - 1)
row
# expected columns
col = 12 + 2
col
```
### Challenge: Describe the final dimensions
```{r}
print(paste("The final dimensions should be ", row, "x", col))
```
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
### Example
```{r}
eggs <- eggs %>% drop_na
eggs%>%
count(date)
eggs<-eggs%>%
mutate(date = str_remove(date, " /1"))
eggs%>%
count(date)
eggs<-eggs%>%
separate(date, into=c("month", "year"), sep=" ")%>%
fill(year)
eggs
```
### Challenge: Pivot the Chosen Data
Document your work here. What will a new "case" be once you have pivoted the data? How does it meet requirements for tidy data?
The year is a seperate entity now. The data is divided into cols for the month. The row data is year and type of egg.
```{r}
eggs <- eggs %>%
pivot_longer(cols=contains("large"),
names_to = "eggType",
values_to = "avgPrice")
eggs <- eggs %>%
pivot_wider(names_from = month,
values_from = avgPrice)
eggs
```