Code
library(tidyverse)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
XinyangMao
March 17, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command. - aniceggpoultry.xls
[1] "Data" "Organic egg prices, 2004-13"
[3] "Organic poultry prices, 2004-13"
Use the date col in eggs form
# A tibble: 120 × 1
date
<chr>
1 Jan 2004
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
# … with 110 more rows
Import chicken data
# A tibble: 6 × 5
Whole `B/S_Breast` `Bone-in_Breast` Whole_Legs Thighs
<dbl> <dbl> <chr> <dbl> <chr>
1 198. 646. too few 194. too few
2 198. 642. too few 194. 203
3 209 642. too few 194. 203
4 212 642. too few 194. 203
5 214. 642. too few 194. 203
6 216. 641 too few 202. 200.375
Merge date column and chicken data
date Whole B/S_Breast Bone-in_Breast Whole_Legs Thighs
1 Jan 2004 197.500 645.5 too few 193.500 too few
2 February 197.500 642.5 too few 193.500 203
3 March 209.000 642.5 too few 193.500 203
4 April 212.000 642.5 too few 193.500 203
5 May 214.500 642.5 too few 193.500 203
6 June 216.375 641.0 too few 201.875 200.375
date n
1 April 10
2 August 10
3 December 10
4 February 8
5 February /1 2
6 Jan 2004 1
7 Jan 2005 1
8 Jan 2006 1
9 Jan 2007 1
10 Jan 2008 1
11 Jan 2009 1
12 Jan 2010 1
13 Jan 2011 1
14 Jan 2012 1
15 Jan 2013 1
16 July 10
17 June 10
18 March 10
19 May 10
20 November 10
21 October 10
22 September 10
Same,we need to remove the “/1”
date Whole B/S_Breast Bone-in_Breast Whole_Legs Thighs
1 Jan 2004 197.500 645.5 too few 193.500 too few
2 February 197.500 642.5 too few 193.500 203
3 March 209.000 642.5 too few 193.500 203
4 April 212.000 642.5 too few 193.500 203
5 May 214.500 642.5 too few 193.500 203
6 June 216.375 641.0 too few 201.875 200.375
Then use separate() to split the date to month and year columns.We can use fill() to fill the blank year cell.
month year Whole B/S_Breast Bone-in_Breast Whole_Legs Thighs
1 Jan 2004 197.500 645.5 too few 193.500 too few
2 February 2004 197.500 642.5 too few 193.500 203
3 March 2004 209.000 642.5 too few 193.500 203
4 April 2004 212.000 642.5 too few 193.500 203
5 May 2004 214.500 642.5 too few 193.500 203
6 June 2004 216.375 641.0 too few 201.875 200.375
We can see there are 120 rows and 7 columns in this dataset,I’ll use 2 of variables to identify a case,so I’ll pivoting 7-2 variables into a longer format.Therefore,we would expect 120*(7-2)=600 rows in the pivoted dataframe.
As we can see,in the chicken data sheet,there are some string “too few” exist in 5 columns,so before we pivoting the data we should use a numeric type value replace them and convert all of the character type of number into numeric type value.
month year Whole B/S_Breast Bone-in_Breast Whole_Legs Thighs
1 Jan 2004 197.500 645.5 0 193.500 0.000
2 February 2004 197.500 642.5 0 193.500 203.000
3 March 2004 209.000 642.5 0 193.500 203.000
4 April 2004 212.000 642.5 0 193.500 203.000
5 May 2004 214.500 642.5 0 193.500 203.000
6 June 2004 216.375 641.0 0 201.875 200.375
Now all of the data are avaiable,we can use pivot_longer() function to make the dataset be longer.There are 7 variables in this dataset,but we need to use other 2 variavles to instead 5 of them.
# A tibble: 6 × 4
month year ChickenType Price
<chr> <chr> <chr> <dbl>
1 Jan 2004 Whole 198.
2 Jan 2004 B/S_Breast 646.
3 Jan 2004 Bone-in_Breast 0
4 Jan 2004 Whole_Legs 194.
5 Jan 2004 Thighs 0
6 February 2004 Whole 198.
Yes, once it is pivoted long, our resulting data are \(600x4\) - exactly what we expected!
---
title: "Challenge_3"
author: "XinyangMao"
description: "Tidy Data: Pivoting"
date: "03/17/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- eggs
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- aniceggpoultry.xls
```{r}
excel_sheets("_data/organiceggpoultry.xls")
```
### Import data
Use the date col in eggs form
```{r}
date<-read_excel("_data/organiceggpoultry.xls",
sheet="Data",
range =cell_limits(c(6,2),c(NA,2)),
col_names = c("date")
)
date
```
Import chicken data
```{r}
chicken_orig<-read_excel("_data/organiceggpoultry.xls",
sheet="Data",
range =cell_limits(c(6,8),c(NA,12)),
col_names = c("Whole",
"B/S_Breast", "Bone-in_Breast",
"Whole_Legs","Thighs")
)
head(chicken_orig)
```
Merge date column and chicken data
```{r}
chicken_merged <- cbind(date,chicken_orig)
head(chicken_merged)
```
```{r}
chicken_merged%>%
count(date)
```
Same,we need to remove the "/1"
```{r}
chicken<-chicken_merged%>%
mutate(date = str_remove(date, " /1"))
head(chicken)
```
Then use separate() to split the date to month and year columns.We can use fill() to fill the blank year cell.
```{r}
chicken<-chicken%>%
separate(date, into=c("month", "year"), sep=" ")%>%
fill(year)
head(chicken)
```
## find current and future data dimensions
We can see there are 120 rows and 7 columns in this dataset,I'll use 2 of variables to identify a case,so I'll pivoting 7-2 variables into a longer format.Therefore,we would expect 120*(7-2)=600 rows in the pivoted dataframe.
```{r}
#row number
nrow(chicken)
```
```{r}
#column number
ncol(chicken)
```
```{r}
#expect row number after pivoted
nrow(chicken) * (ncol(chicken)-2)
```
```{r}
#expect column number after pivoted
2 + 2
```
## Pivot the Chosen Data
### Converting data types
As we can see,in the chicken data sheet,there are some string "too few" exist in 5 columns,so before we pivoting the data we should use a numeric type value replace them and convert all of the character type of number into numeric type value.
```{r}
#| tbl-cap: Pivoted Example
chicken_db<-chicken
chicken_db <- chicken_db%>%
mutate(across(`Whole`:`Thighs`,~ifelse(. == "too few",0.0,as.numeric(.))))
head(chicken_db)
```
Now all of the data are avaiable,we can use pivot_longer() function to make the dataset be longer.There are 7 variables in this dataset,but we need to use other 2 variavles to instead 5 of them.
```{r}
chicken_long<-chicken_db%>%
pivot_longer(cols=c(`Whole`:`Thighs`),
names_to = "ChickenType",
values_to = "Price"
)
head(chicken_long)
```
Yes, once it is pivoted long, our resulting data are $600x4$ - exactly what we expected!