Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Sai Venkatesh
April 25, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
I will be using the animal_weights.csv dataset file.
# A tibble: 9 × 17
IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian … 275 110 295 28 28 0.9 1.8 2.7 6.8
2 Eastern… 550 391 380 50 180 0.9 1.8 2.7 6.8
3 Africa 275 173 380 28 28 0.9 1.8 2.7 6.8
4 Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8
5 Western… 600 420 380 50 198 0.9 1.8 2.7 6.8
6 Latin A… 400 305 380 28 28 0.9 1.8 2.7 6.8
7 Asia 350 391 380 50 180 0.9 1.8 2.7 6.8
8 Middle … 275 173 380 28 28 0.9 1.8 2.7 6.8
9 Norther… 604 389 380 46 198 0.9 1.8 2.7 6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
# Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
# ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
# ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
# ⁸`Chicken - Layers`
Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy”
[1] 9 17
[1] "IPCC Area" "Cattle - dairy" "Cattle - non-dairy"
[4] "Buffaloes" "Swine - market" "Swine - breeding"
[7] "Chicken - Broilers" "Chicken - Layers" "Ducks"
[10] "Turkeys" "Sheep" "Goats"
[13] "Horses" "Asses" "Mules"
[16] "Camels" "Llamas"
There are 9 X 17 dimensions. There are 9 regions and 16 animal varieties. Since for each region there are 16 varieties, we can use tidy to make sure that for each region and animal variety, we get a displayed value.
We will use pivot_longer where we will ensure that for each region and variety, there is a single value displayed. The name will be variety and the value will be the weight. Since each region is now seperated, there will be more rows as there will now be 16 rows against each region. This will bring it to a total of 9 * 16 = 144 rows and there will be 3 columns (IPCC, variety, weight ) .
# A tibble: 144 × 3
`IPCC Area` Variety Weight
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# … with 134 more rows
Yes, once it is pivoted long, our resulting data are \(9 X 16\) = 144 rows and there are 3 columns which is exactly what we expected!
---
title: "Challenge 3"
author: "Sai Venkatesh"
description: "Tidy Data: Pivoting"
date: "04/25/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- eggs
- australian_marriage
- usa_households
- sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
I will be using the animal_weights.csv dataset file.
```{r}
animal_weights_data <- read_csv("_data/animal_weight.csv", show_col_types = FALSE)
animal_weights_data
```
### Briefly describe the data
Describe the data, and be sure to comment on why you are planning to pivot it to make it "tidy"
```{r}
# The Dimensions
dim(animal_weights_data)
# The Column Names
colnames(animal_weights_data)
```
There are 9 X 17 dimensions. There are 9 regions and 16 animal varieties. Since for each region there are 16 varieties, we can use tidy to make sure that for each region and animal variety, we get a displayed value.
## Anticipate the End Result
We will use pivot_longer where we will ensure that for each region and variety, there is a single value displayed. The name will be variety and the value will be the weight. Since each region is now seperated, there will be more rows as there will now be 16 rows against each region. This will bring it to a total of 9 * 16 = 144 rows and there will be 3 columns (IPCC, variety, weight ) .
## Pivot the Data
```{r}
#| tbl-cap: Pivoted Example
animal_weight_longer<-pivot_longer(animal_weights_data,
col = c(2:17),
names_to = "Variety",
values_to = "Weight")
animal_weight_longer
```
Yes, once it is pivoted long, our resulting data are $9 X 16$ = 144 rows and there are 3 columns which is exactly what we expected!