Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Matthew Weiner
March 24, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
For this challenge, I chose to investigate the animal_weights
dataset.
First I had to import the necesary package in order to read the file. Then, in order to get an idea of what the data contained, I used the head()
command to output the first few rows of the dataset and used the colnnames()
command to view all column names.
# A tibble: 6 × 17
IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian … 275 110 295 28 28 0.9 1.8 2.7 6.8
2 Eastern… 550 391 380 50 180 0.9 1.8 2.7 6.8
3 Africa 275 173 380 28 28 0.9 1.8 2.7 6.8
4 Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8
5 Western… 600 420 380 50 198 0.9 1.8 2.7 6.8
6 Latin A… 400 305 380 28 28 0.9 1.8 2.7 6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
# Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
# ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
# ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
# ⁸`Chicken - Layers`
[1] "IPCC Area" "Cattle - dairy" "Cattle - non-dairy"
[4] "Buffaloes" "Swine - market" "Swine - breeding"
[7] "Chicken - Broilers" "Chicken - Layers" "Ducks"
[10] "Turkeys" "Sheep" "Goats"
[13] "Horses" "Asses" "Mules"
[16] "Camels" "Llamas"
This preliminary investigation into the dataset has already given me a good idea of the type of data that is involved. Due to the title of the dataset, and the fact that all columns except for the first are of the type dbl
, tells me that this dataet describes the average weight of each type of animal for each IPCC area.
After viewing the column names, I could see that we will want to use the pivot_longer()
command here in order to tidy up the data. The original dataset is in a wide format, where each column represents a variable, and the animal types are spread out across multiple columns. This makes it difficult to perform certain types of data analysis and visualization tasks because the data is not organized in a way that is easy to work with.
By using pivot_longer()
we can mitigate this issue by converting the dataset to a long format where each row will be a combination of the IPCC Area
, the Animal
type, and the Value
column which represents the average weight of the animals.
The original shape of the dataset is (9,17) where there are 9 rows and 17 columns. Our goal is to change this such that we will pivot all animal columns into a single column. This means all columns but IPCC Area
will be pivoted. If we let k be the number of columns, then we want to pivot k-1 columns which is equivalent to k-1 = 16.
[1] 9
[1] 17
[1] 144
[1] 3
As we can see above, that means that the resulting shape of the pivoted dataset will be (144,3), where there are 144 rows and 3 columns.
# A tibble: 144 × 3
`IPCC Area` Animal Value
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# … with 134 more rows
[1] 144
[1] 3
After applying the pivot_longer()
command, we can see that our hypothesis was right as there are 144 rows in this new dataset with 3 columns. The dataset is now much more readable and can more easily be used in the future for other investigative tasks.
---
title: "Challenge 3"
author: "Matthew Weiner"
description: "Tidy Data: Pivoting"
date: "03/24/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- Matthew_Weiner
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
## Introduction
For this challenge, I chose to investigate the `animal_weights` dataset.
## Getting Started
First I had to import the necesary package in order to read the file. Then, in order to get an idea of what the data contained, I used the `head()` command to output the first few rows of the dataset and used the `colnnames()` command to view all column names.
```{r}
library(readr)
aw <- read_csv("_data/animal_weight.csv")
head(aw)
colnames(aw)
```
This preliminary investigation into the dataset has already given me a good idea of the type of data that is involved. Due to the title of the dataset, and the fact that all columns except for the first are of the type `dbl`, tells me that this dataet describes the average weight of each type of animal for each IPCC area.
After viewing the column names, I could see that we will want to use the `pivot_longer()` command here in order to tidy up the data. The original dataset is in a wide format, where each column represents a variable, and the animal types are spread out across multiple columns. This makes it difficult to perform certain types of data analysis and visualization tasks because the data is not organized in a way that is easy to work with.
By using `pivot_longer()` we can mitigate this issue by converting the dataset to a long format where each row will be a combination of the `IPCC Area`, the `Animal` type, and the `Value` column which represents the average weight of the animals.
## Pivoting Math
The original shape of the dataset is (9,17) where there are 9 rows and 17 columns. Our goal is to change this such that we will pivot all animal columns into a single column. This means all columns but `IPCC Area` will be pivoted.
If we let k be the number of columns, then we want to pivot k-1 columns which is equivalent to k-1 = 16.
```{r}
#existing rows
nrow(aw)
#existing cols
ncol(aw)
#expected rows
nrow(aw) * (ncol(aw)-1)
#expected col
1+2
```
As we can see above, that means that the resulting shape of the pivoted dataset will be (144,3), where there are 144 rows and 3 columns.
## Pivot the Data
```{r}
pivot_aw <- pivot_longer(aw,cols = -`IPCC Area`, names_to = "Animal", values_to = "Value")
pivot_aw
nrow(pivot_aw)
ncol(pivot_aw)
```
After applying the `pivot_longer()` command, we can see that our hypothesis was right as there are 144 rows in this new dataset with 3 columns. The dataset is now much more readable and can more easily be used in the future for other investigative tasks.