challenge_3
Tidy Data: Pivoting
Author

Paarth Tandon

Published

January 2, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Code
# read in the data using readr
animals <- read_csv("_data/animal_weight.csv")
# view a few data points
animals
IPCC Area Cattle - dairy Cattle - non-dairy Buffaloes Swine - market Swine - breeding Chicken - Broilers Chicken - Layers Ducks Turkeys Sheep Goats Horses Asses Mules Camels Llamas
Indian Subcontinent 275 110 295 28 28 0.9 1.8 2.7 6.8 28.0 30.0 238 130 130 217 217
Eastern Europe 550 391 380 50 180 0.9 1.8 2.7 6.8 48.5 38.5 377 130 130 217 217
Africa 275 173 380 28 28 0.9 1.8 2.7 6.8 28.0 30.0 238 130 130 217 217
Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8 48.5 38.5 377 130 130 217 217
Western Europe 600 420 380 50 198 0.9 1.8 2.7 6.8 48.5 38.5 377 130 130 217 217
Latin America 400 305 380 28 28 0.9 1.8 2.7 6.8 28.0 30.0 238 130 130 217 217
Asia 350 391 380 50 180 0.9 1.8 2.7 6.8 48.5 38.5 377 130 130 217 217
Middle east 275 173 380 28 28 0.9 1.8 2.7 6.8 28.0 30.0 238 130 130 217 217
Northern America 604 389 380 46 198 0.9 1.8 2.7 6.8 48.5 38.5 377 130 130 217 217
Code
# view all columns
colnames(animals)
 [1] "IPCC Area"          "Cattle - dairy"     "Cattle - non-dairy"
 [4] "Buffaloes"          "Swine - market"     "Swine - breeding"  
 [7] "Chicken - Broilers" "Chicken - Layers"   "Ducks"             
[10] "Turkeys"            "Sheep"              "Goats"             
[13] "Horses"             "Asses"              "Mules"             
[16] "Camels"             "Llamas"            

Briefly describe the data

This CSV file records the weights of various animals in different zones of the world. The zones included are the Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, the Middle East, and Northern America. There are 16 different types of animals.

Currently, each animal is represented as its own column, which is not tidy. I would like to pivot the dataframe so that there are only three columns: area, animal type, and weight. This means that I have to pivot each of the 16 animal columns.
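To make the planned reshape concrete, here is a minimal sketch of `pivot_longer` on a small made-up tibble (the `toy` data and its two animal columns are hypothetical, not from the real file):

```r
library(tidyverse)

# Hypothetical wide data: one row per area, one column per animal
toy <- tibble(
  area  = c("A", "B"),
  Sheep = c(28.0, 48.5),
  Goats = c(30.0, 38.5)
)

# Pivot the two animal columns into (animal_type, weight) pairs
toy_long <- pivot_longer(toy, cols = c(Sheep, Goats),
                         names_to = "animal_type",
                         values_to = "weight")
toy_long
# 2 areas * 2 animal columns = 4 rows, and 3 columns (area, animal_type, weight)
```

The real pivot below works the same way, just with 16 animal columns instead of 2.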

Anticipate the End Result

In this case, \(n=9\) and \(k=17\). I will be using \(1\) of those variables (`IPCC Area`) to identify a case, so I will be pivoting \(17-1=16\) variables. The type of animal will go into the animal_type column, and the weight will go into the weight column. I would expect \(9 \times 16 = 144\) rows in the pivoted dataframe. Since those \(16\) columns collapse into \(2\) new columns and the identifier column is kept, there would be \(1+2=3\) columns in the pivoted dataframe.

Challenge: Describe the final dimensions

Code
#existing rows/cases
nrow(animals)
[1] 9
Code
#existing columns/variables
ncol(animals)
[1] 17
Code
#expected rows/cases
nrow(animals) * (ncol(animals)-1)
[1] 144
Code
# expected columns: 1 identifier column + 2 new columns (animal_type, weight)
1 + 2
[1] 3

This pivot will make the data much easier to parse using R, since each “case” described by a row is one weight, instead of a vector of weights. Yes, this increases the number of samples, but this level of granularity will make calculating statistics about the weights themselves much easier.

Pivot the Data

Challenge: Pivot the Chosen Data

Code
# [-1] drops the first column (`IPCC Area`), which identifies the case;
# the remaining 16 animal columns are the ones to pivot
cols <- colnames(animals)[-1]

animals_pivoted <- pivot_longer(animals, cols = all_of(cols),
                names_to = "animal_type",
                values_to = "weight")

animals_pivoted[sample(nrow(animals_pivoted), 10), ]
IPCC Area animal_type weight
Latin America Cattle - non-dairy 305.0
Latin America Camels 217.0
Western Europe Cattle - non-dairy 420.0
Northern America Sheep 48.5
Latin America Goats 30.0
Middle east Swine - breeding 28.0
Asia Chicken - Layers 1.8
Africa Cattle - non-dairy 173.0
Asia Ducks 2.7
Africa Ducks 2.7
Code
dim(animals_pivoted)
[1] 144   3

As we can see in the sampled output and dimensions above, our calculations were accurate: the pivoted dataframe has \(144\) rows and \(3\) columns.
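To illustrate why the long format pays off, here is a hedged sketch of a grouped summary. Rather than re-reading the CSV, it rebuilds a miniature version of the data (`mini`, constructed from a few values in the table above) so the block is self-contained:

```r
library(tidyverse)

# Miniature version of the wide data: two areas, three animal columns,
# with values copied from the table above
mini <- tibble(
  `IPCC Area`      = c("Indian Subcontinent", "Eastern Europe"),
  `Cattle - dairy` = c(275, 550),
  Sheep            = c(28.0, 48.5),
  Goats            = c(30.0, 38.5)
)

# Same pivot as before: everything except the identifier column
mini_long <- pivot_longer(mini, cols = -`IPCC Area`,
                          names_to = "animal_type",
                          values_to = "weight")

# With one weight per row, a grouped statistic is a short pipeline
mini_long %>%
  group_by(animal_type) %>%
  summarise(mean_weight = mean(weight))
```

The same `group_by()` + `summarise()` pattern applied to `animals_pivoted` would give per-animal (or per-area) statistics across the full dataset, which is exactly the kind of computation the wide format made awkward.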