Challenge 3

challenge_3

Tidy Data: Pivoting

Author

Paarth Tandon

Published

January 2, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Code

# read in the data using readr
animals <- read_csv("_data/animal_weight.csv")
# view a few data points
animals

IPCC Area	Cattle - dairy	Cattle - non-dairy	Buffaloes	Swine - market	Swine - breeding	Chicken - Broilers	Chicken - Layers	Ducks	Turkeys	Sheep	Goats	Horses	Asses	Mules	Camels	Llamas
Indian Subcontinent	275	110	295	28	28	0.9	1.8	2.7	6.8	28.0	30.0	238	130	130	217	217
Eastern Europe	550	391	380	50	180	0.9	1.8	2.7	6.8	48.5	38.5	377	130	130	217	217
Africa	275	173	380	28	28	0.9	1.8	2.7	6.8	28.0	30.0	238	130	130	217	217
Oceania	500	330	380	45	180	0.9	1.8	2.7	6.8	48.5	38.5	377	130	130	217	217
Western Europe	600	420	380	50	198	0.9	1.8	2.7	6.8	48.5	38.5	377	130	130	217	217
Latin America	400	305	380	28	28	0.9	1.8	2.7	6.8	28.0	30.0	238	130	130	217	217
Asia	350	391	380	50	180	0.9	1.8	2.7	6.8	48.5	38.5	377	130	130	217	217
Middle east	275	173	380	28	28	0.9	1.8	2.7	6.8	28.0	30.0	238	130	130	217	217
Northern America	604	389	380	46	198	0.9	1.8	2.7	6.8	48.5	38.5	377	130	130	217	217

Code

# view all columns
colnames(animals)

 [1] "IPCC Area"          "Cattle - dairy"     "Cattle - non-dairy"
 [4] "Buffaloes"          "Swine - market"     "Swine - breeding"  
 [7] "Chicken - Broilers" "Chicken - Layers"   "Ducks"             
[10] "Turkeys"            "Sheep"              "Goats"             
[13] "Horses"             "Asses"              "Mules"             
[16] "Camels"             "Llamas"

Briefly describe the data

This csv file tracks the weights of various animals in different zones of the world. The zones included are the Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, Middle east, and Northern America. There are 16 different type of animals.

Currently each animal is represented as a column, which is not very tidy. I would like to pivot the dataframe so that there are only three columns: area, animal type, and weight. This means that I have to pivot each of the 16 animal columns.

Anticipate the End Result

In this case, \(n=9\) and \(k=17\). I will be using \(1\) of those variables to identify a case, so I will be pivoting \(17-1=16\) variables. The type of animal will go into the animal_type column, and the weight will go into the weight column. I would expect \(9*16=144\) rows in the pivoted dataframe. Since I would be converting those \(16\) columns into \(2\) columns, there would be \(3\) columns in the pivoted dataframe.

Challenge: Describe the final dimensions

Code

#existing rows/cases
nrow(animals)

[1] 9

Code

#existing columns/cases
ncol(animals)

[1] 17

Code

#expected rows/cases
nrow(animals) * (ncol(animals)-1)

[1] 144

Code

# expected columns 
1 + 2

[1] 3

This pivot will make the data much easier to parse using R, since each “case” described by a row is one weight, instead of a vector of weights. Yes, this increases the number of samples, but this level of granularity will make calculating statistics about the weights themselves much easier.

Pivot the Data

Challenge: Pivot the Chosen Data

Each “case” described by a row is one weight, instead of a vector of weights. This increases the number of samples, but this level of granularity will make calculating statistics about the weights themselves much easier.

Code

#-1 removes the columns we want to keep
cols <- colnames(animals)[-1]

animals_pivoted<-pivot_longer(animals, col = cols,
                names_to="animal_type",
                values_to = "weight")

animals_pivoted[sample(nrow(animals_pivoted), 10), ]

IPCC Area	animal_type	weight
Latin America	Cattle - non-dairy	305.0
Latin America	Camels	217.0
Western Europe	Cattle - non-dairy	420.0
Northern America	Sheep	48.5
Latin America	Goats	30.0
Middle east	Swine - breeding	28.0
Asia	Chicken - Layers	1.8
Africa	Cattle - non-dairy	173.0
Asia	Ducks	2.7
Africa	Ducks	2.7

Code

dim(animals_pivoted)

[1] 144   3

As we can see in the code output (sample), our calculations were accurate.