Challenge 3

challenge_3

shantanu

animal_weights

Tidy Data: Pivoting

Author

Shantanu Patil

Published

March 26, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Briefly describe the data

This csv file keeps track of the weights of various animals in various parts of the world. The Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, the Middle East, and Northern America are all included. There are sixteen different kinds of animals.

Each animal is currently represented as a column, which is not very neat. I’d like to reduce the number of columns in the dataframe to three: area, animal type, and weight. This requires me to pivot each of the 16 animal columns.

Anticipate the End Result

\(n=9\) and \(k=17\) in this case. I’ll be pivoting \(17-1=16\) variables because I’ll be using \(1\) of those variables to identify a case. The animal’s type will be entered into the ‘animal type’ column, and its weight will be entered into the ‘weight’ column. In the pivoted dataframe, I would anticipate \(9*16=144\) rows. Because I’d be converting those \(16\) columns to \(2\) columns, the pivoted dataframe would have \(3\) columns.

Reading the Data

Code

## Read in data
animals <- read_csv("_data/animal_weight.csv")

# View a few data points
head(animals)

# A tibble: 6 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
#   ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
#   ⁸`Chicken - Layers`

Code

# View all columns
colnames(animals)

 [1] "IPCC Area"          "Cattle - dairy"     "Cattle - non-dairy"
 [4] "Buffaloes"          "Swine - market"     "Swine - breeding"  
 [7] "Chicken - Broilers" "Chicken - Layers"   "Ducks"             
[10] "Turkeys"            "Sheep"              "Goats"             
[13] "Horses"             "Asses"              "Mules"             
[16] "Camels"             "Llamas"

Challenge: Calcuating the dimensions

Code

# Calculate existing rows, existing columns, expected rows, and expected columns
n <- nrow(animals)
k <- ncol(animals)
expected_rows <- n * (k - 1)
expected_cols <- 1 + 2

Challenge: Pivot the Chosen Data

Instead of a vector of weights, each “case” described by a row is one weight. The number of samples increases, but the level of granularity makes calculating statistics about the weights themselves much easier.

Code

# Define columns to be pivoted
cols <- colnames(animals)[-1]

# Pivot the data using pivot_longer()
animals_pivoted <- pivot_longer(animals, cols = cols,
                                 names_to = "animal_type",
                                 values_to = "weight")

# Display a sample of the pivoted data
animals_pivoted[sample(nrow(animals_pivoted), 10), ]

# A tibble: 10 × 3
   `IPCC Area`   animal_type        weight
   <chr>         <chr>               <dbl>
 1 Asia          Ducks                 2.7
 2 Oceania       Camels              217  
 3 Latin America Cattle - non-dairy  305  
 4 Asia          Cattle - non-dairy  391  
 5 Middle east   Turkeys               6.8
 6 Latin America Chicken - Layers      1.8
 7 Middle east   Camels              217  
 8 Africa        Sheep                28  
 9 Asia          Asses               130  
10 Latin America Buffaloes           380

Code

# Display the dimensions of the pivoted data
dim(animals_pivoted)

[1] 144   3

Our calculations were correct, as evidenced by the code output.