Challenge 3

Tidy Data: Pivoting

Author

Siddharth Goel

Published

January 27, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Code

animal_weights_df <- read_csv("_data/animal_weight.csv")

animal_weights_df

# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
#   ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
#   ⁸`Chicken - Layers`

Code

spec(animal_weights_df)

cols(
  `IPCC Area` = col_character(),
  `Cattle - dairy` = col_double(),
  `Cattle - non-dairy` = col_double(),
  Buffaloes = col_double(),
  `Swine - market` = col_double(),
  `Swine - breeding` = col_double(),
  `Chicken - Broilers` = col_double(),
  `Chicken - Layers` = col_double(),
  Ducks = col_double(),
  Turkeys = col_double(),
  Sheep = col_double(),
  Goats = col_double(),
  Horses = col_double(),
  Asses = col_double(),
  Mules = col_double(),
  Camels = col_double(),
  Llamas = col_double()
)

Briefly describe the data

The input dataset describes the weights of different animals and the overall category-wise weights for 9 IPCC areas. For this dataset, I plan to move the animals to rows and the areas to columns using pivot_longer so that it is easy to get statistics based on specific animals and categories.

Challenge: Describe the final dimensions

Document your work here.

Code

# Existing Rows
expected_rows <- nrow(animal_weights_df) # 9

# Existing Columns
expected_cols <- ncol(animal_weights_df) # 17

# Expected Rows
nrow(animal_weights_df) * (expected_cols - 1) # -1 for the header

[1] 144

Code

# expected columns 
3 # IPCC Area, Animal, Weight

[1] 3

The dataset currently has 9 rows and 17 columns. When we pivot the dataset, the targeted rows and columns will be 144 rows and 3 columns (Area, Animal and Weight).

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Code

pivoted_df <- pivot_longer(animal_weights_df, 2:ncol(animal_weights_df), names_to = "Animal", values_to = "Weight")
head(pivoted_df)

# A tibble: 6 × 3
  `IPCC Area`         Animal             Weight
  <chr>               <chr>               <dbl>
1 Indian Subcontinent Cattle - dairy      275  
2 Indian Subcontinent Cattle - non-dairy  110  
3 Indian Subcontinent Buffaloes           295  
4 Indian Subcontinent Swine - market       28  
5 Indian Subcontinent Swine - breeding     28  
6 Indian Subcontinent Chicken - Broilers    0.9

Yes, once it is pivoted long, our resulting data are $144 x 3$ - exactly what we expected!