challenge_3
Tidy Data: Pivoting
Author

Cristhian Barba Garzon

Published

January 2, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Reading in data

“animal_weight.csv” was the chosen data set to read into R.

Code
weights = read_csv("_data/animal_weight.csv")
weights
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹​`IPCC Area`, ²​`Cattle - dairy`, ³​`Cattle - non-dairy`, ⁴​Buffaloes,
#   ⁵​`Swine - market`, ⁶​`Swine - breeding`, ⁷​`Chicken - Broilers`,
#   ⁸​`Chicken - Layers`

Briefly describe the data

The data is a tibble; it is a representation of the weights of different animals in different countries. It appears to be a comparison of the animals and their weights across different countries. When the data is loaded in, you can see that there are many different animals–about 17 different types–and there are about 9 different regions that the data was recorded from.

Anticipate the End Result

The end result of pivoting this data will be with new columns. There will be two new columns–types of animals and their respective weights. The original column “IPCC Area” will remain unchanged because that is not the focus in cleaning up this data set. The original columns of the animal types will be pivoted to become values under one column, and the respective weights will be pivoted under one column. This will reduce the amount of columns from 17 to 3, which will allow an audience to view and understand a cleaner data set. With this in mind, we can calculate what our expected rows and columns will be after pivoting our data.

Challenge: Describe the final dimensions

Computing and printing the original data set’s dimensions and the pivoted data set’s expected dimensions.

Code
#dimensions
weights %>% 
  dim() 
[1]  9 17
Code
#number of rows
rows = weights %>% 
  nrow()
#number of columns
columns = weights %>% 
  ncol()
#number of expected rows 
rows * (columns-1) #subtracting one column because the first column has the IPCC Areas as observations; it's 144
[1] 144
Code
#number of expected columns 
1 + 2 #one left over column + two additional columns going to be created in 
[1] 3
Code
#Considering I will be pivoting longer, the new columns will most likely be the left over Countries and the newly created columns--the animal types as the values and the animal weights as the values. 

Currently, the dimensions of the data set “animal_weights.csv” are 9 rows and 17 columns. When pivoted longer, the data will most likely have 144 rows and 3 columns

Challenge: Pivot the Chosen Data

Code
pivoted_weights = pivot_longer(weights, "Cattle - dairy":"Llamas", names_to = "Animal Types", values_to = "Weights")
pivoted_weights
# A tibble: 144 × 3
   `IPCC Area`         `Animal Types`     Weights
   <chr>               <chr>                <dbl>
 1 Indian Subcontinent Cattle - dairy       275  
 2 Indian Subcontinent Cattle - non-dairy   110  
 3 Indian Subcontinent Buffaloes            295  
 4 Indian Subcontinent Swine - market        28  
 5 Indian Subcontinent Swine - breeding      28  
 6 Indian Subcontinent Chicken - Broilers     0.9
 7 Indian Subcontinent Chicken - Layers       1.8
 8 Indian Subcontinent Ducks                  2.7
 9 Indian Subcontinent Turkeys                6.8
10 Indian Subcontinent Sheep                 28  
# … with 134 more rows
Code
dim(pivoted_weights) 
[1] 144   3

As expected, the dimensions for the data when pivoted longer are 144 rows by 3 columns.