Challenge 3

challenge_3

animal_weight

Tidy Data: Pivoting

Author

Ananya Pujary

Published

August 17, 2022

Code

library(tidyverse)

library(skimr)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

I’ll be reading in the ‘animal_weight’ dataset.

Code

animal_weight<-read_csv("_data/animal_weight.csv",
                        show_col_types = FALSE)

Briefly describe the data

Code

dim(animal_weight)

[1]  9 17

Code

colnames(animal_weight)

 [1] "IPCC Area"          "Cattle - dairy"     "Cattle - non-dairy"
 [4] "Buffaloes"          "Swine - market"     "Swine - breeding"  
 [7] "Chicken - Broilers" "Chicken - Layers"   "Ducks"             
[10] "Turkeys"            "Sheep"              "Goats"             
[13] "Horses"             "Asses"              "Mules"             
[16] "Camels"             "Llamas"

The data chosen seems to describe the average weights of different animals (dairy and non-dairy cattle, chickens, ducks, etc.) across global regions (Africa, Latin America, Middle East, etc.). It has 9 rows and 17 columns, of which the \(n=9\) rows indicate the region name and \(k=16\) columns the type of animal.

Code

skim(animal_weight)

Data summary
Name	animal_weight
Number of rows	9
Number of columns	17
_______________________
Column type frequency:
character	1
numeric	16
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
IPCC Area	0	1	4	19	0	9	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Cattle - dairy	1	425.44	140.39	275.0	275.0	400.0	550.0	604.0	▇▅▁▂▇
Cattle - non-dairy	1	298.00	116.26	110.0	173.0	330.0	391.0	420.0	▂▃▁▃▇
Buffaloes	1	370.56	28.33	295.0	380.0	380.0	380.0	380.0	▁▁▁▁▇
Swine - market	1	39.22	10.79	28.0	28.0	45.0	50.0	50.0	▇▁▁▂▇
Swine - breeding	1	116.44	84.19	28.0	28.0	180.0	180.0	198.0	▆▁▁▁▇
Chicken - Broilers	1	0.90	0.00	0.9	0.9	0.9	0.9	0.9	▁▁▇▁▁
Chicken - Layers	1	1.80	0.00	1.8	1.8	1.8	1.8	1.8	▁▁▇▁▁
Ducks	1	2.70	0.00	2.7	2.7	2.7	2.7	2.7	▁▁▇▁▁
Turkeys	1	6.80	0.00	6.8	6.8	6.8	6.8	6.8	▁▁▇▁▁
Sheep	1	39.39	10.80	28.0	28.0	48.5	48.5	48.5	▆▁▁▁▇
Goats	1	34.72	4.48	30.0	30.0	38.5	38.5	38.5	▆▁▁▁▇
Horses	1	315.22	73.26	238.0	238.0	377.0	377.0	377.0	▆▁▁▁▇
Asses	1	130.00	0.00	130.0	130.0	130.0	130.0	130.0	▁▁▇▁▁
Mules	1	130.00	0.00	130.0	130.0	130.0	130.0	130.0	▁▁▇▁▁
Camels	1	217.00	0.00	217.0	217.0	217.0	217.0	217.0	▁▁▇▁▁
Llamas	1	217.00	0.00	217.0	217.0	217.0	217.0	217.0	▁▁▇▁▁

There are no missing values in this dataset. Overall, dairy cattle seem to have the highest average weight (425.44) and broiler chickens have the lowest (0.9).

I plan to pivot it because it seems that the selected animals are recurring categories in all of the regions. \(k-3\) variables will be pivoted and put in a new column.

Hence, the pivoted dataset would have 144 rows and 3 columns (‘IPCC Area’,‘Animal Type’, ‘Weight’).

Challenge: Describe the final dimensions

Code

# existing rows/cases
nrow(animal_weight)

[1] 9

Code

# existing columns/cases
ncol(animal_weight)

[1] 17

Code

#expected rows/cases
nrow(animal_weight) * (ncol(animal_weight)-1)

[1] 144

Code

# expected columns 
1 + 2

[1] 3

There are 9 existing rows and 17 existing columns. The expected rows are 144 and expected columns are 3.

Challenge: Pivot the Chosen Data

Code

animal_weight_pivoted <- pivot_longer(animal_weight,
                         col = c('Cattle - dairy', 'Cattle - non-dairy', 'Buffaloes', 'Swine - market', 'Swine - breeding', 'Chicken - Broilers', 'Chicken - Layers', 'Ducks', 'Turkeys', 'Sheep', 'Goats', 'Horses', 'Asses', 'Mules', 'Camels', 'Llamas'), names_to = 'Animal Type', values_to = 'Weight')
animal_weight_pivoted

# A tibble: 144 × 3
   `IPCC Area`         `Animal Type`      Weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows
# ℹ Use `print(n = ...)` to see more rows

Code

dim(animal_weight_pivoted)

[1] 144   3

The dimensions of the pivoted data, as predicted, are 144 rows and 3 columns. The new case created is ‘Animal Type’. Overall, pivoting made the data easier to understand since we can now find the weight of a certain animal from a particular region.