challenge_2
Matt Eckstein
animal_weight.csv
Author

Matt Eckstein

Published

March 17, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Read in data

animals <- read.csv("_data/animal_weight.csv")

head(animals)
            IPCC.Area Cattle...dairy Cattle...non.dairy Buffaloes
1 Indian Subcontinent            275                110       295
2      Eastern Europe            550                391       380
3              Africa            275                173       380
4             Oceania            500                330       380
5      Western Europe            600                420       380
6       Latin America            400                305       380
  Swine...market Swine...breeding Chicken...Broilers Chicken...Layers Ducks
1             28               28                0.9              1.8   2.7
2             50              180                0.9              1.8   2.7
3             28               28                0.9              1.8   2.7
4             45              180                0.9              1.8   2.7
5             50              198                0.9              1.8   2.7
6             28               28                0.9              1.8   2.7
  Turkeys Sheep Goats Horses Asses Mules Camels Llamas
1     6.8  28.0  30.0    238   130   130    217    217
2     6.8  48.5  38.5    377   130   130    217    217
3     6.8  28.0  30.0    238   130   130    217    217
4     6.8  48.5  38.5    377   130   130    217    217
5     6.8  48.5  38.5    377   130   130    217    217
6     6.8  28.0  30.0    238   130   130    217    217
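# dplyr's summarize() called with no summary expressions returns a data frame with 0 columns and 1 row;
# base R's summary() gives per-column descriptive statistics instead
summary(animals)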

Briefly describe the data and anticipate the end result

This data describes the average weights of common types of livestock across regions of the world. With 17 columns, the table is somewhat difficult to read. It would be more legible in a long format with only 3 columns (region, animal type, and weight), where each case is a region-livestock type pair.

Find current and future data dimensions

nrow(animals)
[1] 9
ncol(animals)
[1] 17

There are 17 columns: 16 hold the weights for individual animal types, and one (IPCC.Area) holds the name of the region that identifies each observation.

There are 9 observations (n = 9) of 17 variables (k = 17). I need 1 variable (IPCC.Area) to identify a case, so there will be n * (k - number of variables used to identify a case) rows in the result: 9 * (17 - 1) = 144. So, we expect the result of our pivoting to have 144 rows.

nrow(animals) * (ncol(animals)-1)
[1] 144

Pivot the data

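# Pivot the 16 animal columns into name/value pairs, keeping IPCC.Area as the case identifier
animals2 <- pivot_longer(animals, cols = `Cattle...dairy`:`Llamas`, names_to = "type", values_to = "weights")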
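
As a small-scale illustration of the pivot and the row-count rule, the sketch below applies the same kind of pivot_longer() call to a toy table. The `toy` data frame is a hypothetical three-region, two-animal subset whose values are copied from the head() output above; it is not the full data set, and the names `toy`, `toy_long`, and `expected_rows` are introduced only for this example.

```r
library(tidyr)

# Hypothetical toy subset: 3 regions, 2 animal types (values taken from the head() output)
toy <- data.frame(
  IPCC.Area = c("Indian Subcontinent", "Eastern Europe", "Africa"),
  Sheep = c(28.0, 48.5, 28.0),
  Goats = c(30.0, 38.5, 30.0)
)

# Row-count rule: n * (k - number of identifying variables) = 3 * (3 - 1)
expected_rows <- nrow(toy) * (ncol(toy) - 1)

# Pivot every column except the identifier into type/weights pairs
toy_long <- pivot_longer(toy, cols = -IPCC.Area, names_to = "type", values_to = "weights")

# The pivoted table should match the predicted shape: expected_rows rows, 3 columns
stopifnot(nrow(toy_long) == expected_rows, ncol(toy_long) == 3)
```

Selecting columns with `-IPCC.Area` rather than a named range is one possible choice here; it pivots everything that is not the identifier, which avoids having to name the first and last animal columns.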

Describe the final dimensions

nrow(animals2)
[1] 144
ncol(animals2)
[1] 3

The final table does in fact have 144 rows and 3 columns.

New cases and what makes the new data tidy

Now that the data is pivoted, a case is a pairing of an IPCC area and an animal type. This data is tidy because every variable (IPCC area, animal type, and weight) is a column, and every observation (an area-type pairing) is a row.
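
One way to check the claim that each area-type pairing is a single case is to count the rows per pair. The sketch below does this on a hypothetical toy subset (values copied from the head() output above) rather than on the full data set; the names `toy`, `toy_long`, `pair_counts`, and `type_means` are assumptions made for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy subset: 3 regions, 2 animal types (values taken from the head() output)
toy <- data.frame(
  IPCC.Area = c("Indian Subcontinent", "Eastern Europe", "Africa"),
  Sheep = c(28.0, 48.5, 28.0),
  Goats = c(30.0, 38.5, 30.0)
)
toy_long <- pivot_longer(toy, cols = -IPCC.Area, names_to = "type", values_to = "weights")

# Count rows per (area, type) pair; in tidy data each pair should appear exactly once
pair_counts <- count(toy_long, IPCC.Area, type)
stopifnot(all(pair_counts$n == 1))

# Because every variable is its own column, grouped summaries need no further reshaping
type_means <- summarize(group_by(toy_long, type), mean_weight = mean(weights))
```

The same check could presumably be run on animals2, where 144 distinct pairs would be expected.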