Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Shantanu Patil
March 26, 2023
This csv file keeps track of the weights of various animals in various parts of the world. The Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, the Middle East, and Northern America are all included. There are sixteen different kinds of animals.
Each animal is currently represented as a column, which is not very neat. I’d like to reduce the number of columns in the dataframe to three: area, animal type, and weight. This requires me to pivot each of the 16 animal columns.
\(n=9\) and \(k=17\) in this case. I’ll be pivoting \(17-1=16\) variables because I’ll be using \(1\) of those variables to identify a case. The animal’s type will be entered into the ‘animal type’ column, and its weight will be entered into the ‘weight’ column. In the pivoted dataframe, I would anticipate \(9*16=144\) rows. Because I’d be converting those \(16\) columns to \(2\) columns, the pivoted dataframe would have \(3\) columns.
# A tibble: 6 × 17
IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian … 275 110 295 28 28 0.9 1.8 2.7 6.8
2 Eastern… 550 391 380 50 180 0.9 1.8 2.7 6.8
3 Africa 275 173 380 28 28 0.9 1.8 2.7 6.8
4 Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8
5 Western… 600 420 380 50 198 0.9 1.8 2.7 6.8
6 Latin A… 400 305 380 28 28 0.9 1.8 2.7 6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
# Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
# ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
# ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
# ⁸`Chicken - Layers`
[1] "IPCC Area" "Cattle - dairy" "Cattle - non-dairy"
[4] "Buffaloes" "Swine - market" "Swine - breeding"
[7] "Chicken - Broilers" "Chicken - Layers" "Ducks"
[10] "Turkeys" "Sheep" "Goats"
[13] "Horses" "Asses" "Mules"
[16] "Camels" "Llamas"
Instead of a vector of weights, each “case” described by a row is one weight. The number of samples increases, but the level of granularity makes calculating statistics about the weights themselves much easier.
# A tibble: 10 × 3
`IPCC Area` animal_type weight
<chr> <chr> <dbl>
1 Asia Ducks 2.7
2 Oceania Camels 217
3 Latin America Cattle - non-dairy 305
4 Asia Cattle - non-dairy 391
5 Middle east Turkeys 6.8
6 Latin America Chicken - Layers 1.8
7 Middle east Camels 217
8 Africa Sheep 28
9 Asia Asses 130
10 Latin America Buffaloes 380
[1] 144 3
Our calculations were correct, as evidenced by the code output.
---
title: "Challenge 3"
author: "Shantanu Patil"
description: "Tidy Data: Pivoting"
date: "03/26/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- shantanu
- animal_weights
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
### Briefly describe the data
This csv file keeps track of the weights of various animals in various parts of the world. The Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, the Middle East, and Northern America are all included. There are sixteen different kinds of animals.
Each animal is currently represented as a column, which is not very neat. I'd like to reduce the number of columns in the dataframe to three: area, animal type, and weight. This requires me to pivot each of the 16 animal columns.
## Anticipate the End Result
$n=9$ and $k=17$ in this case. I'll be pivoting $17-1=16$ variables because I'll be using $1$ of those variables to identify a case. The animal's type will be entered into the 'animal type' column, and its weight will be entered into the 'weight' column. In the pivoted dataframe, I would anticipate $9*16=144$ rows. Because I'd be converting those $16$ columns to $2$ columns, the pivoted dataframe would have $3$ columns.
### Reading the Data
```{r}
## Read in data
animals <- read_csv("_data/animal_weight.csv")
# View a few data points
head(animals)
# View all columns
colnames(animals)
```
### Challenge: Calcuating the dimensions
```{r}
# Calculate existing rows, existing columns, expected rows, and expected columns
n <- nrow(animals)
k <- ncol(animals)
expected_rows <- n * (k - 1)
expected_cols <- 1 + 2
```
### Challenge: Pivot the Chosen Data
Instead of a vector of weights, each "case" described by a row is one weight. The number of samples increases, but the level of granularity makes calculating statistics about the weights themselves much easier.
```{r}
# Define columns to be pivoted
cols <- colnames(animals)[-1]
# Pivot the data using pivot_longer()
animals_pivoted <- pivot_longer(animals, cols = cols,
names_to = "animal_type",
values_to = "weight")
# Display a sample of the pivoted data
animals_pivoted[sample(nrow(animals_pivoted), 10), ]
# Display the dimensions of the pivoted data
dim(animals_pivoted)
```
Our calculations were correct, as evidenced by the code output.