Paarth Tandon
January 2, 2023
IPCC Area | Cattle - dairy | Cattle - non-dairy | Buffaloes | Swine - market | Swine - breeding | Chicken - Broilers | Chicken - Layers | Ducks | Turkeys | Sheep | Goats | Horses | Asses | Mules | Camels | Llamas |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Indian Subcontinent | 275 | 110 | 295 | 28 | 28 | 0.9 | 1.8 | 2.7 | 6.8 | 28.0 | 30.0 | 238 | 130 | 130 | 217 | 217 |
Eastern Europe | 550 | 391 | 380 | 50 | 180 | 0.9 | 1.8 | 2.7 | 6.8 | 48.5 | 38.5 | 377 | 130 | 130 | 217 | 217 |
Africa | 275 | 173 | 380 | 28 | 28 | 0.9 | 1.8 | 2.7 | 6.8 | 28.0 | 30.0 | 238 | 130 | 130 | 217 | 217 |
Oceania | 500 | 330 | 380 | 45 | 180 | 0.9 | 1.8 | 2.7 | 6.8 | 48.5 | 38.5 | 377 | 130 | 130 | 217 | 217 |
Western Europe | 600 | 420 | 380 | 50 | 198 | 0.9 | 1.8 | 2.7 | 6.8 | 48.5 | 38.5 | 377 | 130 | 130 | 217 | 217 |
Latin America | 400 | 305 | 380 | 28 | 28 | 0.9 | 1.8 | 2.7 | 6.8 | 28.0 | 30.0 | 238 | 130 | 130 | 217 | 217 |
Asia | 350 | 391 | 380 | 50 | 180 | 0.9 | 1.8 | 2.7 | 6.8 | 48.5 | 38.5 | 377 | 130 | 130 | 217 | 217 |
Middle east | 275 | 173 | 380 | 28 | 28 | 0.9 | 1.8 | 2.7 | 6.8 | 28.0 | 30.0 | 238 | 130 | 130 | 217 | 217 |
Northern America | 604 | 389 | 380 | 46 | 198 | 0.9 | 1.8 | 2.7 | 6.8 | 48.5 | 38.5 | 377 | 130 | 130 | 217 | 217 |
[1] "IPCC Area" "Cattle - dairy" "Cattle - non-dairy"
[4] "Buffaloes" "Swine - market" "Swine - breeding"
[7] "Chicken - Broilers" "Chicken - Layers" "Ducks"
[10] "Turkeys" "Sheep" "Goats"
[13] "Horses" "Asses" "Mules"
[16] "Camels" "Llamas"
This CSV file records the weights of various animals in different regions of the world. The nine regions are the Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, Middle east, and Northern America. There are 16 different types of animals.
Currently each animal is represented as a column, which is not very tidy. I would like to pivot the dataframe so that there are only three columns: area, animal type, and weight. This means that I have to pivot each of the 16 animal columns.
The original data has \(n=9\) rows and \(k=17\) columns. One of those variables (`IPCC Area`) identifies a case, so I will be pivoting the remaining \(17-1=16\) variables. The type of animal will go into the `animal_type` column, and the weight will go into the `weight` column. I expect \(9 \times 16=144\) rows in the pivoted dataframe. Since the \(16\) animal columns are converted into \(2\) columns, the pivoted dataframe will have \(1+2=3\) columns.
[1] 9
[1] 17
[1] 144
[1] 3
This pivot will make the data much easier to work with in `R`, since each "case" described by a row is a single weight instead of a vector of weights. The pivot increases the number of rows, but this granularity makes it much easier to calculate statistics about the weights themselves.
IPCC Area | animal_type | weight |
---|---|---|
Latin America | Cattle - non-dairy | 305.0 |
Latin America | Camels | 217.0 |
Western Europe | Cattle - non-dairy | 420.0 |
Northern America | Sheep | 48.5 |
Latin America | Goats | 30.0 |
Middle east | Swine - breeding | 28.0 |
Asia | Chicken - Layers | 1.8 |
Africa | Cattle - non-dairy | 173.0 |
Asia | Ducks | 2.7 |
Africa | Ducks | 2.7 |
[1] 144 3
The sample shows the new three-column layout, and the dimensions of the pivoted dataframe (144 rows and 3 columns) match the values calculated earlier.
---
title: "Challenge 3"
author: "Paarth Tandon"
description: "Tidy Data: Pivoting"
date: "01/02/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
    df-print: kable
categories:
  - challenge_3
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in data
```{r}
# read in the data using readr
animals <- read_csv("_data/animal_weight.csv")
# view a few data points
animals
# view all columns
colnames(animals)
```
### Briefly describe the data
This CSV file records the weights of various animals in different regions of the world. The nine regions are the Indian Subcontinent, Eastern Europe, Africa, Oceania, Western Europe, Latin America, Asia, Middle east, and Northern America. There are 16 different types of animals.
Currently each animal is represented as a column, which is not very tidy. I would like to pivot the dataframe so that there are only three columns: area, animal type, and weight. This means that I have to pivot each of the 16 animal columns.
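As a minimal sketch of the reshaping described above, the chunk below pivots a toy subset of the data. The names `toy` and `toy_long` are illustrative only; the values are copied from the Africa and Oceania rows for three of the animal columns.

```{r}
library(tidyr)
library(tibble)

# Toy subset: two areas and three animal columns (values copied from the data)
toy <- tribble(
  ~`IPCC Area`, ~`Cattle - dairy`, ~Buffaloes, ~Sheep,
  "Africa",     275,               380,        28.0,
  "Oceania",    500,               380,        48.5
)

# Pivot every column except the identifier into name/value pairs
toy_long <- pivot_longer(
  toy,
  cols = -`IPCC Area`,
  names_to = "animal_type",
  values_to = "weight"
)

toy_long
dim(toy_long)
```

With 2 areas and 3 animal columns, the long version should have 2 × 3 = 6 rows and 3 columns, which is the same logic applied to the full dataset later on.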
## Anticipate the End Result
The original data has $n=9$ rows and $k=17$ columns. One of those variables (`IPCC Area`) identifies a case, so I will be pivoting the remaining $17-1=16$ variables. The type of animal will go into the `animal_type` column, and the weight will go into the `weight` column. I expect $9 \times 16=144$ rows in the pivoted dataframe. Since the $16$ animal columns are converted into $2$ columns, the pivoted dataframe will have $1+2=3$ columns.
### Challenge: Describe the final dimensions
```{r}
# existing rows/cases
nrow(animals)
# existing columns/variables
ncol(animals)
# expected rows/cases: one per area per animal column
nrow(animals) * (ncol(animals) - 1)
# expected columns: IPCC Area, animal_type, weight
1 + 2
```
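The same arithmetic can be generalized to any number of identifier columns. The function below is a hypothetical helper, not part of the tidyverse; it assumes every non-identifier column is pivoted into one names column and one values column.

```{r}
# Hypothetical helper: expected dimensions after pivoting all non-identifier columns
expected_long_dims <- function(n_rows, n_cols, n_id = 1) {
  n_pivoted <- n_cols - n_id          # columns that will be pivoted
  c(rows = n_rows * n_pivoted,        # one row per case per pivoted column
    cols = n_id + 2)                  # identifiers + names column + values column
}

expected_long_dims(9, 17)
```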
This pivot will make the data much easier to work with in `R`, since each "case" described by a row is a single weight instead of a vector of weights. The pivot increases the number of rows, but this granularity makes it much easier to calculate statistics about the weights themselves.
## Pivot the Data
### Challenge: Pivot the Chosen Data
```{r}
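# drop the first column (`IPCC Area`): it identifies each case and is not pivoted
animal_cols <- colnames(animals)[-1]
# pivot the 16 animal columns into animal_type/weight pairs
animals_pivoted <- pivot_longer(animals,
                                cols = all_of(animal_cols),
                                names_to = "animal_type",
                                values_to = "weight")
# show 10 randomly chosen rows, then the dimensions of the result
animals_pivoted[sample(nrow(animals_pivoted), 10), ]
dim(animals_pivoted)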
```
The sample shows the new three-column layout, and the output of `dim()` (144 rows and 3 columns) matches the dimensions calculated earlier.
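To illustrate why the long format helps with statistics, here is a small sketch that summarises weight by animal type. It builds its own four-row sample (the name `weights_long` is illustrative, and the values are copied from the Africa and Oceania rows) so that it runs on its own; the same `group_by()`/`summarise()` pattern should apply to the full pivoted dataframe.

```{r}
library(dplyr)
library(tibble)

# Small long-format sample; weights copied from the Africa and Oceania rows
weights_long <- tribble(
  ~`IPCC Area`, ~animal_type,     ~weight,
  "Africa",     "Cattle - dairy", 275,
  "Oceania",    "Cattle - dairy", 500,
  "Africa",     "Sheep",          28.0,
  "Oceania",    "Sheep",          48.5
)

# One summary row per animal type
weight_summary <- weights_long %>%
  group_by(animal_type) %>%
  summarise(mean_weight = mean(weight), max_weight = max(weight))

weight_summary
```

In the wide format, the same summary would require naming each animal column individually; in the long format, a single grouping variable covers all of them.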