library(tidyverse)
library(skimr)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Ananya Pujary
August 17, 2022
I’ll be reading in the ‘animal_weight’ dataset.
[1] 9 17
[1] "IPCC Area" "Cattle - dairy" "Cattle - non-dairy"
[4] "Buffaloes" "Swine - market" "Swine - breeding"
[7] "Chicken - Broilers" "Chicken - Layers" "Ducks"
[10] "Turkeys" "Sheep" "Goats"
[13] "Horses" "Asses" "Mules"
[16] "Camels" "Llamas"
The dataset appears to record the average weight of different livestock types (dairy and non-dairy cattle, chickens, ducks, etc.) across global regions (Africa, Latin America, Middle East, etc.). It has 9 rows and 17 columns: each of the \(n=9\) rows is a region, the first column (‘IPCC Area’) names that region, and the remaining \(k=16\) columns each hold the weight of one type of animal.
Name | animal_weight |
Number of rows | 9 |
Number of columns | 17 |
_______________________ | |
Column type frequency: | |
character | 1 |
numeric | 16 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
IPCC Area | 0 | 1 | 4 | 19 | 0 | 9 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Cattle - dairy | 0 | 1 | 425.44 | 140.39 | 275.0 | 275.0 | 400.0 | 550.0 | 604.0 | ▇▅▁▂▇ |
Cattle - non-dairy | 0 | 1 | 298.00 | 116.26 | 110.0 | 173.0 | 330.0 | 391.0 | 420.0 | ▂▃▁▃▇ |
Buffaloes | 0 | 1 | 370.56 | 28.33 | 295.0 | 380.0 | 380.0 | 380.0 | 380.0 | ▁▁▁▁▇ |
Swine - market | 0 | 1 | 39.22 | 10.79 | 28.0 | 28.0 | 45.0 | 50.0 | 50.0 | ▇▁▁▂▇ |
Swine - breeding | 0 | 1 | 116.44 | 84.19 | 28.0 | 28.0 | 180.0 | 180.0 | 198.0 | ▆▁▁▁▇ |
Chicken - Broilers | 0 | 1 | 0.90 | 0.00 | 0.9 | 0.9 | 0.9 | 0.9 | 0.9 | ▁▁▇▁▁ |
Chicken - Layers | 0 | 1 | 1.80 | 0.00 | 1.8 | 1.8 | 1.8 | 1.8 | 1.8 | ▁▁▇▁▁ |
Ducks | 0 | 1 | 2.70 | 0.00 | 2.7 | 2.7 | 2.7 | 2.7 | 2.7 | ▁▁▇▁▁ |
Turkeys | 0 | 1 | 6.80 | 0.00 | 6.8 | 6.8 | 6.8 | 6.8 | 6.8 | ▁▁▇▁▁ |
Sheep | 0 | 1 | 39.39 | 10.80 | 28.0 | 28.0 | 48.5 | 48.5 | 48.5 | ▆▁▁▁▇ |
Goats | 0 | 1 | 34.72 | 4.48 | 30.0 | 30.0 | 38.5 | 38.5 | 38.5 | ▆▁▁▁▇ |
Horses | 0 | 1 | 315.22 | 73.26 | 238.0 | 238.0 | 377.0 | 377.0 | 377.0 | ▆▁▁▁▇ |
Asses | 0 | 1 | 130.00 | 0.00 | 130.0 | 130.0 | 130.0 | 130.0 | 130.0 | ▁▁▇▁▁ |
Mules | 0 | 1 | 130.00 | 0.00 | 130.0 | 130.0 | 130.0 | 130.0 | 130.0 | ▁▁▇▁▁ |
Camels | 0 | 1 | 217.00 | 0.00 | 217.0 | 217.0 | 217.0 | 217.0 | 217.0 | ▁▁▇▁▁ |
Llamas | 0 | 1 | 217.00 | 0.00 | 217.0 | 217.0 | 217.0 | 217.0 | 217.0 | ▁▁▇▁▁ |
There are no missing values in this dataset. Overall, dairy cattle seem to have the highest average weight (425.44) and broiler chickens have the lowest (0.9).
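I plan to pivot it because the animal types recur as categories in every region, so they are values of a single variable rather than separate variables. All \(k=16\) animal columns will be pivoted: their names go into a new ‘Animal Type’ column and their values into a new ‘Weight’ column.
Hence, the pivoted dataset would have 144 rows (9 regions × 16 animal types) and 3 columns (‘IPCC Area’, ‘Animal Type’, ‘Weight’).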
[1] 9
[1] 17
[1] 144
[1] 3
There are 9 existing rows and 17 existing columns. The pivoted data should therefore have 9 × 16 = 144 rows and 3 columns.
animal_weight_pivoted <- pivot_longer(animal_weight,
  cols = c('Cattle - dairy', 'Cattle - non-dairy', 'Buffaloes', 'Swine - market', 'Swine - breeding', 'Chicken - Broilers', 'Chicken - Layers', 'Ducks', 'Turkeys', 'Sheep', 'Goats', 'Horses', 'Asses', 'Mules', 'Camels', 'Llamas'),
  names_to = 'Animal Type', values_to = 'Weight')
animal_weight_pivoted
# A tibble: 144 × 3
`IPCC Area` `Animal Type` Weight
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# … with 134 more rows
# ℹ Use `print(n = ...)` to see more rows
[1] 144 3
The dimensions of the pivoted data, as predicted, are 144 rows and 3 columns. Each case is now one region and animal combination, and ‘Animal Type’ and ‘Weight’ are the new variables. Overall, pivoting made the data easier to work with, since the weight of a given animal in a given region can now be found by filtering on ‘IPCC Area’ and ‘Animal Type’.
---
title: "Challenge 3"
author: "Ananya Pujary"
description: "Tidy Data: Pivoting"
date: "08/17/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weight
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(skimr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in data
I'll be reading in the 'animal_weight' dataset.
```{r}
#| label: reading in the data
animal_weight<-read_csv("_data/animal_weight.csv",
show_col_types = FALSE)
```
### Briefly describe the data
```{r}
#| label: data description 1
dim(animal_weight)
colnames(animal_weight)
```
The dataset appears to record the average weight of different livestock types (dairy and non-dairy cattle, chickens, ducks, etc.) across global regions (Africa, Latin America, Middle East, etc.). It has 9 rows and 17 columns: each of the $n=9$ rows is a region, the first column ('IPCC Area') names that region, and the remaining $k=16$ columns each hold the weight of one type of animal.
```{r}
#| label: data description 2
skim(animal_weight)
```
There are no missing values in this dataset. Overall, dairy cattle seem to have the highest average weight (425.44) and broiler chickens have the lowest (0.9).
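The comparison of averages can also be computed directly rather than read off the `skim()` output. The following is a minimal sketch on a small stand-in tibble, not on the real file: the first row uses the Indian Subcontinent values from the dataset, while "Region B" and its numbers are hypothetical placeholders, so the means it produces are not the ones reported above.

```r
library(tidyverse)

# Stand-in data: row 1 uses Indian Subcontinent values from the dataset;
# "Region B" and its numbers are hypothetical placeholders.
toy <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Region B"),
  `Cattle - dairy` = c(275, 550),
  `Buffaloes` = c(295, 380),
  `Chicken - Broilers` = c(0.9, 0.9)
)

# Mean weight per animal column, reshaped to one row per animal
# and sorted from heaviest to lightest
mean_weights <- toy %>%
  summarise(across(where(is.numeric), mean)) %>%
  pivot_longer(everything(), names_to = "animal", values_to = "mean_weight") %>%
  arrange(desc(mean_weight))

mean_weights
```

Run against `animal_weight` instead of `toy`, the first and last rows of the result would give the heaviest and lightest animal types on average.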
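I plan to pivot it because the animal types recur as categories in every region, so they are values of a single variable rather than separate variables. All $k=16$ animal columns will be pivoted: their names go into a new 'Animal Type' column and their values into a new 'Weight' column.
Hence, the pivoted dataset would have `r 9*16` rows (9 regions × 16 animal types) and 3 columns ('IPCC Area', 'Animal Type', 'Weight').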
### Challenge: Describe the final dimensions
```{r}
#| label: describing final dimensions
# existing rows/cases
nrow(animal_weight)
# existing columns/variables
ncol(animal_weight)
# expected rows/cases: one per region and animal combination
nrow(animal_weight) * (ncol(animal_weight) - 1)
# expected columns: 'IPCC Area' + 'Animal Type' + 'Weight'
1 + 2
```
There are 9 existing rows and 17 existing columns. The pivoted data should therefore have 9 × 16 = 144 rows and 3 columns.
### Challenge: Pivot the Chosen Data
```{r}
#| label: pivoting the data
animal_weight_pivoted <- pivot_longer(animal_weight,
  cols = c('Cattle - dairy', 'Cattle - non-dairy', 'Buffaloes', 'Swine - market', 'Swine - breeding', 'Chicken - Broilers', 'Chicken - Layers', 'Ducks', 'Turkeys', 'Sheep', 'Goats', 'Horses', 'Asses', 'Mules', 'Camels', 'Llamas'),
  names_to = 'Animal Type', values_to = 'Weight')
animal_weight_pivoted
dim(animal_weight_pivoted)
```
The dimensions of the pivoted data, as predicted, are 144 rows and 3 columns. Each case is now one region and animal combination, and 'Animal Type' and 'Weight' are the new variables. Overall, pivoting made the data easier to work with, since the weight of a given animal in a given region can now be found by filtering on 'IPCC Area' and 'Animal Type'.
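As an illustration of that kind of lookup, here is a minimal sketch on a small stand-in tibble. The first row uses the Indian Subcontinent values from the dataset; "Region B" and its numbers are hypothetical placeholders. The sketch also selects the columns to pivot with a negative selection (`-`IPCC Area``) instead of listing every animal column, which is one possible alternative to the call above, not a change to it.

```r
library(tidyverse)

# Stand-in data: row 1 uses Indian Subcontinent values from the dataset;
# "Region B" and its numbers are hypothetical placeholders.
toy <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Region B"),
  `Cattle - dairy` = c(275, 550),
  `Buffaloes` = c(295, 380)
)

# Pivot every column except the region identifier
toy_long <- pivot_longer(
  toy,
  cols = -`IPCC Area`,
  names_to = "Animal Type",
  values_to = "Weight"
)

# Shape check: (rows x pivoted columns) rows and 3 columns
dim(toy_long)

# Look up one animal in one region
buffalo_is <- filter(
  toy_long,
  `IPCC Area` == "Indian Subcontinent",
  `Animal Type` == "Buffaloes"
)
buffalo_is$Weight
```

The negative selection keeps the call short and continues to work if animal columns are added to the file, at the cost of being less explicit about which columns are pivoted.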