library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Abby Balint
September 27, 2022
I read in the “animal_weight” data set and stored it as “weights” for easier coding. I then ran a summary to get a high-level overview of the data (not strictly necessary here, since the original has only 9 rows).
# A tibble: 9 × 17
IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian … 275 110 295 28 28 0.9 1.8 2.7 6.8
2 Eastern… 550 391 380 50 180 0.9 1.8 2.7 6.8
3 Africa 275 173 380 28 28 0.9 1.8 2.7 6.8
4 Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8
5 Western… 600 420 380 50 198 0.9 1.8 2.7 6.8
6 Latin A… 400 305 380 28 28 0.9 1.8 2.7 6.8
7 Asia 350 391 380 50 180 0.9 1.8 2.7 6.8
8 Middle … 275 173 380 28 28 0.9 1.8 2.7 6.8
9 Norther… 604 389 380 46 198 0.9 1.8 2.7 6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
# Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
# ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
# ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
# ⁸`Chicken - Layers`
IPCC Area Cattle - dairy Cattle - non-dairy Buffaloes
Length:9 Min. :275.0 Min. :110 Min. :295.0
Class :character 1st Qu.:275.0 1st Qu.:173 1st Qu.:380.0
Mode :character Median :400.0 Median :330 Median :380.0
Mean :425.4 Mean :298 Mean :370.6
3rd Qu.:550.0 3rd Qu.:391 3rd Qu.:380.0
Max. :604.0 Max. :420 Max. :380.0
Swine - market Swine - breeding Chicken - Broilers Chicken - Layers
Min. :28.00 Min. : 28.0 Min. :0.9 Min. :1.8
1st Qu.:28.00 1st Qu.: 28.0 1st Qu.:0.9 1st Qu.:1.8
Median :45.00 Median :180.0 Median :0.9 Median :1.8
Mean :39.22 Mean :116.4 Mean :0.9 Mean :1.8
3rd Qu.:50.00 3rd Qu.:180.0 3rd Qu.:0.9 3rd Qu.:1.8
Max. :50.00 Max. :198.0 Max. :0.9 Max. :1.8
Ducks Turkeys Sheep Goats Horses
Min. :2.7 Min. :6.8 Min. :28.00 Min. :30.00 Min. :238.0
1st Qu.:2.7 1st Qu.:6.8 1st Qu.:28.00 1st Qu.:30.00 1st Qu.:238.0
Median :2.7 Median :6.8 Median :48.50 Median :38.50 Median :377.0
Mean :2.7 Mean :6.8 Mean :39.39 Mean :34.72 Mean :315.2
3rd Qu.:2.7 3rd Qu.:6.8 3rd Qu.:48.50 3rd Qu.:38.50 3rd Qu.:377.0
Max. :2.7 Max. :6.8 Max. :48.50 Max. :38.50 Max. :377.0
Asses Mules Camels Llamas
Min. :130 Min. :130 Min. :217 Min. :217
1st Qu.:130 1st Qu.:130 1st Qu.:217 1st Qu.:217
Median :130 Median :130 Median :217 Median :217
Mean :130 Mean :130 Mean :217 Mean :217
3rd Qu.:130 3rd Qu.:130 3rd Qu.:217 3rd Qu.:217
Max. :130 Max. :130 Max. :217 Max. :217
This dataset contains 17 variables and 9 rows of animal weights, broken out by animal type and region of the world. Pivoting helps here because in the current wide format each animal is its own variable, so we cannot filter by animal. Once the data is pivoted, animal becomes a single variable, and we can filter by animal type, by region, or both, for example to find average weights.
To find the final dimensions below, I used the same formula as the example, applied to the animal weights data. My original data set has 9 rows and 17 variables. Only one of the original variables (IPCC Area) will remain a variable. The 16 variables I am pivoting will turn into two variables: animal (names) and weights (values). The pivoted data will have 144 rows, because each of the 9 rows is repeated for each of the 16 variables I am transforming (9 × 16 = 144). I should end up with 3 columns: my one original variable plus my 2 new variables.
[1] 9
[1] 17
[1] 144
[1] 3
144 rows as expected :)
# A tibble: 144 × 3
`IPCC Area` animal weights
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# … with 134 more rows
Final tibble has three columns and 144 rows as predicted.
---
title: "Challenge 3 Abby Balint"
author: "Abby Balint"
description: "Tidy Data: Pivoting"
date: "09/27/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - animal_weights
  - abby_balint
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in data
I read in the "animal_weight" data set and stored it as "weights" for easier coding. I then ran a summary to get a high-level overview of the data (not strictly necessary here, since the original has only 9 rows).
```{r}
# Read the file once, then print the tibble for a first look
weights <- read_csv("_data/animal_weight.csv")
weights
```
```{r}
summary(weights)
```
### Briefly describe the data
This dataset contains 17 variables and 9 rows of animal weights, broken out by animal type and region of the world. Pivoting helps here because in the current wide format each animal is its own variable, so we cannot filter by animal. Once the data is pivoted, animal becomes a single variable, and we can filter by animal type, by region, or both, for example to find average weights.
### Challenge: Describe the final dimensions
To find the final dimensions below, I used the same formula as the example, applied to the animal weights data. My original data set has 9 rows and 17 variables. Only one of the original variables (IPCC Area) will remain a variable. The 16 variables I am pivoting will turn into two variables: animal (names) and weights (values). The pivoted data will have 144 rows, because each of the 9 rows is repeated for each of the 16 variables I am transforming (9 × 16 = 144). I should end up with 3 columns: my one original variable plus my 2 new variables.
```{r}
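# existing rows/cases
nrow(weights)
# existing columns/variables
ncol(weights)
# expected rows/cases: one row per region for each pivoted animal column
nrow(weights) * (ncol(weights) - 1)
# expected columns: 1 identifier column + 2 new columns (animal, weights)
1 + 2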
```
144 rows as expected :)
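The same arithmetic can be wrapped in a small function so the prediction is reusable for other wide tables. This is only a sketch: `expected_long_dims()` and its `n_id_cols` argument are hypothetical names made up for illustration (not part of tidyr), and it assumes every non-identifier column is pivoted into one names column and one values column with no rows dropped.

```r
# Hypothetical helper (not part of tidyr): predicts the dimensions of a
# pivot_longer() result when every non-identifier column is pivoted into
# one names column and one values column, and no rows are dropped.
expected_long_dims <- function(df, n_id_cols = 1) {
  n_pivoted <- ncol(df) - n_id_cols
  c(rows = nrow(df) * n_pivoted, cols = n_id_cols + 2)
}

# Stand-in data frame with the same shape as the animal weights data
# (9 rows, 17 columns); the zeros are placeholders, not real weights.
placeholder <- as.data.frame(matrix(0, nrow = 9, ncol = 17))

dims <- expected_long_dims(placeholder)
dims
```

For a 9 by 17 table with one identifier column, this gives 144 rows and 3 columns, matching the reasoning above.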
### Challenge: Pivot the Chosen Data
```{r}
pivot_longer(weights, "Cattle - dairy":"Llamas",
             names_to = "animal",
             values_to = "weights")
```
Final tibble has three columns and 144 rows as predicted.
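To illustrate the filtering that motivated the pivot, here is a minimal sketch on a small slice of the data. The three regions and two cattle columns are copied from the output printed earlier in this post; the object names (`weights_small`, `weights_long`, `dairy`, `avg_by_animal`) and the `weight` column name are my own choices for the example, not part of the original analysis.

```r
library(tidyverse)

# Three regions and two animal columns, copied from the printed tibble
weights_small <- tribble(
  ~`IPCC Area`,          ~`Cattle - dairy`, ~`Cattle - non-dairy`,
  "Indian Subcontinent", 275,               110,
  "Africa",              275,               173,
  "Oceania",             500,               330
)

# Pivot every column except the region identifier into animal/weight pairs
weights_long <- weights_small %>%
  pivot_longer(cols = -`IPCC Area`,
               names_to = "animal",
               values_to = "weight")

# Filtering by animal is now a single condition on one column
dairy <- weights_long %>%
  filter(animal == "Cattle - dairy")

# Average weight per animal across the three regions
avg_by_animal <- weights_long %>%
  group_by(animal) %>%
  summarise(mean_weight = mean(weight))

avg_by_animal
```

Storing the pivoted result in its own object, rather than only printing it, is what makes these follow-up steps possible.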