Code
library(tidyverse)
library(readr)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Poobigan Murugesan
May 10, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
Reading in data
# A tibble: 6 × 17
`IPCC Area` `Cattle - dairy` `Cattle - non-dairy` Buffaloes `Swine - market`
<chr> <dbl> <dbl> <dbl> <dbl>
1 Indian Subco… 275 110 295 28
2 Eastern Euro… 550 391 380 50
3 Africa 275 173 380 28
4 Oceania 500 330 380 45
5 Western Euro… 600 420 380 50
6 Latin America 400 305 380 28
# ℹ 12 more variables: `Swine - breeding` <dbl>, `Chicken - Broilers` <dbl>,
# `Chicken - Layers` <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
# Goats <dbl>, Horses <dbl>, Asses <dbl>, Mules <dbl>, Camels <dbl>,
# Llamas <dbl>
Dimensions of the dataset
[1] 9 17
IPCC Area Cattle - dairy Cattle - non-dairy Buffaloes
Length:9 Min. :275.0 Min. :110 Min. :295.0
Class :character 1st Qu.:275.0 1st Qu.:173 1st Qu.:380.0
Mode :character Median :400.0 Median :330 Median :380.0
Mean :425.4 Mean :298 Mean :370.6
3rd Qu.:550.0 3rd Qu.:391 3rd Qu.:380.0
Max. :604.0 Max. :420 Max. :380.0
Swine - market Swine - breeding Chicken - Broilers Chicken - Layers
Min. :28.00 Min. : 28.0 Min. :0.9 Min. :1.8
1st Qu.:28.00 1st Qu.: 28.0 1st Qu.:0.9 1st Qu.:1.8
Median :45.00 Median :180.0 Median :0.9 Median :1.8
Mean :39.22 Mean :116.4 Mean :0.9 Mean :1.8
3rd Qu.:50.00 3rd Qu.:180.0 3rd Qu.:0.9 3rd Qu.:1.8
Max. :50.00 Max. :198.0 Max. :0.9 Max. :1.8
Ducks Turkeys Sheep Goats Horses
Min. :2.7 Min. :6.8 Min. :28.00 Min. :30.00 Min. :238.0
1st Qu.:2.7 1st Qu.:6.8 1st Qu.:28.00 1st Qu.:30.00 1st Qu.:238.0
Median :2.7 Median :6.8 Median :48.50 Median :38.50 Median :377.0
Mean :2.7 Mean :6.8 Mean :39.39 Mean :34.72 Mean :315.2
3rd Qu.:2.7 3rd Qu.:6.8 3rd Qu.:48.50 3rd Qu.:38.50 3rd Qu.:377.0
Max. :2.7 Max. :6.8 Max. :48.50 Max. :38.50 Max. :377.0
Asses Mules Camels Llamas
Min. :130 Min. :130 Min. :217 Min. :217
1st Qu.:130 1st Qu.:130 1st Qu.:217 1st Qu.:217
Median :130 Median :130 Median :217 Median :217
Mean :130 Mean :130 Mean :217 Mean :217
3rd Qu.:130 3rd Qu.:130 3rd Qu.:217 3rd Qu.:217
Max. :130 Max. :130 Max. :217 Max. :217
The “animal_weight.csv” dataset contains weights of various types of animals/livestock(like Ducks, Turkeys, Sheep, Goats, etc) according to the region they are found in. There are 9 rows(regions) and 17 columns in the dataset, including 1 column for the region. Having too many columns makes this dataset difficult to interpret. We can pivot the dataset to restructure it efficiently.
The end result would have 3 columns, 1 each for region, animal type and weight. So the expected number of rows would be 9 x (17-1) which is 144.
Expected pivoted dataset dimensions: 144x3
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.
# A tibble: 144 × 3
`IPCC Area` animal_type animal_weight
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# ℹ 134 more rows
[1] 144 3
After pivoting our data using pivot_longer(), our resulting data is \(144x3\) - exactly what we expected! Making the data more user-friendly and readable.
---
title: "Challenge 3"
author: "Poobigan Murugesan"
description: "Tidy Data: Pivoting"
date: "05/10/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- poobigan murugesan
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
Reading in data
```{r}
df <- read_csv("_data/animal_weight.csv", show_col_types=FALSE)
head(df)
```
### Briefly describe the data
Dimensions of the dataset
```{r}
dim(df)
summary(df)
```
The "animal_weight.csv" dataset contains weights of various types of animals/livestock(like Ducks, Turkeys, Sheep, Goats, etc) according to the region they are found in. There are 9 rows(regions) and 17 columns in the dataset, including 1 column for the region. Having too many columns makes this dataset difficult to interpret. We can pivot the dataset to restructure it efficiently.
### Anticipate the End Result and final dimensions
The end result would have 3 columns, 1 each for region, animal type and weight. So the expected number of rows would be 9 x (17-1) which is 144.
Expected pivoted dataset dimensions: 144x3
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
```{r}
#| tbl-cap: Pivoted Example
df_pivoted<-pivot_longer(df, col =-`IPCC Area`,
names_to="animal_type",
values_to = "animal_weight")
df_pivoted
dim(df_pivoted)
```
After pivoting our data using pivot_longer(), our resulting data is $144x3$ - exactly what we expected! Making the data more user-friendly and readable.