Challenge 3 Solution

challenge_3

animal_weights

eggs

australian_marriage

usa_households

sce_labor

Susannah Reed Poland

Tidy Data: Pivoting

Author

Susannah Reed Poland

Published

June 8, 2023


library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc.)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using pivot_longer

Read in data

Read in the following dataset using the correct R package and command.

animal_weights.csv ⭐

I have stored this dataset in an object named “animalweight” for reference.


library(tidyverse)
animalweight <- read_csv("_data/animal_weight.csv")
animalweight

Briefly describe the data

From inspection, the “animalweight” dataframe contains the average weight of 13 animals across 9 global regions defined by the IPCC (Intergovernmental Panel on Climate Change). The animals appear to be livestock. Three of the animals are subdivided into two groups each, so there are 16 animal-type columns (plus one column for the IPCC region) across 9 rows. Each value is an average weight in kilograms.
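One way to double-check the “13 animals, 16 columns” count is to strip the subgroup label from each column name and count the distinct base names. This is a minimal sketch that assumes subdivided animals follow an “Animal - subgroup” naming pattern; the column names below are illustrative placeholders rather than names read from the file.

```r
# Assumed example names: subdivided animals follow an "Animal - subgroup" pattern
animal_cols <- c("Cattle - dairy", "Cattle - non-dairy", "Buffaloes",
                 "Swine - market", "Swine - breeding", "Sheep")

# Drop everything from " - " onward to recover the base animal name
base_animals <- sub(" - .*$", "", animal_cols)

length(animal_cols)           # 6 animal-type columns
length(unique(base_animals))  # 4 distinct animals

# Animals that were split into more than one column
names(which(table(base_animals) > 1))  # "Cattle" "Swine"
```

On the real data, `animal_cols <- setdiff(names(animalweight), "IPCC Area")` would supply the actual names. Whether that reproduces 16 and 13 depends on the file really using this naming pattern.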

These data are currently in a wide format: each row is a region and each animal type is a separate column, so the 9 × 16 grid holds 144 unique cases. To tidy this dataframe, we will pivot it so that each row represents one case, with 3 variables: region, animal type, and average weight in kilograms.
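Before running the pivot on the real data, the wide-to-long reshaping described above can be previewed on a tiny table. This sketch assumes only the tidyverse; the region names and weights are invented placeholders, not values from animal_weight.csv.

```r
library(tidyverse)

# Toy wide table: 2 regions x 3 animal columns (made-up weights, for illustration only)
toy_wide <- tibble(
  `IPCC Area` = c("Region A", "Region B"),
  Cattle      = c(500, 300),
  Sheep       = c(40, 30),
  Goats       = c(35, 25)
)

# Stack every column except the region into (Animal_Type, Weight_KG) pairs
toy_long <- pivot_longer(
  toy_wide,
  cols      = !`IPCC Area`,
  names_to  = "Animal_Type",
  values_to = "Weight_KG"
)

toy_long   # 2 regions x 3 animals = 6 rows, 3 columns
```

The region column is carried along unchanged, and each former column header becomes a value of Animal_Type. This is the same shape we expect from the real 9-row table, just smaller.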

Challenge: Describe the final dimensions

Because there are 144 unique cases (16 animal types × 9 regions), we expect to see 144 rows (check out my in-line R code!) and 3 columns, one for each of the 3 variables.


# existing rows
nrow(animalweight)

[1] 9


# existing columns
ncol(animalweight)

[1] 17


# expected rows = cases (existing rows × non-id columns)
nrow(animalweight)*(ncol(animalweight)-1)

[1] 144
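The same arithmetic can be wrapped in a small reusable check that compares the prediction with what pivot_longer() actually returns. `expected_long_dims()` is a hypothetical helper written for this sketch, and the placeholder data only mimics animalweight's 9 × 17 shape.

```r
library(tidyverse)

# Hypothetical helper: the shape pivot_longer() should return when every
# non-id column is stacked into a single names/values pair
expected_long_dims <- function(df, n_id_cols = 1) {
  n_value_cols <- ncol(df) - n_id_cols
  c(rows = nrow(df) * n_value_cols, cols = n_id_cols + 2)
}

# Placeholder data with animalweight's shape: 9 rows, 1 id column + 16 value columns
toy <- as.data.frame(matrix(1, nrow = 9, ncol = 17))   # columns named V1..V17
expected_long_dims(toy)   # rows = 144, cols = 3

# Compare the prediction with what pivot_longer() actually returns
toy_long <- pivot_longer(toy, cols = !V1, names_to = "name", values_to = "value")
dim(toy_long)             # 144 3
```

With the real data, `expected_long_dims(animalweight)` should agree with the hand calculation of 144 rows, plus 3 columns.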

Pivot the Data


# stack every column except the region into Animal_Type / Weight_KG pairs
animalweightlonger <- pivot_longer(animalweight, cols = !`IPCC Area`, names_to = "Animal_Type", values_to = "Weight_KG")
animalweightlonger

In this longer table, each row is one case: a unique combination of IPCC region and animal type, together with its average weight. The dataframe is tidy because each variable has its own column and each row is a unique case.
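The “each row is a unique case” claim can be checked directly by counting (region, animal type) pairs, and a round trip through pivot_wider() should rebuild the wide layout. This sketch uses a small made-up stand-in for animalweightlonger so it runs without the CSV.

```r
library(tidyverse)

# Toy long table standing in for animalweightlonger (made-up values)
toy_long <- tibble(
  `IPCC Area` = rep(c("Region A", "Region B"), each = 3),
  Animal_Type = rep(c("Cattle", "Sheep", "Goats"), times = 2),
  Weight_KG   = c(500, 40, 35, 300, 30, 25)
)

# Each (region, animal type) pair should appear exactly once
dup_pairs <- toy_long %>%
  count(`IPCC Area`, Animal_Type) %>%
  filter(n > 1)

nrow(dup_pairs)   # 0 means every row is a unique case

# Round trip: pivot_wider() should rebuild the original wide layout
toy_wide_again <- pivot_wider(toy_long, names_from = Animal_Type, values_from = Weight_KG)
dim(toy_wide_again)   # 2 rows, 4 columns
```

Applied to the real table, `count(animalweightlonger, `IPCC Area`, Animal_Type)` should report n = 1 for all 144 pairs.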
