Challenge 3

challenge_3

animal_weights

poobigan murugesan

Tidy Data: Pivoting

Author

Poobigan Murugesan

Published

May 10, 2023

Code

library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

animal_weights.csv ⭐
eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
australian_marriage*.xls ⭐⭐⭐
USA Households*.xlsx ⭐⭐⭐⭐
sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟

Reading in data

Code

df <- read_csv("_data/animal_weight.csv", show_col_types=FALSE)
head(df)

# A tibble: 6 × 17
  `IPCC Area`   `Cattle - dairy` `Cattle - non-dairy` Buffaloes `Swine - market`
  <chr>                    <dbl>                <dbl>     <dbl>            <dbl>
1 Indian Subco…              275                  110       295               28
2 Eastern Euro…              550                  391       380               50
3 Africa                     275                  173       380               28
4 Oceania                    500                  330       380               45
5 Western Euro…              600                  420       380               50
6 Latin America              400                  305       380               28
# ℹ 12 more variables: `Swine - breeding` <dbl>, `Chicken - Broilers` <dbl>,
#   `Chicken - Layers` <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
#   Goats <dbl>, Horses <dbl>, Asses <dbl>, Mules <dbl>, Camels <dbl>,
#   Llamas <dbl>

Briefly describe the data

Dimensions of the dataset

Code

dim(df)

[1]  9 17

Code

summary(df)

  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217

The “animal_weight.csv” dataset contains weights of various types of animals/livestock(like Ducks, Turkeys, Sheep, Goats, etc) according to the region they are found in. There are 9 rows(regions) and 17 columns in the dataset, including 1 column for the region. Having too many columns makes this dataset difficult to interpret. We can pivot the dataset to restructure it efficiently.

Anticipate the End Result and final dimensions

The end result would have 3 columns, 1 each for region, animal type and weight. So the expected number of rows would be 9 x (17-1) which is 144.

Expected pivoted dataset dimensions: 144x3

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Code

df_pivoted<-pivot_longer(df, col =-`IPCC Area`,
                 names_to="animal_type",
                 values_to = "animal_weight")
df_pivoted

# A tibble: 144 × 3
   `IPCC Area`         animal_type        animal_weight
   <chr>               <chr>                      <dbl>
 1 Indian Subcontinent Cattle - dairy             275  
 2 Indian Subcontinent Cattle - non-dairy         110  
 3 Indian Subcontinent Buffaloes                  295  
 4 Indian Subcontinent Swine - market              28  
 5 Indian Subcontinent Swine - breeding            28  
 6 Indian Subcontinent Chicken - Broilers           0.9
 7 Indian Subcontinent Chicken - Layers             1.8
 8 Indian Subcontinent Ducks                        2.7
 9 Indian Subcontinent Turkeys                      6.8
10 Indian Subcontinent Sheep                       28  
# ℹ 134 more rows

Code

dim(df_pivoted)

[1] 144   3

After pivoting our data using pivot_longer(), our resulting data is \(144x3\) - exactly what we expected! Making the data more user-friendly and readable.