Challenge 3

challenge_3

Anirudh Lakkaraju

animal_weights

Tidy Data: Pivoting

Author

Anirudh Lakkaraju

Published

May 2, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Reading the data

df <- read_csv("_data/animal_weight.csv")

head(df)

# A tibble: 6 × 17
  `IPCC Area`   `Cattle - dairy` `Cattle - non-dairy` Buffaloes `Swine - market`
  <chr>                    <dbl>                <dbl>     <dbl>            <dbl>
1 Indian Subco…              275                  110       295               28
2 Eastern Euro…              550                  391       380               50
3 Africa                     275                  173       380               28
4 Oceania                    500                  330       380               45
5 Western Euro…              600                  420       380               50
6 Latin America              400                  305       380               28
# ℹ 12 more variables: `Swine - breeding` <dbl>, `Chicken - Broilers` <dbl>,
#   `Chicken - Layers` <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
#   Goats <dbl>, Horses <dbl>, Asses <dbl>, Mules <dbl>, Camels <dbl>,
#   Llamas <dbl>

Briefly describe the data

Find the number of rows and columns of the dataset

nrow(df)

[1] 9

ncol(df)

[1] 17

summary(df)

  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217

The dataset contains animal weight data: for each of 9 IPCC areas (world regions such as the Indian Subcontinent, Africa, and Oceania), it records a representative weight for 16 types of livestock (such as dairy cattle, buffaloes, chickens, and turkeys). It has 9 rows and 17 columns: one column for the region and one for each livestock type. This wide layout spreads a single variable, weight, across 16 columns, which makes the data awkward to filter, group, or plot by livestock type. To tidy it, we can use pivot_longer() to restructure it into three columns, `IPCC Area` (region), Livestock (livestock type), and Weight, giving 144 rows (9 regions × 16 livestock columns).
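The 144-row figure follows from stacking every column except the identifier: each of the 9 rows contributes one long row per weight column. A minimal base-R sketch of that arithmetic, using a hypothetical helper `expected_long_dim()` (not part of tidyr) and a placeholder data frame with the same shape as `df`, so the real CSV is not needed:

```r
# Hypothetical helper (not part of tidyr): predicts the shape pivot_longer()
# returns when all non-ID columns are stacked into one names column and one
# values column.
expected_long_dim <- function(wide, id_cols) {
  n_value_cols <- ncol(wide) - length(id_cols)
  c(rows = nrow(wide) * n_value_cols,  # one row per (ID, stacked column) pair
    cols = length(id_cols) + 2)        # ID columns + names_to + values_to
}

# Placeholder table with df's shape: 9 regions x (1 ID column + 16 weight columns).
# The weights are dummy zeros, not the real data.
toy <- data.frame(`IPCC Area` = paste("Region", 1:9), check.names = FALSE)
for (i in 1:16) toy[[paste("Livestock", i)]] <- 0

expected_long_dim(toy, id_cols = "IPCC Area")  # rows = 144, cols = 3
```

The same helper applied to the real `df` would compare directly against `dim(data_longer)`.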

Pivoting the data

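data_longer <- pivot_longer(df,
                            cols = -`IPCC Area`,     # stack every column except the region ID
                            names_to = "Livestock",  # former column names go here
                            values_to = "Weight")    # former cell values go here
print(data_longer)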

# A tibble: 144 × 3
   `IPCC Area`         Livestock          Weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# ℹ 134 more rows
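Because pivot_longer() only moves values between columns and rows, the reshape should be reversible with pivot_wider(). The sketch below checks this on a small excerpt of `df` (four regions and three livestock columns, with values copied from the head(df) output above), not on the full dataset:

```r
library(tibble)
library(tidyr)

# Excerpt of df: 4 regions x 3 livestock columns, values taken from head(df).
wide <- tribble(
  ~`IPCC Area`,          ~`Cattle - dairy`, ~`Cattle - non-dairy`, ~Buffaloes,
  "Indian Subcontinent", 275,               110,                   295,
  "Africa",              275,               173,                   380,
  "Oceania",             500,               330,                   380,
  "Latin America",       400,               305,                   380
)

# Same call as on the full data: stack every column except the region ID.
long <- pivot_longer(wide, cols = -`IPCC Area`,
                     names_to = "Livestock", values_to = "Weight")
dim(long)  # 12 3 (4 regions x 3 livestock types; 3 columns)

# Undo the reshape: Livestock values become column names again.
back <- pivot_wider(long, names_from = Livestock, values_from = Weight)

# Should be TRUE: same columns, column order, rows, and values as the excerpt.
isTRUE(all.equal(as.data.frame(back), as.data.frame(wide)))
```

The conversion to plain data frames keeps the comparison focused on the values rather than on tibble-specific attributes.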

Dimensions of restructured data

dim(data_longer)

[1] 144   3

As expected, pivot_longer() produces a dataset with 144 rows and 3 columns.
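One payoff of the long layout is that a per-livestock summary becomes a single grouped operation instead of 16 separate column calls. A sketch on a four-region excerpt of `df` (values copied from the head(df) output above; dplyr is loaded as part of the tidyverse):

```r
library(tibble)
library(tidyr)
library(dplyr)

# Excerpt of df: 4 regions x 3 livestock columns, values taken from head(df).
wide <- tribble(
  ~`IPCC Area`,          ~`Cattle - dairy`, ~`Cattle - non-dairy`, ~Buffaloes,
  "Indian Subcontinent", 275,               110,                   295,
  "Africa",              275,               173,                   380,
  "Oceania",             500,               330,                   380,
  "Latin America",       400,               305,                   380
)

long <- pivot_longer(wide, cols = -`IPCC Area`,
                     names_to = "Livestock", values_to = "Weight")

# One row per livestock type; groups come out sorted alphabetically by Livestock.
by_livestock <- long %>%
  group_by(Livestock) %>%
  summarise(mean_weight = mean(Weight), n_regions = n())
by_livestock
```

On this excerpt the means are 358.75 (Buffaloes), 362.5 (Cattle - dairy), and 229.5 (Cattle - non-dairy); the same pipeline on `data_longer` would summarise all 16 livestock types at once.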