challenge_3
Priyanka Perumalla
animal_weights
Tidy Data: Pivoting
Author

Priyanka Perumalla

Published

May 15, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc.)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • animal_weights.csv ⭐
  • eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
  • australian_marriage*.xls ⭐⭐⭐
  • USA Households*.xlsx ⭐⭐⭐⭐
  • sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
Code
# show_col_types is a read_csv() argument; passing it to print() has no effect
animal_weights_data <- read_csv("_data/animal_weight.csv", show_col_types = FALSE)
print(animal_weights_data)
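show_col_types is an argument of readr::read_csv(), not of print(), so it only has an effect when passed at read time. The sketch below shows this on a small hypothetical excerpt: three of the areas and three of the animal columns, with values copied from the printed output above. The excerpt is written to a temporary file because the course's _data folder is assumed to be unavailable here.

```r
library(readr)

# Hypothetical excerpt of animal_weight.csv (three areas, three animal columns)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "IPCC Area,Cattle - dairy,Cattle - non-dairy,Buffaloes",
  "Indian Subcontinent,275,110,295",
  "Africa,275,173,380",
  "Oceania,500,330,380"
), csv_path)

# show_col_types = FALSE silences the column-specification message at read time;
# column names with spaces, such as "IPCC Area", are kept as-is
excerpt <- read_csv(csv_path, show_col_types = FALSE)
print(excerpt)
```

Because the names contain spaces and hyphens, they have to be wrapped in backticks (for example `IPCC Area`) when referenced in later code.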
# A tibble: 9 × 17
  `IPCC Area`   `Cattle - dairy` `Cattle - non-dairy` Buffaloes `Swine - market`
  <chr>                    <dbl>                <dbl>     <dbl>            <dbl>
1 Indian Subco…              275                  110       295               28
2 Eastern Euro…              550                  391       380               50
3 Africa                     275                  173       380               28
4 Oceania                    500                  330       380               45
5 Western Euro…              600                  420       380               50
6 Latin America              400                  305       380               28
7 Asia                       350                  391       380               50
8 Middle east                275                  173       380               28
9 Northern Ame…              604                  389       380               46
# ℹ 12 more variables: `Swine - breeding` <dbl>, `Chicken - Broilers` <dbl>,
#   `Chicken - Layers` <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
#   Goats <dbl>, Horses <dbl>, Asses <dbl>, Mules <dbl>, Camels <dbl>,
#   Llamas <dbl>
Code
head(animal_weights_data)
# A tibble: 6 × 17
  `IPCC Area`   `Cattle - dairy` `Cattle - non-dairy` Buffaloes `Swine - market`
  <chr>                    <dbl>                <dbl>     <dbl>            <dbl>
1 Indian Subco…              275                  110       295               28
2 Eastern Euro…              550                  391       380               50
3 Africa                     275                  173       380               28
4 Oceania                    500                  330       380               45
5 Western Euro…              600                  420       380               50
6 Latin America              400                  305       380               28
# ℹ 12 more variables: `Swine - breeding` <dbl>, `Chicken - Broilers` <dbl>,
#   `Chicken - Layers` <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
#   Goats <dbl>, Horses <dbl>, Asses <dbl>, Mules <dbl>, Camels <dbl>,
#   Llamas <dbl>

Briefly describe the data

Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy”.

Code
nrow(animal_weights_data)
[1] 9
Code
ncol(animal_weights_data)
[1] 17
Code
summary(animal_weights_data)
  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217  

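The data set records animal weights for 9 IPCC geographic areas. These are regions such as Africa, Oceania and Latin America rather than individual countries. Apart from the IPCC Area column, each of the remaining 16 columns is an animal category (for example Cattle - dairy, Swine - market or Llamas), so the column headers are themselves values of a single variable, the type of livestock. This breaks the tidy-data rule that each variable forms a column and each observation forms a row. The summary also shows that several categories (Chicken - Broilers, Chicken - Layers, Ducks, Turkeys, Asses, Mules, Camels and Llamas) have the same weight in every area, while cattle, buffaloes, swine, sheep, goats and horses vary by area. As a first pass, I plan to pivot the table longer so that each row holds one area, one livestock category and its weight, which makes it easier to compare livestock weights across areas. The initial data set has 9 rows and 17 columns.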
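The claim that some categories never change by area can be checked directly by counting distinct values per column instead of reading the summary table. This is a minimal sketch on a hypothetical five-area excerpt. The excerpt tibble is illustrative, with values taken from the printed output and the summary (Ducks and Llamas have min = max).

```r
library(dplyr)
library(tibble)

# Hypothetical excerpt: one column that varies by area and two that do not
excerpt <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Africa", "Oceania", "Latin America", "Asia"),
  `Cattle - dairy` = c(275, 275, 500, 400, 350),
  Ducks = c(2.7, 2.7, 2.7, 2.7, 2.7),
  Llamas = c(217, 217, 217, 217, 217)
)

# Number of distinct weights in each animal column; 1 means the value is
# identical for every area in the excerpt
distinct_counts <- excerpt %>%
  summarise(across(-`IPCC Area`, n_distinct))
print(distinct_counts)
```

On the full data, the same call with animal_weights_data in place of excerpt would list all 16 categories at once.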

Anticipate the End Result

The first step in pivoting the data is to try to come up with a concrete vision of what the end product should look like - that way you will know whether or not your pivoting was successful.

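Pivoting longer will not make the data smaller. Instead, it trades width for length. IPCC Area stays as the identifier column. The 16 animal columns collapse into two new columns: Livestock, which holds the former column names, and Weight, which holds the former cell values. Each of the 9 areas therefore contributes 16 rows, so I anticipate 9 × 16 = 144 rows and 3 columns (144 x 3). The longer table should be easier to read, filter and group by livestock category.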
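The expected shape is simple arithmetic. Every non-identifier column becomes one row per original row, and the result keeps the identifier columns plus one names column and one values column. The helper expected_long_dim() below is hypothetical (it is not part of tidyr) and just makes that rule explicit, using the 9 × 17 dimensions reported above.

```r
# Hypothetical helper: expected dimensions after pivot_longer() when all
# non-id columns are pivoted into one names column and one values column
expected_long_dim <- function(n_rows, n_cols, n_id_cols = 1) {
  c(rows = n_rows * (n_cols - n_id_cols), cols = n_id_cols + 2)
}

# animal_weights_data has 9 rows and 17 columns, with IPCC Area as the only id column
expected <- expected_long_dim(9, 17)
print(expected)
```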

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Code
# Keep IPCC Area as the id column; every other column becomes a Livestock/Weight pair
animal_data_pivoted <- pivot_longer(animal_weights_data, cols = -`IPCC Area`,
                                    names_to = "Livestock",
                                    values_to = "Weight")
print(animal_data_pivoted)
# A tibble: 144 × 3
   `IPCC Area`         Livestock          Weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# ℹ 134 more rows
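One way to confirm that the pivot lost no information is to reverse it with tidyr::pivot_wider() and compare the result against the original. This minimal sketch uses a hypothetical three-area excerpt. The excerpt tibble is illustrative, with values copied from the printed output.

```r
library(tidyr)
library(tibble)

excerpt <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Africa", "Oceania"),
  `Cattle - dairy` = c(275, 275, 500),
  `Cattle - non-dairy` = c(110, 173, 330),
  Buffaloes = c(295, 380, 380)
)

# Same pivot as above, applied to the excerpt
long <- pivot_longer(excerpt, cols = -`IPCC Area`,
                     names_to = "Livestock", values_to = "Weight")

# Reverse the pivot: Livestock values become column names again
wide_again <- pivot_wider(long, names_from = Livestock, values_from = Weight)

# TRUE when the round trip reproduces the excerpt's values and column order
round_trip_ok <- isTRUE(all.equal(as.data.frame(wide_again), as.data.frame(excerpt)))
print(round_trip_ok)
```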

Dimensions of pivoted tibble after restructuring the data

Code
dim(animal_data_pivoted)
[1] 144   3
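Instead of comparing the numbers by eye, the sanity check can be written as an assertion that stops with an error if the shapes disagree. This is a minimal sketch in which a hypothetical two-area, two-animal excerpt stands in for animal_weights_data. Its values are taken from the printed output.

```r
library(tidyr)
library(tibble)

excerpt <- tibble(
  `IPCC Area` = c("Africa", "Oceania"),
  `Cattle - dairy` = c(275, 500),
  `Swine - market` = c(28, 45)
)

long <- pivot_longer(excerpt, cols = -`IPCC Area`,
                     names_to = "Livestock", values_to = "Weight")

# Expected: one row per (area, animal) pair; id column + Livestock + Weight
expected_dim <- c(nrow(excerpt) * (ncol(excerpt) - 1L), 3L)

# Stops with an error if the pivoted shape differs from the expectation
stopifnot(identical(dim(long), expected_dim))
```

Against the full data, the equivalent line would be stopifnot(identical(dim(animal_data_pivoted), c(144L, 3L))).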

Yes, once pivoted longer, the resulting data are \(144 \times 3\), exactly what we expected!

Challenge: Describe the final dimensions

Document your work here.

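The final dimensions of the pivoted data are 144 x 3. This matches the expectation. pivot_longer() kept IPCC Area as the identifier and turned each of the 16 animal columns into its own row for every area, which gives 9 × 16 = 144 rows. The result has 3 columns: IPCC Area plus the two new columns, Livestock and Weight.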
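Another way to read the row count is that each area now appears once per animal category, so it appears 16 times in the full data. dplyr::count() makes that visible. The sketch below uses a hypothetical two-area excerpt with four animal columns, with values taken from the printed output.

```r
library(dplyr)
library(tidyr)
library(tibble)

excerpt <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Asia"),
  `Cattle - dairy` = c(275, 350),
  `Cattle - non-dairy` = c(110, 391),
  Buffaloes = c(295, 380),
  `Swine - market` = c(28, 50)
)

long <- pivot_longer(excerpt, cols = -`IPCC Area`,
                     names_to = "Livestock", values_to = "Weight")

# Rows per area: equals the number of animal columns (4 in this excerpt)
rows_per_area <- count(long, `IPCC Area`)
print(rows_per_area)
```

On the full pivoted data, count(animal_data_pivoted, `IPCC Area`) should report 16 for each of the 9 areas.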

Code
dim(animal_data_pivoted)
[1] 144   3