Challenge 3

challenge_3

animal_weights

Tidy Data: Pivoting Animal Weights

Author

Nanci Kopecky

Published

March 6, 2023

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

animal_weights.csv ⭐
eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
australian_marriage*.xls ⭐⭐⭐
USA Households*.xlsx ⭐⭐⭐⭐
sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟

Code

library(readr)
animal_weight<-read.csv(file = "_data/animal_weight.csv",
                header=TRUE,
                sep = ","
                )

Briefly describe the data

Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy”

This data file has nine geographic areas (9 observaitions) defined by the Inter-governmental Panel on Climate Change (IPCC) and the weights of seventeen animals (17 varioables). One idea is to pivot or organize the data to compare weights of animals in different geographic areas.

Code

head(animal_weight)

            IPCC.Area Cattle...dairy Cattle...non.dairy Buffaloes
1 Indian Subcontinent            275                110       295
2      Eastern Europe            550                391       380
3              Africa            275                173       380
4             Oceania            500                330       380
5      Western Europe            600                420       380
6       Latin America            400                305       380
  Swine...market Swine...breeding Chicken...Broilers Chicken...Layers Ducks
1             28               28                0.9              1.8   2.7
2             50              180                0.9              1.8   2.7
3             28               28                0.9              1.8   2.7
4             45              180                0.9              1.8   2.7
5             50              198                0.9              1.8   2.7
6             28               28                0.9              1.8   2.7
  Turkeys Sheep Goats Horses Asses Mules Camels Llamas
1     6.8  28.0  30.0    238   130   130    217    217
2     6.8  48.5  38.5    377   130   130    217    217
3     6.8  28.0  30.0    238   130   130    217    217
4     6.8  48.5  38.5    377   130   130    217    217
5     6.8  48.5  38.5    377   130   130    217    217
6     6.8  28.0  30.0    238   130   130    217    217

Code

view(animal_weight)
ncol(animal_weight)

[1] 17

Code

nrow(animal_weight)

[1] 9

Anticipate the End Result

The first step in pivoting the data is to try to come up with a concrete vision of what the end product should look like - that way you will know whether or not your pivoting was successful.

One easy way to do this is to think about the dimensions of your current data (tibble, dataframe, or matrix), and then calculate what the dimensions of the pivoted data should be.

Suppose you have a dataset with \(n\) rows and \(k\) variables. In our example, 3 of the variables are used to identify a case, so you will be pivoting \(k-3\) variables into a longer format where the \(k-3\) variable names will move into the names_to variable and the current values in each of those columns will move into the values_to variable. Therefore, we would expect \(n * (k-3)\) rows in the pivoted dataframe!

Example: find current and future data dimensions

Used the %>% pipe feature with the mutate function to add a column.

Code

animal_weight%>%mutate(Cattle...dairy - Cattle...non.dairy, .after = Cattle...non.dairy)

            IPCC.Area Cattle...dairy Cattle...non.dairy
1 Indian Subcontinent            275                110
2      Eastern Europe            550                391
3              Africa            275                173
4             Oceania            500                330
5      Western Europe            600                420
6       Latin America            400                305
7                Asia            350                391
8         Middle east            275                173
9    Northern America            604                389
  Cattle...dairy - Cattle...non.dairy Buffaloes Swine...market Swine...breeding
1                                 165       295             28               28
2                                 159       380             50              180
3                                 102       380             28               28
4                                 170       380             45              180
5                                 180       380             50              198
6                                  95       380             28               28
7                                 -41       380             50              180
8                                 102       380             28               28
9                                 215       380             46              198
  Chicken...Broilers Chicken...Layers Ducks Turkeys Sheep Goats Horses Asses
1                0.9              1.8   2.7     6.8  28.0  30.0    238   130
2                0.9              1.8   2.7     6.8  48.5  38.5    377   130
3                0.9              1.8   2.7     6.8  28.0  30.0    238   130
4                0.9              1.8   2.7     6.8  48.5  38.5    377   130
5                0.9              1.8   2.7     6.8  48.5  38.5    377   130
6                0.9              1.8   2.7     6.8  28.0  30.0    238   130
7                0.9              1.8   2.7     6.8  48.5  38.5    377   130
8                0.9              1.8   2.7     6.8  28.0  30.0    238   130
9                0.9              1.8   2.7     6.8  48.5  38.5    377   130
  Mules Camels Llamas
1   130    217    217
2   130    217    217
3   130    217    217
4   130    217    217
5   130    217    217
6   130    217    217
7   130    217    217
8   130    217    217
9   130    217    217

Code

summary(animal_weight)

  IPCC.Area         Cattle...dairy  Cattle...non.dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine...market  Swine...breeding Chicken...Broilers Chicken...Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217

Code

animal_weight %>%
  summarize(
    mean_Cattledairy=mean(Cattle...dairy),
    mean_Cattlenondairy=mean(Cattle...non.dairy),
    sd_Cattledairy=sd(Cattle...dairy),
    sd_Cattlenondairy=sd(Cattle...non.dairy)
    )

  mean_Cattledairy mean_Cattlenondairy sd_Cattledairy sd_Cattlenondairy
1         425.4444                 298       140.3853          116.2594

Code

ncol(animal_weight)

[1] 17

Code

nrow(animal_weight)

[1] 9

Practicing basics of tibble

Code

tb<-tibble(x=1:5,y=2,z=x+y)
tb

# A tibble: 5 × 3
      x     y     z
  <int> <dbl> <dbl>
1     1     2     3
2     2     2     4
3     3     2     5
4     4     2     6
5     5     2     7

Code

tb1<-tibble(': (' = "sad", ': )' = "happy",'24'="24")
tb1

# A tibble: 1 × 3
  `: (` `: )` `24` 
  <chr> <chr> <chr>
1 sad   happy 24

Code

as_tibble(animal_weight)

# A tibble: 9 × 17
  IPCC.A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <int>   <int>   <int>   <int>   <int>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <int>, Asses <int>,
#   Mules <int>, Camels <int>, Llamas <int>, and abbreviated variable names
#   ¹IPCC.Area, ²Cattle...dairy, ³Cattle...non.dairy, ⁴Buffaloes,
#   ⁵Swine...market, ⁶Swine...breeding, ⁷Chicken...Broilers, ⁸Chicken...Layers

Trying Data Frames

The first data frame is accurate, but not sure how useful. The second dataframe was just exploring the structure of the data frames, it is not useful or accurate, but I understand what the number means after the array c.

Code

df1<-tibble(IPCC.Area = rep(c("Indian Sub", "E. Europe", "Africa", "Oceania", "W. Europe", "Latin America", "Asia", "Middle East", "N. America"),1),
           CattleDairy = rnorm(9, mean=425.44, sd=140.39), 
           CattleNonDairy = rnorm(9, mean=298, sd=116.26))
df1

# A tibble: 9 × 3
  IPCC.Area     CattleDairy CattleNonDairy
  <chr>               <dbl>          <dbl>
1 Indian Sub           507.           248.
2 E. Europe            369.           465.
3 Africa               520.           202.
4 Oceania              394.           283.
5 W. Europe            320.           470.
6 Latin America        569.           529.
7 Asia                 582.           236.
8 Middle East          476.           467.
9 N. America           229.           110.

Code

df2<-tibble(CattleType=rep(c("Dairy", "Non Dairy"), 9),
            IPCC.Area =rep(c("Indian Sub", "E. Europe", "Africa", "Oceania", "W. Europe", "Latin America", "Asia", "Middle East", "N. America"),2),
           CattleDairy = rnorm(18,mean=425.44, sd=140.39),
           CattleNonDairy = rnorm(18, mean=298, sd=116.26)
           )
df2

# A tibble: 18 × 4
   CattleType IPCC.Area     CattleDairy CattleNonDairy
   <chr>      <chr>               <dbl>          <dbl>
 1 Dairy      Indian Sub          265.           138. 
 2 Non Dairy  E. Europe           466.           329. 
 3 Dairy      Africa              447.           312. 
 4 Non Dairy  Oceania             262.           181. 
 5 Dairy      W. Europe           336.           199. 
 6 Non Dairy  Latin America       319.           294. 
 7 Dairy      Asia                436.           302. 
 8 Non Dairy  Middle East         651.           286. 
 9 Dairy      N. America           36.0          258. 
10 Non Dairy  Indian Sub          801.           475. 
11 Dairy      E. Europe           467.           299. 
12 Non Dairy  Africa              345.           179. 
13 Dairy      Oceania             252.           454. 
14 Non Dairy  W. Europe           286.           196. 
15 Dairy      Latin America       645.           356. 
16 Non Dairy  Asia                409.           302. 
17 Dairy      Middle East         399.           112. 
18 Non Dairy  N. America          119.            85.9

Code

df3<-tibble(CattleType=rep(c("Dairy", "Non Dairy"), 1),
             IndianSub=rep(c(rnorm(1,mean=425.44, sd=140.39), rnorm(1, mean=298, sd=116.26))),
            E.Europe=rep(c(rnorm(1,mean=425.44, sd=140.39), rnorm(1, mean=298, sd=116.26))))

df3

# A tibble: 2 × 3
  CattleType IndianSub E.Europe
  <chr>          <dbl>    <dbl>
1 Dairy           361.     512.
2 Non Dairy       300.     280.

Pivot the Data

Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a “sanity” check.

Example

Pivot_Longer command

Code

df1<-pivot_longer(df1, col = c(CattleDairy, CattleNonDairy),
                  names_to = "Cattle_Type",
                  values_to = "Animal_Weight")
df1

# A tibble: 18 × 3
   IPCC.Area     Cattle_Type    Animal_Weight
   <chr>         <chr>                  <dbl>
 1 Indian Sub    CattleDairy             507.
 2 Indian Sub    CattleNonDairy          248.
 3 E. Europe     CattleDairy             369.
 4 E. Europe     CattleNonDairy          465.
 5 Africa        CattleDairy             520.
 6 Africa        CattleNonDairy          202.
 7 Oceania       CattleDairy             394.
 8 Oceania       CattleNonDairy          283.
 9 W. Europe     CattleDairy             320.
10 W. Europe     CattleNonDairy          470.
11 Latin America CattleDairy             569.
12 Latin America CattleNonDairy          529.
13 Asia          CattleDairy             582.
14 Asia          CattleNonDairy          236.
15 Middle East   CattleDairy             476.
16 Middle East   CattleNonDairy          467.
17 N. America    CattleDairy             229.
18 N. America    CattleNonDairy          110.