Challenge 4 Akhilesh

challenge_4
Author

Akhilesh Kumar Meghwal

Published

August 22, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Code
animal_weight<-read_csv("_data/animal_weight.csv",
                        show_col_types = FALSE)

Briefly describe the data

Code
print(summarytools::dfSummary(animal_weight,
                              varnumbers = FALSE,
                              plain.ascii  = FALSE,
                              style        = "grid",
                              graph.magnif = 0.50,
                              valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

animal_weight

Dimensions: 9 x 17
Duplicates: 0
IPCC Area [character]
  Values: 1. Africa; 2. Asia; 3. Eastern Europe; 4. Indian Subcontinent; 5. Latin America; 6. Middle east; 7. Northern America; 8. Oceania; 9. Western Europe
  Freqs (% of Valid): 1 (11.1%) for each value
  Missing: 0 (0.0%)

Cattle - dairy [numeric]
  Stats: Mean (sd): 425.4 (140.4); min ≤ med ≤ max: 275 ≤ 400 ≤ 604; IQR (CV): 275 (0.3)
  Freqs (% of Valid): 275: 3 (33.3%); 350: 1 (11.1%); 400: 1 (11.1%); 500: 1 (11.1%); 550: 1 (11.1%); 600: 1 (11.1%); 604: 1 (11.1%)
  Missing: 0 (0.0%)

Cattle - non-dairy [numeric]
  Stats: Mean (sd): 298 (116.3); min ≤ med ≤ max: 110 ≤ 330 ≤ 420; IQR (CV): 218 (0.4)
  Freqs (% of Valid): 110: 1 (11.1%); 173: 2 (22.2%); 305: 1 (11.1%); 330: 1 (11.1%); 389: 1 (11.1%); 391: 2 (22.2%); 420: 1 (11.1%)
  Missing: 0 (0.0%)

Buffaloes [numeric]
  Stats: Min: 295; Mean: 370.6; Max: 380
  Freqs (% of Valid): 295: 1 (11.1%); 380: 8 (88.9%)
  Missing: 0 (0.0%)

Swine - market [numeric]
  Stats: Mean (sd): 39.2 (10.8); min ≤ med ≤ max: 28 ≤ 45 ≤ 50; IQR (CV): 22 (0.3)
  Freqs (% of Valid): 28: 4 (44.4%); 45: 1 (11.1%); 46: 1 (11.1%); 50: 3 (33.3%)
  Missing: 0 (0.0%)

Swine - breeding [numeric]
  Stats: Mean (sd): 116.4 (84.2); min ≤ med ≤ max: 28 ≤ 180 ≤ 198; IQR (CV): 152 (0.7)
  Freqs (% of Valid): 28: 4 (44.4%); 180: 3 (33.3%); 198: 2 (22.2%)
  Missing: 0 (0.0%)

Chicken - Broilers [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 0.90: 9 (100.0%)
  Missing: 0 (0.0%)

Chicken - Layers [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 1.80: 9 (100.0%)
  Missing: 0 (0.0%)

Ducks [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 2.70: 9 (100.0%)
  Missing: 0 (0.0%)

Turkeys [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 6.80: 9 (100.0%)
  Missing: 0 (0.0%)

Sheep [numeric]
  Stats: Min: 28; Mean: 39.4; Max: 48.5
  Freqs (% of Valid): 28.00: 4 (44.4%); 48.50: 5 (55.6%)
  Missing: 0 (0.0%)

Goats [numeric]
  Stats: Min: 30; Mean: 34.7; Max: 38.5
  Freqs (% of Valid): 30.00: 4 (44.4%); 38.50: 5 (55.6%)
  Missing: 0 (0.0%)

Horses [numeric]
  Stats: Min: 238; Mean: 315.2; Max: 377
  Freqs (% of Valid): 238: 4 (44.4%); 377: 5 (55.6%)
  Missing: 0 (0.0%)

Asses [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 130: 9 (100.0%)
  Missing: 0 (0.0%)

Mules [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 130: 9 (100.0%)
  Missing: 0 (0.0%)

Camels [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 217: 9 (100.0%)
  Missing: 0 (0.0%)

Llamas [numeric]
  Stats: 1 distinct value
  Freqs (% of Valid): 217: 9 (100.0%)
  Missing: 0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-09-04

Column names of the dataframe
Code
colnames(animal_weight)
 [1] "IPCC Area"          "Cattle - dairy"     "Cattle - non-dairy"
 [4] "Buffaloes"          "Swine - market"     "Swine - breeding"  
 [7] "Chicken - Broilers" "Chicken - Layers"   "Ducks"             
[10] "Turkeys"            "Sheep"              "Goats"             
[13] "Horses"             "Asses"              "Mules"             
[16] "Camels"             "Llamas"            
Column classes of the dataframe
Code
col_classes = data.frame(t(data.frame(lapply(animal_weight,class))))
col_classes
                   t.data.frame.lapply.animal_weight..class...
IPCC.Area                                            character
Cattle...dairy                                         numeric
Cattle...non.dairy                                     numeric
Buffaloes                                              numeric
Swine...market                                         numeric
Swine...breeding                                       numeric
Chicken...Broilers                                     numeric
Chicken...Layers                                       numeric
Ducks                                                  numeric
Turkeys                                                numeric
Sheep                                                  numeric
Goats                                                  numeric
Horses                                                 numeric
Asses                                                  numeric
Mules                                                  numeric
Camels                                                 numeric
Llamas                                                 numeric
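
The row names in the output above were changed to dotted forms (for example, IPCC.Area) because data.frame() repairs non-syntactic names by default. A minimal sketch of one way to list the classes while keeping the original column names follows. toy_weight is a small stand-in for animal_weight: its first row uses values shown in this post, and its second row ("Region B") is made up for illustration.

```r
library(tibble)

# Toy stand-in for animal_weight. The first row uses values printed in this
# post; the second row is hypothetical and only there to give the columns
# more than one value.
toy_weight <- tibble(
  `IPCC Area`      = c("Indian Subcontinent", "Region B (hypothetical)"),
  `Cattle - dairy` = c(275, 500),
  Buffaloes        = c(295, 380)
)

# vapply() returns a named character vector. The names are the original
# column names, so spaces and hyphens are not replaced by dots.
col_classes_vec <- vapply(toy_weight, function(x) class(x)[1], character(1))

# enframe() converts the named vector into a two-column tibble.
col_classes_tbl <- enframe(col_classes_vec, name = "column", value = "class")
col_classes_tbl
```

Taking only the first element of class() keeps the result to one string per column, which is an assumption that suits simple columns like these.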
Summary of the dataframe
Code
summary(animal_weight)
  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217  
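
The summaries above show that several columns (for example Ducks and Turkeys) hold a single value across all regions. As a rough sketch, such constant columns could be found programmatically by counting distinct values per column. toy_weight is a hypothetical stand-in for animal_weight; the "Region B" row is made up, apart from the Ducks and Turkeys values, which the post reports as identical everywhere.

```r
library(tibble)

# Toy stand-in for animal_weight (second row is hypothetical).
toy_weight <- tibble(
  `IPCC Area`      = c("Indian Subcontinent", "Region B (hypothetical)"),
  `Cattle - dairy` = c(275, 500),
  Ducks            = c(2.7, 2.7),
  Turkeys          = c(6.8, 6.8)
)

# Count the distinct values in each column.
distinct_counts <- vapply(toy_weight, function(x) length(unique(x)), integer(1))

# Columns with exactly one distinct value do not vary by region.
constant_cols <- names(distinct_counts)[distinct_counts == 1]
constant_cols
```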

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
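Tidy the data with pivot_longer so that each row represents a single observation (one region and animal combination) and each column represents a single variable. The original data has 9 rows and 16 animal columns, so the pivoted data should have 9 × 16 = 144 rows and 3 columns (IPCC Area, animal_name, weight).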
Code
animal_weight_pivot <- pivot_longer(animal_weight,
                                    cols = names(animal_weight)[2:17],
                                    names_to = 'animal_name',
                                    values_to = 'weight')
animal_weight_pivot
# A tibble: 144 × 3
   `IPCC Area`         animal_name        weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows
# ℹ Use `print(n = ...)` to see more rows
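
The expected shape of the pivoted data can also be checked in code rather than by eye. The sketch below does this on a small stand-in tibble, toy_weight, whose first row uses values from this post and whose second row ("Region B") is hypothetical. The same check should carry over to animal_weight, assuming the first column is the only identifier column.

```r
library(tibble)
library(tidyr)

# Toy stand-in for animal_weight (second row is hypothetical).
toy_weight <- tibble(
  `IPCC Area`      = c("Indian Subcontinent", "Region B (hypothetical)"),
  `Cattle - dairy` = c(275, 500),
  Buffaloes        = c(295, 380),
  Sheep            = c(28, 48.5)
)

# Expected number of rows: every region paired with every animal column.
expected_rows <- nrow(toy_weight) * (ncol(toy_weight) - 1)

toy_long <- pivot_longer(
  toy_weight,
  cols      = -`IPCC Area`,   # pivot every column except the region
  names_to  = "animal_name",
  values_to = "weight"
)

# Sanity checks: row count, column count, and no weights lost as NA.
stopifnot(
  nrow(toy_long) == expected_rows,
  ncol(toy_long) == 3,
  !anyNA(toy_long$weight)
)
dim(toy_long)
```

Selecting columns with -`IPCC Area` instead of positions 2:17 is a design choice made for this sketch; it avoids hard-coding the number of animal columns.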

Identify variables that need to be mutated

col_classes stores the class of each column of animal_weight_pivot.
‘IPCC Area’ and ‘animal_name’ are character columns, so they are converted to factors with mutate_at.
Code
col_classes = data.frame(t(data.frame(lapply(animal_weight_pivot,class))))


animal_weight_pivot %>% 
  mutate_at(c('IPCC Area', 'animal_name'), factor)
# A tibble: 144 × 3
   `IPCC Area`         animal_name        weight
   <fct>               <fct>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows
# ℹ Use `print(n = ...)` to see more rows
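col_classes for sanity check. The classes below are still character because the mutate_at result above was printed but not assigned back to animal_weight_pivot.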
Code
col_classes = data.frame(t(data.frame(lapply(animal_weight_pivot,class))))
col_classes
            t.data.frame.lapply.animal_weight_pivot..class...
IPCC.Area                                           character
animal_name                                         character
weight                                                numeric
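
The check above still reports character for both columns, since the mutate_at() result was printed rather than stored. A minimal sketch of one way to make the conversion persist is shown below, using mutate() with across(), which dplyr documents as the replacement for the superseded mutate_at(). toy_long is a two-row stand-in for animal_weight_pivot built from values shown in this post.

```r
library(tibble)
library(dplyr)

# Two-row stand-in for animal_weight_pivot, using rows printed in this post.
toy_long <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Indian Subcontinent"),
  animal_name = c("Cattle - dairy", "Buffaloes"),
  weight      = c(275, 295)
)

# Assign the result back so that the factor conversion is kept.
toy_long <- toy_long %>%
  mutate(across(c(`IPCC Area`, animal_name), factor))

# Sanity check: report the class of each column after the mutation.
vapply(toy_long, function(x) class(x)[1], character(1))

stopifnot(
  is.factor(toy_long$`IPCC Area`),
  is.factor(toy_long$animal_name),
  is.numeric(toy_long$weight)
)
```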