DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 3 Abby Balint

challenge_3
animal_weights
abby_balint
Author

Abby Balint

Published

September 27, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

I read in the “animal_weight” data set and stored it as “weights” for easier coding. Below that, I ran summary() to get a high-level overview of the data, although with only 9 rows it is hardly needed here.

Code
read_csv("_data/animal_weight.csv")
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹​`IPCC Area`, ²​`Cattle - dairy`, ³​`Cattle - non-dairy`, ⁴​Buffaloes,
#   ⁵​`Swine - market`, ⁶​`Swine - breeding`, ⁷​`Chicken - Broilers`,
#   ⁸​`Chicken - Layers`
Code
weights <- read_csv("_data/animal_weight.csv")
Code
summary(weights)
  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217  

Briefly describe the data

This dataset contains 9 rows and 17 variables: one region variable (IPCC Area) and 16 animal types, with each cell giving the weight of that animal in that region of the world. Pivoting helps here because in the current wide format each animal is its own variable, so we cannot filter by animal. Once the data is pivoted, we can easily filter by animal type and by region, for example to find average weights.
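As a minimal sketch of why the long format makes filtering possible, the example below rebuilds a small piece of the table by hand (two regions and three animals, with values copied from the printed output) instead of reading the CSV. The names `weights_small`, `long_small`, and `buffalo` are made up for illustration, and the value column is called `weight` here only to keep it distinct from the data frame name.

```r
library(tidyverse)

# Toy subset of the wide table; values copied from the printed tibble
weights_small <- tibble(
  `IPCC Area`          = c("Indian Subcontinent", "Africa"),
  `Cattle - dairy`     = c(275, 275),
  `Cattle - non-dairy` = c(110, 173),
  Buffaloes            = c(295, 380)
)

# Pivot every column except the region column into name/value pairs
long_small <- weights_small %>%
  pivot_longer(cols = !`IPCC Area`,
               names_to = "animal",
               values_to = "weight")

# In long format, one animal can be selected with an ordinary filter
buffalo <- long_small %>%
  filter(animal == "Buffaloes")

buffalo
```

In the wide table the same question would mean selecting a column by name; after pivoting, animal is data and can be filtered, grouped, or joined like any other variable.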

Challenge: Describe the final dimensions

To find the final dimensions below, I used the same formula as the example, applied to the animal weights data. My original data set has 9 rows and 17 variables. Only one of the original variables (IPCC Area) will remain a variable. The 16 variables I am pivoting will turn into two variables: animal (names) and weights (values). The row count will become 144, because each of the 9 rows is repeated once for each of the 16 variables I am transforming (9 × 16 = 144). I should end up with 3 columns: my one original variable and my 2 new variables.

Code
#existing rows/cases
nrow(weights)
[1] 9
Code
#existing columns/variables
ncol(weights)
[1] 17
Code
#expected rows/cases
nrow(weights) * (ncol(weights)-1)
[1] 144
Code
# expected columns: 1 kept (IPCC Area) + 2 new (animal, weights)
1+2
[1] 3

The calculation gives 144 rows, as expected :)
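The same arithmetic can be wrapped in a small function so it works for any wide table. This is only a sketch: `expected_long_dims` and its `n_id_cols` argument are hypothetical names, and it assumes every column other than the identifier columns is pivoted into one names column and one values column. A placeholder data frame with the same shape as `weights` (9 × 17) stands in for the real data.

```r
# Hypothetical helper: expected dimensions after pivoting all non-id columns
expected_long_dims <- function(df, n_id_cols = 1) {
  n_pivoted <- ncol(df) - n_id_cols
  c(rows = nrow(df) * n_pivoted,  # each row repeats once per pivoted column
    cols = n_id_cols + 2)         # id columns + names column + values column
}

# Placeholder with the same shape as the weights data (9 rows, 17 columns)
shape_only <- as.data.frame(matrix(0, nrow = 9, ncol = 17))

dims <- expected_long_dims(shape_only)
dims
```

With 9 rows and 16 pivoted columns this gives 144 rows and 3 columns, matching the numbers worked out by hand above.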

Challenge: Pivot the Chosen Data

Code
pivot_longer(weights,
             cols = `Cattle - dairy`:Llamas,
             names_to = "animal",
             values_to = "weights")
# A tibble: 144 × 3
   `IPCC Area`         animal             weights
   <chr>               <chr>                <dbl>
 1 Indian Subcontinent Cattle - dairy       275  
 2 Indian Subcontinent Cattle - non-dairy   110  
 3 Indian Subcontinent Buffaloes            295  
 4 Indian Subcontinent Swine - market        28  
 5 Indian Subcontinent Swine - breeding      28  
 6 Indian Subcontinent Chicken - Broilers     0.9
 7 Indian Subcontinent Chicken - Layers       1.8
 8 Indian Subcontinent Ducks                  2.7
 9 Indian Subcontinent Turkeys                6.8
10 Indian Subcontinent Sheep                 28  
# … with 134 more rows

Final tibble has three columns and 144 rows as predicted.
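A check like this can also be done in code rather than by eye. The sketch below uses a hand-built subset of the table (two regions, three animals, values copied from the printed output), so the expected shape is 2 × 3 = 6 rows and 3 columns instead of 144 × 3. The object names are made up for illustration. It then computes the average weight per animal, which is the kind of summary the long format is meant to enable.

```r
library(tidyverse)

# Toy subset of the wide table; values copied from the printed tibble
weights_small <- tibble(
  `IPCC Area`          = c("Indian Subcontinent", "Africa"),
  `Cattle - dairy`     = c(275, 275),
  `Cattle - non-dairy` = c(110, 173),
  Buffaloes            = c(295, 380)
)

long_small <- pivot_longer(weights_small,
                           cols = !`IPCC Area`,
                           names_to = "animal",
                           values_to = "weight")

# Predicted shape: rows * pivoted columns, and 1 id column + 2 new columns
expected <- c(nrow(weights_small) * (ncol(weights_small) - 1), 1 + 2)

# Stops with an error if the pivot did not produce the predicted shape
stopifnot(all(dim(long_small) == expected))

# Average weight per animal across the regions in the toy subset
avg_by_animal <- long_small %>%
  group_by(animal) %>%
  summarise(mean_weight = mean(weight))

avg_by_animal
```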
