DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 3 Instructions

Categories: challenge_1, railroads, faostat, wildbirds
Author: Tejaswini_Ketineni
Published: August 21, 2022

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Read in data

For this challenge we use the animal weight dataset, read from `_data/animal_weight.csv`.

library(readr)
animal_weight <- read_csv("_data/animal_weight.csv")
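Several column names in this file contain spaces and hyphens (for example `IPCC Area` and `Cattle - dairy`). The sketch below assumes readr 2.0 or later, which provides `I()` for literal input and the `show_col_types` argument. It reads a tiny inline CSV with the same header style and shows that `read_csv()` keeps such names unchanged, so later code has to refer to them in backticks or quotes. The inline rows are a hypothetical toy stand-in copied from the `head()` output of the real data, not the full file.

```r
library(readr)

# Hypothetical toy CSV (values copied from the head() output of the real file);
# I() marks the string as literal data rather than a file path
toy <- read_csv(I("IPCC Area,Cattle - dairy,Buffaloes
Africa,275,380
Oceania,500,380"), show_col_types = FALSE)

# read_csv() keeps non-syntactic names exactly as written in the header
names(toy)

# ...so they must be wrapped in backticks when used as R symbols
toy$`Cattle - dairy`
```

This is also why the pivot code below quotes the animal column names and why tibble output prints `IPCC Area` in backticks.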

Briefly describe the data

colnames(animal_weight)
 [1] "IPCC Area"          "Cattle - dairy"     "Cattle - non-dairy"
 [4] "Buffaloes"          "Swine - market"     "Swine - breeding"  
 [7] "Chicken - Broilers" "Chicken - Layers"   "Ducks"             
[10] "Turkeys"            "Sheep"              "Goats"             
[13] "Horses"             "Asses"              "Mules"             
[16] "Camels"             "Llamas"            
nrow(animal_weight)
[1] 9

The dataset has 9 rows, one for each IPCC area.

ncol(animal_weight)
[1] 17

It has 17 columns: `IPCC Area` plus 16 animal categories.

dim(animal_weight)
[1]  9 17

`dim()` confirms that the table is 9 rows by 17 columns.

summary(animal_weight)
  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217  

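None of the numeric columns shows an NA's count in the summary, so no weights are missing. However, `summary()` does not report missing values for the character column `IPCC Area`, so that column has to be checked separately.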
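A direct count of missing values covers every column, including `IPCC Area`. A minimal sketch in base R on a hypothetical toy data frame `toy` (three areas and three animal columns, values copied from the `head()` output); on the real data the same calls would be `colSums(is.na(animal_weight))` and `anyNA(animal_weight)`.

```r
# Hypothetical toy stand-in for animal_weight (values copied from head() output);
# check.names = FALSE keeps the original column names with spaces and hyphens
toy <- data.frame(
  `IPCC Area` = c("Indian Subcontinent", "Africa", "Oceania"),
  `Cattle - dairy` = c(275, 275, 500),
  Buffaloes = c(295, 380, 380),
  check.names = FALSE
)

# Missing values per column, character columns included
colSums(is.na(toy))

# A single TRUE/FALSE for the whole table
anyNA(toy)
```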

head(animal_weight)
# A tibble: 6 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹​`IPCC Area`, ²​`Cattle - dairy`, ³​`Cattle - non-dairy`, ⁴​Buffaloes,
#   ⁵​`Swine - market`, ⁶​`Swine - breeding`, ⁷​`Chicken - Broilers`,
#   ⁸​`Chicken - Layers`

The `head()` output shows that the data is in wide format: each animal type is a separate column, and each row holds the weights of all 16 animal types for one IPCC area. Pivoting the animal columns into a single `Animal Type` column, with the corresponding values in a `Weight` column, puts one observation (one area, one animal type, one weight) on each row. The result has 3 columns: `IPCC Area`, `Animal Type`, and `Weight`.
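To make the reshaping concrete, here is a sketch on a hypothetical toy tibble with two areas and two animal columns (values copied from the `head()` output). Each of the 2 × 2 weights becomes its own row, and the result has the three planned columns.

```r
library(tibble)
library(tidyr)

# Hypothetical toy wide table: 2 areas x 2 animal columns
wide <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Africa"),
  `Cattle - dairy` = c(275, 275),
  Buffaloes = c(295, 380)
)

# Every column except the identifier becomes a (name, value) pair
long <- pivot_longer(wide,
                     cols = -`IPCC Area`,
                     names_to = "Animal Type",
                     values_to = "Weight")
long
```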

Anticipate the End Result

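Since we already know the number of rows and columns, we can predict the size of the pivoted data. `IPCC Area` stays as an identifier column, and each of the other 16 columns contributes one row per area, so the expected number of rows is the number of rows times the number of columns minus one: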

nrow(animal_weight)*(ncol(animal_weight)-1)
[1] 144

The pivoted data should therefore have 144 rows and 3 columns (`IPCC Area`, `Animal Type`, and `Weight`).
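The expected shape can be written as a small formula. Here $n$ is the number of rows in the wide table, $k$ its number of columns, and $m$ the number of identifier columns kept as-is (for this data $n = 9$, $k = 17$, $m = 1$). The formula assumes the pivot creates exactly one names column and one values column and drops no rows, which holds for `pivot_longer()` with its default `values_drop_na = FALSE`.

$$
\begin{aligned}
\text{rows}_{\text{long}} &= n\,(k - m) = 9 \times (17 - 1) = 144 \\
\text{cols}_{\text{long}} &= m + 2 = 1 + 2 = 3
\end{aligned}
$$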

Pivot the Data

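Now we pivot the data, keeping `IPCC Area` as the identifier and gathering the 16 animal columns into `Animal Type` and `Weight`: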

# Every column except `IPCC Area` (the 16 animal types) becomes a row:
# its name goes to `Animal Type`, its value to `Weight`
df <- pivot_longer(animal_weight, cols = -`IPCC Area`,
                   names_to = "Animal Type", values_to = "Weight")
df
# A tibble: 144 × 3
   `IPCC Area`         `Animal Type`      Weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows
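One extra sanity check, not part of the original analysis, is to pivot back with `pivot_wider()` and confirm that the original wide table is recovered. A sketch on a hypothetical toy tibble (values copied from the `head()` output); on the real data the same idea applies to `df` and `animal_weight`, although `read_csv()` attaches column-specification attributes to `animal_weight` that a strict comparison may need to ignore.

```r
library(tibble)
library(tidyr)

# Hypothetical toy wide table (values copied from the head() output)
wide <- tibble(
  `IPCC Area` = c("Indian Subcontinent", "Africa"),
  `Cattle - dairy` = c(275, 275),
  Buffaloes = c(295, 380)
)

long <- pivot_longer(wide, cols = -`IPCC Area`,
                     names_to = "Animal Type", values_to = "Weight")

# Spread the long table back out: one column per animal type again
back <- pivot_wider(long, names_from = "Animal Type", values_from = "Weight")

# TRUE if the round trip lost or changed nothing
isTRUE(all.equal(as.data.frame(back), as.data.frame(wide)))
```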

Cross-checking that the pivoted data meets expectations

We compute the number of rows and columns of the pivoted data and compare them with the expected 144 rows and 3 columns.

nrow(df)
[1] 144
ncol(df)
[1] 3
dim(df)
[1] 144   3
summary(df)
  IPCC Area         Animal Type            Weight     
 Length:144         Length:144         Min.   :  0.9  
 Class :character   Class :character   1st Qu.: 22.7  
 Mode  :character   Mode  :character   Median :130.0  
                                       Mean   :146.6  
                                       3rd Qu.:217.0  
                                       Max.   :604.0  

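The pivoted data has 144 rows and 3 columns, exactly as anticipated. The summary also reports no NA's for Weight, and its range (0.9 to 604) matches the smallest and largest weights in the original table, which suggests the pivot neither lost nor altered any values.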
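The checks above are done by eye. A minimal sketch of making them automatic uses a hypothetical helper `check_pivot()` (not from any package) that stops with an error if the long table does not have the predicted number of rows and columns or contains any missing value. It is shown on a toy tibble; on the real data the call would be `check_pivot(animal_weight, df)`.

```r
library(tibble)
library(tidyr)

# Hypothetical helper: compare a long table with the shape predicted from the
# wide table it came from (n_id_cols identifier columns + one name + one value)
check_pivot <- function(wide, long, n_id_cols = 1) {
  expected_rows <- nrow(wide) * (ncol(wide) - n_id_cols)
  stopifnot(
    nrow(long) == expected_rows,   # one row per (identifier, column) pair
    ncol(long) == n_id_cols + 2,   # identifier(s) + names column + values column
    !anyNA(long)                   # no missing cells in any column
  )
  invisible(TRUE)
}

# Hypothetical toy data (values copied from the head() output)
wide <- tibble(`IPCC Area` = c("Africa", "Oceania"),
               `Cattle - dairy` = c(275, 500),
               Buffaloes = c(380, 380))
long <- pivot_longer(wide, cols = -`IPCC Area`,
                     names_to = "Animal Type", values_to = "Weight")

check_pivot(wide, long)
```

Using `stopifnot()` means a knitted report fails loudly instead of silently printing a wrong row count.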
