library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
Pradhakshya Dhanakumar
March 8, 2023
Read the data from a .csv file
Rows: 9 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): IPCC Area
dbl (16): Cattle - dairy, Cattle - non-dairy, Buffaloes, Swine - market, Swi...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 9 × 17
IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian … 275 110 295 28 28 0.9 1.8 2.7 6.8
2 Eastern… 550 391 380 50 180 0.9 1.8 2.7 6.8
3 Africa 275 173 380 28 28 0.9 1.8 2.7 6.8
4 Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8
5 Western… 600 420 380 50 198 0.9 1.8 2.7 6.8
6 Latin A… 400 305 380 28 28 0.9 1.8 2.7 6.8
7 Asia 350 391 380 50 180 0.9 1.8 2.7 6.8
8 Middle … 275 173 380 28 28 0.9 1.8 2.7 6.8
9 Norther… 604 389 380 46 198 0.9 1.8 2.7 6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
# Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
# ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
# ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
# ⁸`Chicken - Layers`
Find the total number of rows and columns in the dataset
Data Summary
IPCC Area Cattle - dairy Cattle - non-dairy Buffaloes
Length:9 Min. :275.0 Min. :110 Min. :295.0
Class :character 1st Qu.:275.0 1st Qu.:173 1st Qu.:380.0
Mode :character Median :400.0 Median :330 Median :380.0
Mean :425.4 Mean :298 Mean :370.6
3rd Qu.:550.0 3rd Qu.:391 3rd Qu.:380.0
Max. :604.0 Max. :420 Max. :380.0
Swine - market Swine - breeding Chicken - Broilers Chicken - Layers
Min. :28.00 Min. : 28.0 Min. :0.9 Min. :1.8
1st Qu.:28.00 1st Qu.: 28.0 1st Qu.:0.9 1st Qu.:1.8
Median :45.00 Median :180.0 Median :0.9 Median :1.8
Mean :39.22 Mean :116.4 Mean :0.9 Mean :1.8
3rd Qu.:50.00 3rd Qu.:180.0 3rd Qu.:0.9 3rd Qu.:1.8
Max. :50.00 Max. :198.0 Max. :0.9 Max. :1.8
Ducks Turkeys Sheep Goats Horses
Min. :2.7 Min. :6.8 Min. :28.00 Min. :30.00 Min. :238.0
1st Qu.:2.7 1st Qu.:6.8 1st Qu.:28.00 1st Qu.:30.00 1st Qu.:238.0
Median :2.7 Median :6.8 Median :48.50 Median :38.50 Median :377.0
Mean :2.7 Mean :6.8 Mean :39.39 Mean :34.72 Mean :315.2
3rd Qu.:2.7 3rd Qu.:6.8 3rd Qu.:48.50 3rd Qu.:38.50 3rd Qu.:377.0
Max. :2.7 Max. :6.8 Max. :48.50 Max. :38.50 Max. :377.0
Asses Mules Camels Llamas
Min. :130 Min. :130 Min. :217 Min. :217
1st Qu.:130 1st Qu.:130 1st Qu.:217 1st Qu.:217
Median :130 Median :130 Median :217 Median :217
Mean :130 Mean :130 Mean :217 Mean :217
3rd Qu.:130 3rd Qu.:130 3rd Qu.:217 3rd Qu.:217
Max. :130 Max. :130 Max. :217 Max. :217
The animal_weight data gives the weight of different livestock (buffaloes, chickens, turkeys, etc.) across IPCC regions, not a count of animals. The dataset has 9 rows and 17 columns. Because each livestock type sits in its own column, the wide layout is awkward to filter, group, or plot, so we can use the pivot_longer() function to restructure it into a long format with one row per region and livestock type.
There are 9 rows (one per region) and 17 - 1 = 16 value columns (one per livestock type; the IPCC Area column is the identifier). Hence, the restructured data should have 9 * 16 = 144 rows and 3 columns (region, livestock type, and weight).
# A tibble: 144 × 3
`IPCC Area` Livestock Weight
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# … with 134 more rows
Dimensions of restructured data
After applying pivot_longer(), the dataset has 144 rows and 3 columns, as expected.
---
title: "Challenge 3"
author: "Pradhakshya Dhanakumar"
description: "Worked with Animal Weights Dataset"
date: "03/08/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- Challenge 3
- Pradhakshya Dhanakumar
- Animal Weights
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
```
## Reading Data
Read the data from a .csv file
```{r}
data <- read_csv("_data/animal_weight.csv")
print(data)
```
## Shape of Data
Find the total number of rows and columns in the dataset
```{r}
nrow(data)
ncol(data)
```
Data Summary
```{r}
summary(data)
```
The animal_weight data gives the weight of different livestock (buffaloes, chickens, turkeys, etc.) across IPCC regions, not a count of animals. The dataset has 9 rows and 17 columns. Because each livestock type sits in its own column, the wide layout is awkward to filter, group, or plot, so we can use the pivot_longer() function to restructure it into a long format with one row per region and livestock type.
There are 9 rows (one per region) and 17 - 1 = 16 value columns (one per livestock type; the IPCC Area column is the identifier). Hence, the restructured data should have 9 * 16 = 144 rows and 3 columns (region, livestock type, and weight).
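The expected shape can also be worked out in R before pivoting. This is a minimal sketch that hard-codes the dimensions stated above; the variable names are illustrative and not part of the original analysis.

```{r}
# Dimensions of the wide data, as stated in the text above
n_rows    <- 9   # one row per IPCC region
n_cols    <- 17  # 1 identifier column + 16 livestock columns
n_id_cols <- 1   # `IPCC Area`

# Each value column contributes one long row per region
expected_rows <- n_rows * (n_cols - n_id_cols)

# The identifier column plus the new name and value columns
expected_cols <- n_id_cols + 2

c(rows = expected_rows, cols = expected_cols)
```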
## Pivot Data
```{r}
# Pivot every column except `IPCC Area` into Livestock/Weight pairs
data_longer <- pivot_longer(data, cols = -`IPCC Area`,
                            names_to = "Livestock",
                            values_to = "Weight")
print(data_longer)
```
Dimensions of restructured data
```{r}
dim(data_longer)
```
After applying pivot_longer(), the dataset has 144 rows and 3 columns, as expected.
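As an extra sanity check, the pivot can be verified on a small example by pivoting back with pivot_wider(). The sketch below is an assumption-laden illustration, not part of the original analysis: it builds a two-region, two-livestock tibble by hand (values copied from the printed data above) so that it runs on its own, and the names `wide`, `long`, and `back` are hypothetical.

```{r}
library(tidyr)
library(tibble)

# Hand-built excerpt of the wide data (two regions, two livestock columns)
wide <- tibble(
  `IPCC Area` = c("Africa", "Oceania"),
  `Cattle - dairy` = c(275, 500),
  Buffaloes = c(380, 380)
)

# Same pivot as in the analysis: everything except `IPCC Area` goes long
long <- pivot_longer(wide, cols = -`IPCC Area`,
                     names_to = "Livestock",
                     values_to = "Weight")

# Shape check: rows = regions * livestock columns, and 3 columns
stopifnot(nrow(long) == nrow(wide) * (ncol(wide) - 1),
          ncol(long) == 3)

# Round trip: pivoting back should reproduce the wide table
back <- pivot_wider(long, names_from = Livestock, values_from = Weight)
stopifnot(isTRUE(all.equal(as.data.frame(back), as.data.frame(wide))))
```

The same two checks could be run against `data` and `data_longer` in place of the hand-built tibble.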