---
title: "Challenge 3 Submission"
author: "Xinpeng Liu"
description: "Tidy Data: Pivoting"
date: "5/30/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- eggs
- australian_marriage
- usa_households
- sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
data <- read_csv("_data/animal_weight.csv")
```
### Briefly describe the data
Describe the data, and be sure to comment on why you are planning to pivot it to make it "tidy"
We choose animal_weights.csv ⭐.
```{r}
glimpse(data)
```
After pivoting, we would have three columns: 'IPCC Area', 'Animal Type', and 'Average Weight'. Each row would then represent the average weight of a particular type of animal in a specific region. For example, one row might be 'Eastern Europe', 'Cattle - dairy', '550'.
Pivoting the data in this way makes it "tidy" as per Hadley Wickham's principles:

1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
## Anticipate the End Result
```{r}
# existing rows/cases
nrow(data)
# existing columns/variables
ncol(data)
# expected rows after pivoting: each area contributes one row per animal column
nrow(data) * (ncol(data) - 1)
# expected columns: 'Animal Type' and 'Average Weight', plus the 'IPCC Area' identifier
2 + 1
```
## Pivot the Data
Now we will pivot the data, and compare our pivoted data dimensions to the dimensions calculated above as a "sanity" check.
From the data provided, we can see that there are 9 IPCC areas (rows) and 17 columns. One of these columns, 'IPCC Area', identifies each case, so the remaining 16 columns are actual variables (each representing a different animal type).
In this case, after pivoting, we should have $n \times (k-1)$ rows in the pivoted dataframe, where $n$ is the number of rows (9 IPCC areas) and $k$ is the number of columns (17). So the pivoted dataframe should have $9 \times (17-1) = 144$ rows.
In the pivoted dataframe, we would have three columns: 'IPCC Area', 'Animal Type', and 'Average Weight'. The number of weight values is preserved: the original dataframe holds $9 \times 16 = 144$ measurements, and the pivoted dataframe holds exactly one measurement per row across its 144 rows (the 'IPCC Area' label is simply repeated for each animal type).
### Challenge: Pivot the Chosen Data
Document your work here. What will a new "case" be once you have pivoted the data? How does it meet requirements for tidy data?
```{r}
# Pivoting the data to a tidy format
animal_weight_pivoted <- pivot_longer(data,
                                      cols = `Cattle - dairy`:Llamas,
                                      names_to = "Animal Type",
                                      values_to = "Average Weight")
print(animal_weight_pivoted, n = nrow(animal_weight_pivoted))
```
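As a quick sanity check, the chunk below confirms that the pivoted dimensions match the $144 \times 3$ expectation calculated above:

```{r}
# Verify the pivoted dataframe has the anticipated 144 rows and 3 columns
dim(animal_weight_pivoted)
```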
The data is now in a tidy format. This fulfills the three principles of tidy data:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
Thus, in this context, a new "case" corresponds to an observation of the average weight for a specific type of animal in a specific area. For instance, the first row tells us that the average weight of dairy cattle in the Indian Subcontinent is 275 units (presumably kilograms or pounds, although the data doesn't specify).
This tidy format makes it easier to perform subsequent data analyses. For instance, we can now easily calculate the overall average weight for each animal type, compare average weights across different areas, or examine the distribution of weights for each animal type, among other analyses.
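As a sketch of one such follow-up analysis, the chunk below uses the pivoted dataframe to compute the mean weight for each animal type across all nine IPCC areas:

```{r}
# Mean weight per animal type across all IPCC areas, heaviest first
animal_weight_pivoted %>%
  group_by(`Animal Type`) %>%
  summarise(mean_weight = mean(`Average Weight`)) %>%
  arrange(desc(mean_weight))
```

Analyses like this are a one-liner on the long format, but would require reshaping or row-wise gymnastics on the original wide table.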