library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Courtney Naughton
September 25, 2022
Challenge #3
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 9 × 17
IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian … 275 110 295 28 28 0.9 1.8 2.7 6.8
2 Eastern… 550 391 380 50 180 0.9 1.8 2.7 6.8
3 Africa 275 173 380 28 28 0.9 1.8 2.7 6.8
4 Oceania 500 330 380 45 180 0.9 1.8 2.7 6.8
5 Western… 600 420 380 50 198 0.9 1.8 2.7 6.8
6 Latin A… 400 305 380 28 28 0.9 1.8 2.7 6.8
7 Asia 350 391 380 50 180 0.9 1.8 2.7 6.8
8 Middle … 275 173 380 28 28 0.9 1.8 2.7 6.8
9 Norther… 604 389 380 46 198 0.9 1.8 2.7 6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
# Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
# ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
# ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
# ⁸`Chicken - Layers`
This dataset has 17 columns and only 9 rows. The columns are IPCC Area, Cattle - dairy, Cattle - non-dairy, Buffaloes, Swine - market, Swine - breeding, Chicken - Broilers, Chicken - Layers, Ducks, Turkeys, Sheep, Goats, Horses, Asses, Mules, Camels, and Llamas. Each row is one IPCC area, and each of the other 16 columns holds the weight of one animal type in that area. It would make more sense to have only 3 columns: Area, Animal Type, and Weight.
Our original data set is 9 rows by 17 variables. One variable (IPCC Area) identifies the row and the other 16 will be pivoted, so we expect 9*(17-1) = 144 rows. The new data will have only 3 columns: the area, the animal type, and the weight.
We expect a new dataframe to have \(9*16 = 144\) rows x \(3\) columns.
[1] 9
[1] 17
[1] 144
[1] 144
With the pivoted data, each case is an observation of the type of animal, the area it comes from, and its weight.
animal_weights <- pivot_longer(animal_weights,
  cols = c("Cattle - dairy", "Cattle - non-dairy", "Buffaloes", "Swine - market",
           "Swine - breeding", "Chicken - Broilers", "Chicken - Layers", "Ducks",
           "Turkeys", "Sheep", "Goats", "Horses", "Asses", "Mules", "Camels", "Llamas"),
  names_to = "Animal_type",
  values_to = "Weight")
animal_weights
# A tibble: 144 × 3
`IPCC Area` Animal_type Weight
<chr> <chr> <dbl>
1 Indian Subcontinent Cattle - dairy 275
2 Indian Subcontinent Cattle - non-dairy 110
3 Indian Subcontinent Buffaloes 295
4 Indian Subcontinent Swine - market 28
5 Indian Subcontinent Swine - breeding 28
6 Indian Subcontinent Chicken - Broilers 0.9
7 Indian Subcontinent Chicken - Layers 1.8
8 Indian Subcontinent Ducks 2.7
9 Indian Subcontinent Turkeys 6.8
10 Indian Subcontinent Sheep 28
# … with 134 more rows
Read in one (or more) of the following datasets, using the correct R package and command.
This dataset has 6 columns and 120 rows. The columns are month, year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen. There is one row for every month from 2004 to 2013. I believe the data tracks the average monthly price of each type of egg carton, apparently recorded in cents. For example, in May 2004, a large half dozen carton of eggs cost $1.31. Rather than one column per carton type, it would make more sense for each entry to have the month, the year, the carton category, and the cost.
Our original data set is 120 rows by 6 columns. Two columns (month and year) identify the row and the other 4 will be pivoted, so we expect 120*(6-2) = 480 rows. The new data will have 4 columns: month, year, category, and cost.
We expect a new dataframe to have \(120*(6-2) = 480\) rows x \(4\) columns.
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132 230
2 February 2004 128. 226. 134. 230
3 March 2004 131 225 137 230
4 April 2004 131 225 137 234.
5 May 2004 131 225 137 236
6 June 2004 134. 231. 137 241
7 July 2004 134. 234. 137 241
8 August 2004 134. 234. 137 241
9 September 2004 130. 234. 136. 241
10 October 2004 128. 234. 136. 241
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
[1] 120
[1] 6
[1] 480
[1] 480
With the pivoted data, each case is an observation of the cost of eggs given a quantity category (Large Half Dozen, Large Dozen, Extra Large Half Dozen, Extra Large Dozen) in a specific month of a year from 2004 to 2013.
#Renaming the column names
eggs2 <- rename(eggs,
  "Large Half Dozen" = large_half_dozen,
  "Large Dozen" = large_dozen,
  "Extra Large Half Dozen" = extra_large_half_dozen,
  "Extra Large Dozen" = extra_large_dozen)
eggs2 %>%
  pivot_longer(
    cols = ends_with("Dozen"),
    names_to = "Category",
    values_to = "Cost100"
  )
# A tibble: 480 × 4
month year Category Cost100
<chr> <dbl> <chr> <dbl>
1 January 2004 Large Half Dozen 126
2 January 2004 Large Dozen 230
3 January 2004 Extra Large Half Dozen 132
4 January 2004 Extra Large Dozen 230
5 February 2004 Large Half Dozen 128.
6 February 2004 Large Dozen 226.
7 February 2004 Extra Large Half Dozen 134.
8 February 2004 Extra Large Dozen 230
9 March 2004 Large Half Dozen 131
10 March 2004 Large Dozen 225
# … with 470 more rows
---
title: "Challenge 3 Naughton"
author: "Courtney Naughton"
description: "Tidy Data: Pivoting"
date: "09/25/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - animal_weights
  - eggs
  - Courtney Naughton
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Challenge #3
::: panel-tabset
## Animal Weights
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
```{r}
animal_weights<-read_csv("_data/animal_weight.csv")
animal_weights
```
### Briefly describe the data
This dataset has 17 columns and only 9 rows. The columns are IPCC Area, Cattle - dairy, Cattle - non-dairy, Buffaloes, Swine - market, Swine - breeding, Chicken - Broilers, Chicken - Layers, Ducks, Turkeys, Sheep, Goats, Horses, Asses, Mules, Camels, and Llamas. Each row is one IPCC area, and each of the other 16 columns holds the weight of one animal type in that area. It would make more sense to have only 3 columns: Area, Animal Type, and Weight.
### Anticipate the End Result
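Our original data set is 9 rows by 17 variables. One variable (IPCC Area) identifies the row and the other 16 will be pivoted, so we expect 9*(17-1) = 144 rows. The new data will have only 3 columns: the area, the animal type, and the weight.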
### Challenge: Describe the final dimensions
We expect a new dataframe to have $9*16 = 144$ rows x $3$ columns.
```{r}
#| tbl-cap: Animal
animal_weights<-tibble(animal_weights)
animal_weights
#existing rows/cases
nrow(animal_weights)
#existing columns
ncol(animal_weights)
#expected rows/cases after pivoting (every column except IPCC Area)
nrow(animal_weights) * (ncol(animal_weights)-1)
#expected rows/cases worked out by hand (9 * 16), for comparison
144
```
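The same arithmetic applies to any `pivot_longer()` call. As a rough sketch, the helper below takes the number of rows, the number of columns, and the number of identifier columns that stay in place, and returns the expected dimensions. `expected_long_dims` is a hypothetical name (not part of tidyr), and the sketch assumes the column names go into one new column and the values into another.

```{r}
# Hypothetical helper: expected dimensions after pivot_longer(),
# assuming names_to and values_to each create exactly one new column.
expected_long_dims <- function(n_rows, n_cols, n_id) {
  c(
    rows = n_rows * (n_cols - n_id),  # one row per pivoted cell
    cols = n_id + 2                   # id columns + names column + values column
  )
}

# Animal weights: 9 rows, 17 columns, 1 identifier column (IPCC Area)
expected_long_dims(9, 17, 1)
```

For the animal weights table this gives 144 rows and 3 columns. The eggs table in the other tab, with 120 rows, 6 columns, and 2 identifier columns, gives 480 rows and 4 columns.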
### Challenge: Pivot the Chosen Data
With the pivoted data, each case is an observation of the type of animal, the area it comes from, and its weight.
```{r}
#| tbl-cap: Pivoted Example
animal_weights <- pivot_longer(animal_weights,
  cols = c("Cattle - dairy", "Cattle - non-dairy", "Buffaloes", "Swine - market",
           "Swine - breeding", "Chicken - Broilers", "Chicken - Layers", "Ducks",
           "Turkeys", "Sheep", "Goats", "Horses", "Asses", "Mules", "Camels", "Llamas"),
  names_to = "Animal_type",
  values_to = "Weight")
animal_weights
```
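Listing all 16 animal columns by name works, but the same selection can also be written as "every column except `IPCC Area`". The sketch below illustrates this on a small subset of the table (three areas and three animal columns, with values copied from the printed data) and checks the resulting dimensions. `weights_small` and `weights_long` are illustrative names only.

```{r}
library(tidyr)
library(tibble)

# Small subset of the animal weights table (values taken from the printed data)
weights_small <- tribble(
  ~`IPCC Area`,          ~`Cattle - dairy`, ~`Cattle - non-dairy`, ~Buffaloes,
  "Indian Subcontinent", 275,               110,                   295,
  "Africa",              275,               173,                   380,
  "Oceania",             500,               330,                   380
)

# Pivot every column except the identifier column
weights_long <- pivot_longer(
  weights_small,
  cols = -`IPCC Area`,
  names_to = "Animal_type",
  values_to = "Weight"
)

# 3 rows * (4 - 1) pivoted columns = 9 rows, with 3 columns
dim(weights_long)
```

The negative selection keeps working if more animal columns are added later, whereas the explicit list would need to be updated by hand.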
## Eggs
Read in one (or more) of the following datasets, using the correct R package and command.
- eggs_tidy.csv ⭐⭐
```{r}
eggs<-read_csv("_data/eggs_tidy.csv")
```
### Briefly describe the data
This dataset has 6 columns and 120 rows. The columns are month, year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen. There is one row for every month from 2004 to 2013. I believe the data tracks the average monthly price of each type of egg carton, apparently recorded in cents. For example, in May 2004, a large half dozen carton of eggs cost $1.31. Rather than one column per carton type, it would make more sense for each entry to have the month, the year, the carton category, and the cost.
### Anticipate the End Result
Our original data set is 120 rows by 6 columns. Two columns (month and year) identify the row and the other 4 will be pivoted, so we expect 120*(6-2) = 480 rows. The new data will have 4 columns: month, year, category, and cost.
### Challenge: Describe the final dimensions
We expect a new dataframe to have $120*(6-2) = 480$ rows x $4$ columns.
```{r}
#| tbl-cap: Eggs
eggs<-tibble(eggs)
eggs
#existing rows/cases
nrow(eggs)
#existing columns
ncol(eggs)
#expected rows/cases after pivoting (every column except month and year)
nrow(eggs) * (ncol(eggs)-2)
#expected rows/cases worked out by hand (120 * 4), for comparison
480
```
### Challenge: Pivot the Chosen Data
With the pivoted data, each case is an observation of the cost of eggs given a quantity category (Large Half Dozen, Large Dozen, Extra Large Half Dozen, Extra Large Dozen) in a specific month of a year from 2004 to 2013.
```{r}
#| tbl-cap: Pivoted Example2
#Renaming the column names
eggs2 <- rename(eggs,
  "Large Half Dozen" = large_half_dozen,
  "Large Dozen" = large_dozen,
  "Extra Large Half Dozen" = extra_large_half_dozen,
  "Extra Large Dozen" = extra_large_dozen
)
#Saving the pivoted result so that later steps can use the Cost100 column
eggs_long <- eggs2 %>%
  pivot_longer(
    cols = ends_with("Dozen"),
    names_to = "Category",
    values_to = "Cost100"
  )
eggs_long
#Next step (not run): divide Cost100 by 100 to get the average cost of an egg carton in dollars.
#eggs_long <- mutate(eggs_long,
#  Cost = Cost100 / 100)
#eggs_long
```
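To turn `Cost100` into dollars, the pivoted table has to be saved (or piped onward) so that `mutate()` can see the new column. A minimal sketch of one way to do this follows, using three months of the data with values copied from the printed table. It assumes the prices are recorded in cents, as the May 2004 example suggests; `eggs_small`, `eggs_long`, and `Cost` are illustrative names.

```{r}
library(dplyr)
library(tidyr)
library(tibble)

# Three months of the eggs data (values taken from the printed table)
eggs_small <- tribble(
  ~month,    ~year, ~large_half_dozen, ~large_dozen, ~extra_large_half_dozen, ~extra_large_dozen,
  "January", 2004,  126,               230,          132,                     230,
  "March",   2004,  131,               225,          137,                     230,
  "May",     2004,  131,               225,          137,                     236
)

eggs_long <- eggs_small %>%
  pivot_longer(
    cols = ends_with("dozen"),
    names_to = "Category",
    values_to = "Cost100"
  ) %>%
  # Assumption: Cost100 is in cents, so dividing by 100 gives dollars
  mutate(Cost = Cost100 / 100)

eggs_long
```

In this subset, the May 2004 large half dozen row has a `Cost` of 1.31, which matches the $1.31 example in the description.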
:::