Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Connor Landreth
March 9, 2023
Today’s challenge is to:
pivot_longer
month year large_half_dozen large_dozen extra_large_half_dozen
1 January 2004 126.000 230.000 132.000
2 February 2004 128.500 226.250 134.500
3 March 2004 131.000 225.000 137.000
4 April 2004 131.000 225.000 137.000
5 May 2004 131.000 225.000 137.000
6 June 2004 133.500 231.375 137.000
7 July 2004 133.500 233.500 137.000
8 August 2004 133.500 233.500 137.000
9 September 2004 129.750 233.500 135.875
10 October 2004 128.500 233.500 135.500
11 November 2004 128.500 233.500 135.500
12 December 2004 128.500 233.500 135.500
13 January 2005 128.500 233.500 135.500
14 February 2005 128.500 233.500 135.500
15 March 2005 128.500 233.500 135.500
16 April 2005 128.500 233.500 135.500
17 May 2005 128.500 233.500 135.500
18 June 2005 128.500 233.500 135.500
19 July 2005 128.500 233.500 135.500
20 August 2005 128.500 233.500 135.500
21 September 2005 128.500 233.500 135.500
22 October 2005 128.500 233.500 135.500
23 November 2005 128.500 233.500 135.500
24 December 2005 128.500 233.500 135.500
25 January 2006 128.500 233.500 135.500
26 February 2006 128.500 233.500 135.500
27 March 2006 128.500 233.500 135.500
28 April 2006 128.500 233.500 135.500
29 May 2006 128.500 233.500 135.500
30 June 2006 128.500 233.500 135.500
31 July 2006 128.500 233.500 135.500
32 August 2006 128.500 233.500 135.500
33 September 2006 128.500 233.500 135.500
34 October 2006 128.500 233.500 135.500
35 November 2006 128.500 233.500 135.500
36 December 2006 128.500 233.500 135.500
37 January 2007 128.500 233.500 135.500
38 February 2007 131.125 236.125 138.125
39 March 2007 132.000 237.000 139.000
40 April 2007 132.000 237.000 139.000
41 May 2007 132.000 237.000 139.000
42 June 2007 132.000 237.000 139.000
43 July 2007 132.000 237.000 139.000
44 August 2007 132.000 237.000 139.000
45 September 2007 132.000 237.000 139.000
46 October 2007 132.000 237.000 139.000
47 November 2007 132.000 237.000 139.000
48 December 2007 132.000 237.000 139.000
49 January 2008 132.000 237.000 139.000
50 February 2008 132.000 237.000 139.000
51 March 2008 132.000 237.000 139.000
52 April 2008 132.000 237.000 139.000
53 May 2008 132.000 237.000 139.000
54 June 2008 174.500 277.500 185.500
55 July 2008 174.500 277.500 185.500
56 August 2008 174.500 277.500 185.500
57 September 2008 174.500 277.500 185.500
58 October 2008 174.500 277.500 185.500
59 November 2008 174.500 277.500 185.500
60 December 2008 174.500 277.500 185.500
61 January 2009 174.500 277.500 185.500
62 February 2009 174.500 277.500 185.500
63 March 2009 174.500 277.500 185.500
64 April 2009 174.500 277.500 185.500
65 May 2009 174.500 277.500 185.500
66 June 2009 174.500 277.500 185.500
67 July 2009 174.500 277.500 185.500
68 August 2009 174.500 271.500 185.500
69 September 2009 174.500 271.500 185.500
70 October 2009 174.500 271.500 185.500
71 November 2009 174.500 271.500 185.500
72 December 2009 174.500 271.500 185.500
73 January 2010 174.500 271.500 185.500
74 February 2010 174.500 271.500 185.500
75 March 2010 174.500 268.000 185.500
76 April 2010 174.500 268.000 185.500
77 May 2010 174.500 268.000 185.500
78 June 2010 174.500 268.000 185.500
79 July 2010 174.500 268.000 185.500
80 August 2010 174.500 268.000 185.500
81 September 2010 174.500 268.000 185.500
82 October 2010 174.500 267.500 185.500
83 November 2010 174.500 267.500 185.500
84 December 2010 174.500 267.500 185.500
85 January 2011 174.500 267.500 185.500
86 February 2011 174.500 267.500 185.500
87 March 2011 174.500 267.500 185.500
88 April 2011 174.500 267.500 185.500
89 May 2011 174.500 267.500 185.500
90 June 2011 174.500 270.000 185.500
91 July 2011 174.500 270.000 185.500
92 August 2011 174.500 270.000 185.500
93 September 2011 174.500 270.000 185.500
94 October 2011 174.500 270.000 185.500
95 November 2011 174.500 270.000 185.500
96 December 2011 174.500 270.000 185.500
97 January 2012 174.500 267.500 185.500
98 February 2012 174.500 267.500 185.500
99 March 2012 174.500 267.500 185.500
100 April 2012 174.500 267.500 185.500
101 May 2012 173.250 267.500 185.500
102 June 2012 173.250 267.500 185.500
103 July 2012 173.250 267.500 185.500
104 August 2012 173.250 267.500 185.500
105 September 2012 173.250 267.500 185.500
106 October 2012 173.250 267.500 185.500
107 November 2012 178.000 267.500 188.130
108 December 2012 178.000 267.500 188.130
109 January 2013 178.000 267.500 188.130
110 February 2013 178.000 267.500 188.130
111 March 2013 178.000 267.500 188.130
112 April 2013 178.000 267.500 188.130
113 May 2013 178.000 267.500 188.130
114 June 2013 178.000 267.500 188.130
115 July 2013 178.000 267.500 188.130
116 August 2013 178.000 267.500 188.130
117 September 2013 178.000 267.500 188.130
118 October 2013 178.000 267.500 188.130
119 November 2013 178.000 267.500 188.130
120 December 2013 178.000 267.500 188.130
extra_large_dozen
1 230.000
2 230.000
3 230.000
4 234.500
5 236.000
6 241.000
7 241.000
8 241.000
9 241.000
10 241.000
11 241.000
12 241.000
13 241.000
14 241.000
15 241.000
16 241.000
17 241.000
18 241.000
19 241.000
20 241.000
21 241.000
22 241.000
23 241.000
24 241.000
25 241.000
26 241.000
27 241.375
28 241.500
29 241.500
30 241.500
31 241.500
32 241.500
33 241.500
34 241.500
35 241.500
36 241.500
37 241.500
38 244.125
39 245.000
40 245.000
41 245.000
42 245.000
43 245.000
44 245.000
45 245.000
46 245.000
47 245.000
48 245.000
49 245.000
50 245.000
51 245.000
52 245.000
53 245.000
54 285.500
55 285.500
56 285.500
57 285.500
58 285.500
59 285.500
60 285.500
61 285.500
62 285.500
63 285.500
64 285.500
65 285.500
66 285.500
67 285.500
68 285.500
69 285.500
70 285.500
71 285.500
72 285.500
73 285.500
74 285.500
75 285.500
76 285.500
77 285.500
78 285.500
79 285.500
80 285.500
81 285.500
82 285.500
83 285.500
84 285.500
85 285.500
86 285.500
87 285.500
88 285.500
89 285.500
90 285.500
91 285.500
92 285.500
93 285.500
94 285.500
95 285.500
96 285.500
97 285.500
98 288.500
99 288.500
100 288.500
101 288.500
102 288.500
103 288.500
104 288.500
105 288.500
106 288.500
107 290.000
108 290.000
109 290.000
110 290.000
111 290.000
112 290.000
113 290.000
114 290.000
115 290.000
116 290.000
117 290.000
118 290.000
119 290.000
120 290.000
The data is inclusive of 120 observations and 6 variables. The data includes data on the prices of egg cartons from January 2004 to December 2013, hence the 120 observations. Pivoting this will make it more tidy because we can see more clearly how egg quantities increased or decreased from year-to-year, as opposed to every month back-to-back.
Right now we are working with six columns and 120 rows when evaluating this data set. When pivoted for cleanliness, each carton type should appear together for each year (eg Nov 2006 large dozen and Nov 2006 large half dozen will be together)
Current: 6 columns x 120 rows. We will pivot this data longer to create a data set with only 4 columns, year and month will remain, and two new will be created (type of carton and relative price(P/lb))
Lets see if this works with a simple example.
# A tibble: 480 × 4
month year Type_of_Carton price_per_pound
<chr> <int> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 January 2004 large_dozen 230
3 January 2004 extra_large_half_dozen 132
4 January 2004 extra_large_dozen 230
5 February 2004 large_half_dozen 128.
6 February 2004 large_dozen 226.
7 February 2004 extra_large_half_dozen 134.
8 February 2004 extra_large_dozen 230
9 March 2004 large_half_dozen 131
10 March 2004 large_dozen 225
# … with 470 more rows
The new dimesions are 4 columns and 480 rows
[1] 480
[1] 4
# A tibble: 20 × 6
# Groups: Type_of_Carton [4]
Type_of_Carton Min Max Mean Median Quantile
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 extra_large_dozen 230 290 267. 286. 230
2 extra_large_dozen 230 290 267. 286. 242.
3 extra_large_dozen 230 290 267. 286. 286.
4 extra_large_dozen 230 290 267. 286. 286.
5 extra_large_dozen 230 290 267. 286. 290
6 extra_large_half_dozen 132 188. 164. 186. 132
7 extra_large_half_dozen 132 188. 164. 186. 136.
8 extra_large_half_dozen 132 188. 164. 186. 186.
9 extra_large_half_dozen 132 188. 164. 186. 186.
10 extra_large_half_dozen 132 188. 164. 186. 188.
11 large_dozen 225 278. 254. 268. 225
12 large_dozen 225 278. 254. 268. 234.
13 large_dozen 225 278. 254. 268. 268.
14 large_dozen 225 278. 254. 268. 268
15 large_dozen 225 278. 254. 268. 278.
16 large_half_dozen 126 178 155. 174. 126
17 large_half_dozen 126 178 155. 174. 129.
18 large_half_dozen 126 178 155. 174. 174.
19 large_half_dozen 126 178 155. 174. 174.
20 large_half_dozen 126 178 155. 174. 178
Unsurprisingly, the mean price for a carton of extra large eggs came out to be the most expensive on average, while the mean price of a half dozen large eggs came out to be the lowest, on average.
I don’t know if this actually shows much but it shows price tends to increase in certain months. Any additional comments?
---
title: "Challenge 3 - Eggs"
author: "Connor Landreth"
description: "Tidy Data: Pivoting"
date: "03/09/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- eggs
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
```{r}
setwd("/Github Projects/601_Spring_2023/posts/_data")
eggs <- read.csv("eggs_tidy.csv")
eggs
```
### Briefly describe the data
The data is inclusive of 120 observations and 6 variables. The data includes data on the prices of egg cartons from January 2004 to December 2013, hence the 120 observations. Pivoting this will make it more tidy because we can see more clearly how egg quantities increased or decreased from year-to-year, as opposed to every month back-to-back.
## Anticipate the End Result
Right now we are working with six columns and 120 rows when evaluating this data set. When pivoted for cleanliness, each carton type should appear together for each year (eg Nov 2006 large dozen and Nov 2006 large half dozen will be together)
### Example: find current and future data dimensions
Current: 6 columns x 120 rows. We will pivot this data longer to create a data set with only 4 columns, year and month will remain, and two new will be created (type of carton and relative price(P/lb))
```{r}
nrow(eggs)
ncol(eggs)
```
Lets see if this works with a simple example.
```{r}
eggs_pivotlong <- eggs %>%
pivot_longer(col = c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen),
names_to = "Type_of_Carton",
values_to = "price_per_pound")
eggs_pivotlong
View(eggs_pivotlong)
```
### Challenge: Describe the final dimensions
The new dimesions are 4 columns and 480 rows
```{r}
nrow(eggs_pivotlong)
ncol(eggs_pivotlong)
eggs_pivotlong %>%
group_by(Type_of_Carton) %>%
summarise( Min = min(price_per_pound),
Max = max(price_per_pound),
Mean = mean(price_per_pound),
Median = median(price_per_pound),
Quantile = quantile(price_per_pound))
```
Unsurprisingly, the mean price for a carton of extra large eggs came out to be the most expensive on average, while the mean price of a half dozen large eggs came out to be the lowest, on average.
```{r}
eggs_pivotlong %>%
select(Type_of_Carton, price_per_pound, month, year) %>%
ggplot(aes(price_per_pound, month, color=Type_of_Carton))+
geom_point(size=3, alpha = 0.8)+
geom_smooth()+
theme_linedraw()+
labs(title="Price by Carton Type over Months")
```
I don't know if this actually shows much but it shows price tends to increase in certain months. Any additional comments?