Challenge 3 Submission

challenge_3
eggs
Cam Needels
Tidy Data: Pivoting Featuring Eggs
Author

Cam Needels

Published

August 17, 2022

Code
library(tidyverse)
library(summarytools)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Code
library(readr)
eggs_data <- read.csv("B:/Needels/Documents/DACCS 601/601_Spring_2023/posts/_data/eggs_tidy.csv")
eggs_data
        month year large_half_dozen large_dozen extra_large_half_dozen
1     January 2004          126.000     230.000                132.000
2    February 2004          128.500     226.250                134.500
3       March 2004          131.000     225.000                137.000
4       April 2004          131.000     225.000                137.000
5         May 2004          131.000     225.000                137.000
6        June 2004          133.500     231.375                137.000
7        July 2004          133.500     233.500                137.000
8      August 2004          133.500     233.500                137.000
9   September 2004          129.750     233.500                135.875
10    October 2004          128.500     233.500                135.500
11   November 2004          128.500     233.500                135.500
12   December 2004          128.500     233.500                135.500
13    January 2005          128.500     233.500                135.500
14   February 2005          128.500     233.500                135.500
15      March 2005          128.500     233.500                135.500
16      April 2005          128.500     233.500                135.500
17        May 2005          128.500     233.500                135.500
18       June 2005          128.500     233.500                135.500
19       July 2005          128.500     233.500                135.500
20     August 2005          128.500     233.500                135.500
21  September 2005          128.500     233.500                135.500
22    October 2005          128.500     233.500                135.500
23   November 2005          128.500     233.500                135.500
24   December 2005          128.500     233.500                135.500
25    January 2006          128.500     233.500                135.500
26   February 2006          128.500     233.500                135.500
27      March 2006          128.500     233.500                135.500
28      April 2006          128.500     233.500                135.500
29        May 2006          128.500     233.500                135.500
30       June 2006          128.500     233.500                135.500
31       July 2006          128.500     233.500                135.500
32     August 2006          128.500     233.500                135.500
33  September 2006          128.500     233.500                135.500
34    October 2006          128.500     233.500                135.500
35   November 2006          128.500     233.500                135.500
36   December 2006          128.500     233.500                135.500
37    January 2007          128.500     233.500                135.500
38   February 2007          131.125     236.125                138.125
39      March 2007          132.000     237.000                139.000
40      April 2007          132.000     237.000                139.000
41        May 2007          132.000     237.000                139.000
42       June 2007          132.000     237.000                139.000
43       July 2007          132.000     237.000                139.000
44     August 2007          132.000     237.000                139.000
45  September 2007          132.000     237.000                139.000
46    October 2007          132.000     237.000                139.000
47   November 2007          132.000     237.000                139.000
48   December 2007          132.000     237.000                139.000
49    January 2008          132.000     237.000                139.000
50   February 2008          132.000     237.000                139.000
51      March 2008          132.000     237.000                139.000
52      April 2008          132.000     237.000                139.000
53        May 2008          132.000     237.000                139.000
54       June 2008          174.500     277.500                185.500
55       July 2008          174.500     277.500                185.500
56     August 2008          174.500     277.500                185.500
57  September 2008          174.500     277.500                185.500
58    October 2008          174.500     277.500                185.500
59   November 2008          174.500     277.500                185.500
60   December 2008          174.500     277.500                185.500
61    January 2009          174.500     277.500                185.500
62   February 2009          174.500     277.500                185.500
63      March 2009          174.500     277.500                185.500
64      April 2009          174.500     277.500                185.500
65        May 2009          174.500     277.500                185.500
66       June 2009          174.500     277.500                185.500
67       July 2009          174.500     277.500                185.500
68     August 2009          174.500     271.500                185.500
69  September 2009          174.500     271.500                185.500
70    October 2009          174.500     271.500                185.500
71   November 2009          174.500     271.500                185.500
72   December 2009          174.500     271.500                185.500
73    January 2010          174.500     271.500                185.500
74   February 2010          174.500     271.500                185.500
75      March 2010          174.500     268.000                185.500
76      April 2010          174.500     268.000                185.500
77        May 2010          174.500     268.000                185.500
78       June 2010          174.500     268.000                185.500
79       July 2010          174.500     268.000                185.500
80     August 2010          174.500     268.000                185.500
81  September 2010          174.500     268.000                185.500
82    October 2010          174.500     267.500                185.500
83   November 2010          174.500     267.500                185.500
84   December 2010          174.500     267.500                185.500
85    January 2011          174.500     267.500                185.500
86   February 2011          174.500     267.500                185.500
87      March 2011          174.500     267.500                185.500
88      April 2011          174.500     267.500                185.500
89        May 2011          174.500     267.500                185.500
90       June 2011          174.500     270.000                185.500
91       July 2011          174.500     270.000                185.500
92     August 2011          174.500     270.000                185.500
93  September 2011          174.500     270.000                185.500
94    October 2011          174.500     270.000                185.500
95   November 2011          174.500     270.000                185.500
96   December 2011          174.500     270.000                185.500
97    January 2012          174.500     267.500                185.500
98   February 2012          174.500     267.500                185.500
99      March 2012          174.500     267.500                185.500
100     April 2012          174.500     267.500                185.500
101       May 2012          173.250     267.500                185.500
102      June 2012          173.250     267.500                185.500
103      July 2012          173.250     267.500                185.500
104    August 2012          173.250     267.500                185.500
105 September 2012          173.250     267.500                185.500
106   October 2012          173.250     267.500                185.500
107  November 2012          178.000     267.500                188.130
108  December 2012          178.000     267.500                188.130
109   January 2013          178.000     267.500                188.130
110  February 2013          178.000     267.500                188.130
111     March 2013          178.000     267.500                188.130
112     April 2013          178.000     267.500                188.130
113       May 2013          178.000     267.500                188.130
114      June 2013          178.000     267.500                188.130
115      July 2013          178.000     267.500                188.130
116    August 2013          178.000     267.500                188.130
117 September 2013          178.000     267.500                188.130
118   October 2013          178.000     267.500                188.130
119  November 2013          178.000     267.500                188.130
120  December 2013          178.000     267.500                188.130
    extra_large_dozen
1             230.000
2             230.000
3             230.000
4             234.500
5             236.000
6             241.000
7             241.000
8             241.000
9             241.000
10            241.000
11            241.000
12            241.000
13            241.000
14            241.000
15            241.000
16            241.000
17            241.000
18            241.000
19            241.000
20            241.000
21            241.000
22            241.000
23            241.000
24            241.000
25            241.000
26            241.000
27            241.375
28            241.500
29            241.500
30            241.500
31            241.500
32            241.500
33            241.500
34            241.500
35            241.500
36            241.500
37            241.500
38            244.125
39            245.000
40            245.000
41            245.000
42            245.000
43            245.000
44            245.000
45            245.000
46            245.000
47            245.000
48            245.000
49            245.000
50            245.000
51            245.000
52            245.000
53            245.000
54            285.500
55            285.500
56            285.500
57            285.500
58            285.500
59            285.500
60            285.500
61            285.500
62            285.500
63            285.500
64            285.500
65            285.500
66            285.500
67            285.500
68            285.500
69            285.500
70            285.500
71            285.500
72            285.500
73            285.500
74            285.500
75            285.500
76            285.500
77            285.500
78            285.500
79            285.500
80            285.500
81            285.500
82            285.500
83            285.500
84            285.500
85            285.500
86            285.500
87            285.500
88            285.500
89            285.500
90            285.500
91            285.500
92            285.500
93            285.500
94            285.500
95            285.500
96            285.500
97            285.500
98            288.500
99            288.500
100           288.500
101           288.500
102           288.500
103           288.500
104           288.500
105           288.500
106           288.500
107           290.000
108           290.000
109           290.000
110           290.000
111           290.000
112           290.000
113           290.000
114           290.000
115           290.000
116           290.000
117           290.000
118           290.000
119           290.000
120           290.000

Describing the Data

This data is about the number of eggs coming from various dozen sizes. The dates vary from January 2004 until December 2013. This data looks solid however the extra large dozen column is all by itself and it isn’t very tidy. In order to make it tidy I decided to make it long by having each category of dozen be a different data point so that extra large dozen can fit in.

Data prediction

Code
#existing rows/cases
nrow(eggs_data)
[1] 120
Code
#existing columns/cases
ncol(eggs_data)
[1] 6
Code
#expected rows/cases
nrow(eggs_data) * (ncol(eggs_data)-2)
[1] 480
Code
# expected columns 
2+2
[1] 4

With 120 rows in the current data set and 6 columns. 2 columns describe the case of the data set which include the month and year. The remainder of the 4 describe the carton and size of eggs.

After creating the carton_type column and the price_per_pound column there are 4 columns total. We should now expect to see 480 columns total because the old 4 columns merged into new rows multiplying the 120 by four times.

Pivot the Data

Code
eggs_data <-pivot_longer(eggs_data, col = c(large_half_dozen, large_dozen, extra_large_half_dozen, extra_large_dozen),
                         names_to= "carton_size",
                         values_to= "price_per_pound")
eggs_data
# A tibble: 480 × 4
   month     year carton_size            price_per_pound
   <chr>    <int> <chr>                            <dbl>
 1 January   2004 large_half_dozen                  126 
 2 January   2004 large_dozen                       230 
 3 January   2004 extra_large_half_dozen            132 
 4 January   2004 extra_large_dozen                 230 
 5 February  2004 large_half_dozen                  128.
 6 February  2004 large_dozen                       226.
 7 February  2004 extra_large_half_dozen            134.
 8 February  2004 extra_large_dozen                 230 
 9 March     2004 large_half_dozen                  131 
10 March     2004 large_dozen                       225 
# … with 470 more rows

The pivoted data looks great and has the expected dimensions of 480 rows x 4 columns!