# A tibble: 120 × 5
date xldozen xlhalf_dozen large_dozen large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126
2 February 230 134. 226. 128.
3 March 230 137 225 131
4 April 234. 137 225 131
5 May 236 137 225 131
6 June 241 137 231. 134.
7 July 241 137 234. 134.
8 August 241 137 234. 134.
9 September 241 136. 234. 130.
10 October 241 136. 234. 128.
# … with 110 more rows
Above I read in the data set as an excel file. I had to look at the solutions sheet to learn how to read in the data. The only trouble I had was figuring out the “range” parameter of the function. That one I just had to copy in because I was getting an error without it, but I understand that this tells R which cells to read.
eggs %>%select("date") %>%distinct()
# A tibble: 22 × 1
date
<chr>
1 Jan 2004
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
# … with 12 more rows
Now we still have the issue of notes in the names of our months. This is most evident with “Jan…”, but the ” /1” in February needs to go. This needs to be removed to make analysis later on a bit easier to look at.
# A tibble: 120 × 6
month year xldozen xlhalf_dozen large_dozen large_half_dozen
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 Jan 2004 230 132 230 126
2 February 2004 230 134. 226. 128.
3 March 2004 230 137 225 131
4 April 2004 234. 137 225 131
5 May 2004 236 137 225 131
6 June 2004 241 137 231. 134.
7 July 2004 241 137 234. 134.
8 August 2004 241 137 234. 134.
9 September 2004 241 136. 234. 130.
10 October 2004 241 136. 234. 128.
# … with 110 more rows
You can see as the years progress, the price of eggs increase, though at what rate I am uncertain. Now the data set is read for some pivoting. Now let me do a sanity check to view the data dimensions before the pivot and then I will look at it after the pivot.
dim(eggs)
[1] 120 6
There are four columns with the type of eggs this data set is viewing. If we want even tidier data we can collapse these four columns into one. Lets call this new data set “eggstidy” to represent the final form of the data in this challenge.
# A tibble: 480 × 4
month year egg_type price
<chr> <int> <chr> <dbl>
1 Jan 2004 xldozen 230
2 Jan 2004 xlhalf_dozen 132
3 Jan 2004 large_dozen 230
4 Jan 2004 large_half_dozen 126
5 February 2004 xldozen 230
6 February 2004 xlhalf_dozen 134.
7 February 2004 large_dozen 226.
8 February 2004 large_half_dozen 128.
9 March 2004 xldozen 230
10 March 2004 xlhalf_dozen 137
# … with 470 more rows
By pivoting the data, each row has one observation, making it tidy for future manipulation. Within each row we can look at the particular variables within that observation with ease.
Below the dimensions have clearly changed, adding significantly more rows, but also condensing the amount of columns.