library(tidyverse)
library(ggplot2)
library(plotly)
library(gapminder)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 5 Instructions
Challenge Overview
Today’s challenge is to:
- read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
- tidy data (as needed, including sanity checks)
- mutate variables as needed (including sanity checks)
- create at least two univariate visualizations
  - try to make them “publication” ready
  - Explain why you choose the specific graph type
- Create at least one bivariate visualization
  - try to make them “publication” ready
  - Explain why you choose the specific graph type
Read in data
cereal <- read_csv("_data/cereal.csv")
Briefly describe the data
head(cereal)
# A tibble: 6 × 4
Cereal Sodium Sugar Type
<chr> <dbl> <dbl> <chr>
1 Frosted Mini Wheats 0 11 A
2 Raisin Bran 340 18 A
3 All Bran 70 5 A
4 Apple Jacks 140 14 C
5 Captain Crunch 200 12 C
6 Cheerios 180 1 C
dim(cereal)
[1] 20 4
colnames(cereal)
[1] "Cereal" "Sodium" "Sugar" "Type"
summary(cereal)
    Cereal              Sodium          Sugar           Type
 Length:20          Min.   :  0.0   Min.   : 0.00   Length:20
 Class :character   1st Qu.:137.5   1st Qu.: 4.00   Class :character
 Mode  :character   Median :180.0   Median : 9.50   Mode  :character
                    Mean   :167.0   Mean   : 8.75
                    3rd Qu.:202.5   3rd Qu.:12.50
                    Max.   :340.0   Max.   :18.00
cereal
# A tibble: 20 × 4
Cereal Sodium Sugar Type
<chr> <dbl> <dbl> <chr>
1 Frosted Mini Wheats 0 11 A
2 Raisin Bran 340 18 A
3 All Bran 70 5 A
4 Apple Jacks 140 14 C
5 Captain Crunch 200 12 C
6 Cheerios 180 1 C
7 Cinnamon Toast Crunch 210 10 C
8 Crackling Oat Bran 150 16 A
9 Fiber One 100 0 A
10 Frosted Flakes 130 12 C
11 Froot Loops 140 14 C
12 Honey Bunches of Oats 180 7 A
13 Honey Nut Cheerios 190 9 C
14 Life 160 6 C
15 Rice Krispies 290 3 C
16 Honey Smacks 50 15 A
17 Special K 220 4 A
18 Wheaties 180 4 A
19 Corn Flakes 200 3 A
20 Honeycomb 210 11 C
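The cereal data set is a small table of 20 cereals, one per row, with four variables: the cereal name (Cereal), the sodium level (Sodium), the sugar level (Sugar), and the type (Type, either A or C). Sodium and Sugar are numeric columns; Cereal and Type are character columns.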
Data arranged by Sodium level
# Sort rows from lowest to highest sodium.
arranged_sodium <- cereal %>% arrange(Sodium)
arranged_sodium
# A tibble: 20 × 4
Cereal Sodium Sugar Type
<chr> <dbl> <dbl> <chr>
1 Frosted Mini Wheats 0 11 A
2 Honey Smacks 50 15 A
3 All Bran 70 5 A
4 Fiber One 100 0 A
5 Frosted Flakes 130 12 C
6 Apple Jacks 140 14 C
7 Froot Loops 140 14 C
8 Crackling Oat Bran 150 16 A
9 Life 160 6 C
10 Cheerios 180 1 C
11 Honey Bunches of Oats 180 7 A
12 Wheaties 180 4 A
13 Honey Nut Cheerios 190 9 C
14 Captain Crunch 200 12 C
15 Corn Flakes 200 3 A
16 Cinnamon Toast Crunch 210 10 C
17 Honeycomb 210 11 C
18 Special K 220 4 A
19 Rice Krispies 290 3 C
20 Raisin Bran 340 18 A
Data arranged by Sugar level
# Sort rows from lowest to highest sugar.
arranged_sugar <- cereal %>% arrange(Sugar)
arranged_sugar
# A tibble: 20 × 4
Cereal Sodium Sugar Type
<chr> <dbl> <dbl> <chr>
1 Fiber One 100 0 A
2 Cheerios 180 1 C
3 Rice Krispies 290 3 C
4 Corn Flakes 200 3 A
5 Special K 220 4 A
6 Wheaties 180 4 A
7 All Bran 70 5 A
8 Life 160 6 C
9 Honey Bunches of Oats 180 7 A
10 Honey Nut Cheerios 190 9 C
11 Cinnamon Toast Crunch 210 10 C
12 Frosted Mini Wheats 0 11 A
13 Honeycomb 210 11 C
14 Captain Crunch 200 12 C
15 Frosted Flakes 130 12 C
16 Apple Jacks 140 14 C
17 Froot Loops 140 14 C
18 Honey Smacks 50 15 A
19 Crackling Oat Bran 150 16 A
20 Raisin Bran 340 18 A
Univariate Visualizations
The plan is to show the cereals arranged by Sodium level and, separately, arranged by Sugar level.
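One possible way to turn that plan into two univariate charts is an ordered bar chart per variable. This is only a sketch: the tibble is retyped from the printed output so the code runs without `_data/cereal.csv`, the object names `sodium_plot` and `sugar_plot` are made up for illustration, and the axis labels omit units because the data does not state them.

```r
library(tibble)
library(ggplot2)

# Values transcribed from the printed `cereal` tibble (assumption: the CSV
# holds exactly these 20 rows).
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes", "Froot Loops",
             "Honey Bunches of Oats", "Honey Nut Cheerios", "Life",
             "Rice Krispies", "Honey Smacks", "Special K", "Wheaties",
             "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
             "C", "A", "C", "C", "C", "A", "A", "A", "A", "C")
)

# One bar per cereal, ordered from lowest to highest sodium.
sodium_plot <- ggplot(cereal, aes(x = Sodium, y = reorder(Cereal, Sodium))) +
  geom_col(fill = "steelblue") +
  labs(title = "Cereals arranged by sodium level", x = "Sodium", y = NULL) +
  theme_minimal()

# Same layout, ordered by sugar instead.
sugar_plot <- ggplot(cereal, aes(x = Sugar, y = reorder(Cereal, Sugar))) +
  geom_col(fill = "darkorange") +
  labs(title = "Cereals arranged by sugar level", x = "Sugar", y = NULL) +
  theme_minimal()

sodium_plot
sugar_plot
```

I would choose ordered bars here because, with only 20 cereals, every value can be shown and ranked by name; a histogram would be a reasonable alternative if only the shape of the distribution mattered.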
Bivariate Visualization(s)
Observing relations between Sodium, Sugar, and Cereal (brand)
brand_cereal <- ggplot(data = cereal, mapping = aes(x = Sodium, y = Sugar)) +
  geom_point(mapping = aes(color = Cereal)) +
  geom_smooth()
I thought that this visualization would be interesting to make because it shows the relation between the sodium and sugar levels of different cereal brands. I would like to keep working on two things: making the color distinctions between the cereal brands easier to read, and spreading out the increments on the x and y axes.
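Both follow-ups could be sketched roughly as below. This is an illustration under assumptions, not a finished figure: the tibble is retyped from the printed output, the name `labelled_cereal` is hypothetical, the break spacing (50 on x, 2 on y) is an arbitrary choice, and text labels replace the 20-color legend, which is a different technique from color coding.

```r
library(tibble)
library(ggplot2)

# Values transcribed from the printed `cereal` tibble (assumption: the CSV
# holds exactly these 20 rows).
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes", "Froot Loops",
             "Honey Bunches of Oats", "Honey Nut Cheerios", "Life",
             "Rice Krispies", "Honey Smacks", "Special K", "Wheaties",
             "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
             "C", "A", "C", "C", "C", "A", "A", "A", "A", "C")
)

labelled_cereal <- ggplot(cereal, aes(x = Sodium, y = Sugar)) +
  # loess is what geom_smooth() picks by default for fewer than 1,000 rows;
  # naming it explicitly avoids the informational message.
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "grey50") +
  geom_point() +
  # Label each point with its cereal name; overlapping labels are dropped.
  geom_text(aes(label = Cereal), size = 3, vjust = -0.8, check_overlap = TRUE) +
  # Explicit breaks control how finely each axis is marked.
  scale_x_continuous(breaks = seq(0, 350, by = 50)) +
  scale_y_continuous(breaks = seq(0, 20, by = 2)) +
  labs(title = "Sugar versus sodium by cereal", x = "Sodium", y = "Sugar") +
  theme_minimal()

labelled_cereal
```

Because `check_overlap = TRUE` silently drops labels that would collide, some cereals may appear as unlabelled points.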
Exploration
I found that ggplotly() lets you build an interactive visualization from a ggplot object. This resolves the problem of working out which color corresponds to which cereal, because hovering over a point shows the information for that specific cereal. ggplotly() is provided by the plotly package, so I installed and loaded plotly; I also installed gapminder, although ggplotly() does not depend on it.
ggplotly(brand_cereal)
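The hover text can also be set explicitly. The sketch below uses only the first six rows shown by head(cereal), retyped by hand, and the names `cereal_head`, `hover_plot`, and `interactive_cereal` are made up for illustration. It relies on the common plotly idiom of mapping a `text` aesthetic and passing `tooltip = "text"` to ggplotly().

```r
library(tibble)
library(ggplot2)
library(plotly)

# First six rows of the printed `cereal` tibble, enough to show the idea.
cereal_head <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran",
             "Apple Jacks", "Captain Crunch", "Cheerios"),
  Sodium = c(0, 340, 70, 140, 200, 180),
  Sugar  = c(11, 18, 5, 14, 12, 1),
  Type   = c("A", "A", "A", "C", "C", "C")
)

# `text` is not a ggplot2 aesthetic, so ggplot2 warns that it is ignored;
# ggplotly() still reads it to build the hover label.
hover_plot <- ggplot(cereal_head, aes(x = Sodium, y = Sugar)) +
  geom_point(aes(text = paste0(Cereal,
                               "<br>Sodium: ", Sodium,
                               "<br>Sugar: ", Sugar)))

# Show only the custom text when hovering over a point.
interactive_cereal <- ggplotly(hover_plot, tooltip = "text")
interactive_cereal
```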
type_cereal <- ggplot(data = cereal, mapping = aes(x = Sodium, y = Sugar)) +
  geom_point(mapping = aes(color = Type)) +
  geom_smooth()
ggplotly(type_cereal)
I thought that this visualization would be interesting to make because it shows the relation between the sodium and sugar levels of cereal types A and C. Such a visualization could be useful if, for example, Type A represented hot cereal and Type C represented cold cereal: the Sodium and Sugar levels of hot and cold cereals could then be compared.
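A numeric companion to that comparison could look like the sketch below, which summarizes Sodium and Sugar by Type. The tibble is retyped from the printed output so the code runs on its own, and `type_summary` and its column names are hypothetical.

```r
library(tibble)
library(dplyr)

# Values transcribed from the printed `cereal` tibble (assumption: the CSV
# holds exactly these 20 rows).
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes", "Froot Loops",
             "Honey Bunches of Oats", "Honey Nut Cheerios", "Life",
             "Rice Krispies", "Honey Smacks", "Special K", "Wheaties",
             "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
             "C", "A", "C", "C", "C", "A", "A", "A", "A", "C")
)

# Count the cereals in each type and average their sodium and sugar levels.
type_summary <- cereal %>%
  group_by(Type) %>%
  summarise(
    n           = n(),
    mean_sodium = mean(Sodium),
    mean_sugar  = mean(Sugar),
    .groups     = "drop"
  )

type_summary
```

On the 20 rows printed earlier, this gives 10 cereals per type, with a mean Sodium of 149 for Type A versus 185 for Type C and a mean Sugar of 8.3 versus 9.2. With only 10 cereals in each group, these differences should be read as descriptive rather than conclusive.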