Challenge 5 Instructions

challenge_5
cereal
ggplotly()
Introduction to Visualization
Author

Roy Yoon

Published

August 22, 2022

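library(tidyverse)  # attaches readr (read_csv), dplyr (arrange), ggplot2, and more
library(ggplot2)    # already attached by tidyverse; loading it again is harmless
library(plotly)     # provides ggplotly() for interactive versions of ggplot2 plots
library(gapminder)  # not used in this section
library(readxl)     # not used in this section (the data is read from a CSV)


knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)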

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc.)
  2. tidy data (as needed, including sanity checks)
  3. mutate variables as needed (including sanity checks)
  4. create at least two univariate visualizations
      • try to make them “publication” ready
      • Explain why you choose the specific graph type
  5. Create at least one bivariate visualization
      • try to make them “publication” ready
      • Explain why you choose the specific graph type

Read in data

cereal <- read_csv("_data/cereal.csv")

Briefly describe the data

head(cereal)
# A tibble: 6 × 4
  Cereal              Sodium Sugar Type 
  <chr>                <dbl> <dbl> <chr>
1 Frosted Mini Wheats      0    11 A    
2 Raisin Bran            340    18 A    
3 All Bran                70     5 A    
4 Apple Jacks            140    14 C    
5 Captain Crunch         200    12 C    
6 Cheerios               180     1 C    
dim(cereal)
[1] 20  4
colnames(cereal)
[1] "Cereal" "Sodium" "Sugar"  "Type"  
summary(cereal)
    Cereal              Sodium          Sugar           Type          
 Length:20          Min.   :  0.0   Min.   : 0.00   Length:20         
 Class :character   1st Qu.:137.5   1st Qu.: 4.00   Class :character  
 Mode  :character   Median :180.0   Median : 9.50   Mode  :character  
                    Mean   :167.0   Mean   : 8.75                     
                    3rd Qu.:202.5   3rd Qu.:12.50                     
                    Max.   :340.0   Max.   :18.00                     
cereal
# A tibble: 20 × 4
   Cereal                Sodium Sugar Type 
   <chr>                  <dbl> <dbl> <chr>
 1 Frosted Mini Wheats        0    11 A    
 2 Raisin Bran              340    18 A    
 3 All Bran                  70     5 A    
 4 Apple Jacks              140    14 C    
 5 Captain Crunch           200    12 C    
 6 Cheerios                 180     1 C    
 7 Cinnamon Toast Crunch    210    10 C    
 8 Crackling Oat Bran       150    16 A    
 9 Fiber One                100     0 A    
10 Frosted Flakes           130    12 C    
11 Froot Loops              140    14 C    
12 Honey Bunches of Oats    180     7 A    
13 Honey Nut Cheerios       190     9 C    
14 Life                     160     6 C    
15 Rice Krispies            290     3 C    
16 Honey Smacks              50    15 A    
17 Special K                220     4 A    
18 Wheaties                 180     4 A    
19 Corn Flakes              200     3 A    
20 Honeycomb                210    11 C    

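The cereal data set is a small, tidy table. Each of its 20 rows is one cereal brand, and each of its 4 columns is one variable:

- Cereal: the brand name.
- Sodium and Sugar: numeric levels. The column names do not state units.
- Type: either A or C. The data does not say what the two types stand for.

Sodium ranges from 0 to 340 (median 180), and Sugar ranges from 0 to 18 (median 9.5).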
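The challenge asks for sanity checks. Some basic checks on the variables described above could look like the sketch below. It is not part of the original analysis, and it makes two assumptions.

- It re-creates the 20 printed rows with tibble() so the chunk runs on its own. In the document itself you would use the `cereal` object returned by read_csv().
- The checks are my choice: missing values, duplicate names, non-negative values, and counts per Type.

```r
library(dplyr)
library(tibble)

# Re-creation of the 20 rows printed above (stand-in for read_csv("_data/cereal.csv"))
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes",
             "Froot Loops", "Honey Bunches of Oats", "Honey Nut Cheerios",
             "Life", "Rice Krispies", "Honey Smacks", "Special K",
             "Wheaties", "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
             "C", "A", "C", "C", "C", "A", "A", "A", "A", "C")
)

# Total number of missing values across all columns
n_missing <- sum(is.na(cereal))

# Number of cereal names that appear more than once (one row per cereal expected)
n_duplicate_names <- sum(duplicated(cereal$Cereal))

# Sodium and Sugar levels should never be negative
all_non_negative <- all(cereal$Sodium >= 0, cereal$Sugar >= 0)

# How many cereals fall into each Type
type_counts <- cereal %>% count(Type)

n_missing
n_duplicate_names
all_non_negative
type_counts
```

With the printed data, n_missing and n_duplicate_names are both 0, and all_non_negative is TRUE. count(Type) reports 10 cereals of each type.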

Data arranged by Sodium level

# Sort rows by Sodium, lowest first. The pipe already passes `cereal` to
# arrange(), so only the sort column goes inside the call.
arranged_sodium <- cereal %>% arrange(Sodium)
arranged_sodium
# A tibble: 20 × 4
   Cereal                Sodium Sugar Type 
   <chr>                  <dbl> <dbl> <chr>
 1 Frosted Mini Wheats        0    11 A    
 2 Honey Smacks              50    15 A    
 3 All Bran                  70     5 A    
 4 Fiber One                100     0 A    
 5 Frosted Flakes           130    12 C    
 6 Apple Jacks              140    14 C    
 7 Froot Loops              140    14 C    
 8 Crackling Oat Bran       150    16 A    
 9 Life                     160     6 C    
10 Cheerios                 180     1 C    
11 Honey Bunches of Oats    180     7 A    
12 Wheaties                 180     4 A    
13 Honey Nut Cheerios       190     9 C    
14 Captain Crunch           200    12 C    
15 Corn Flakes              200     3 A    
16 Cinnamon Toast Crunch    210    10 C    
17 Honeycomb                210    11 C    
18 Special K                220     4 A    
19 Rice Krispies            290     3 C    
20 Raisin Bran              340    18 A    

Data arranged by Sugar level

# Sort rows by Sugar, lowest first. The pipe already passes `cereal` to
# arrange(), so only the sort column goes inside the call.
arranged_sugar <- cereal %>% arrange(Sugar)
arranged_sugar
# A tibble: 20 × 4
   Cereal                Sodium Sugar Type 
   <chr>                  <dbl> <dbl> <chr>
 1 Fiber One                100     0 A    
 2 Cheerios                 180     1 C    
 3 Rice Krispies            290     3 C    
 4 Corn Flakes              200     3 A    
 5 Special K                220     4 A    
 6 Wheaties                 180     4 A    
 7 All Bran                  70     5 A    
 8 Life                     160     6 C    
 9 Honey Bunches of Oats    180     7 A    
10 Honey Nut Cheerios       190     9 C    
11 Cinnamon Toast Crunch    210    10 C    
12 Frosted Mini Wheats        0    11 A    
13 Honeycomb                210    11 C    
14 Captain Crunch           200    12 C    
15 Frosted Flakes           130    12 C    
16 Apple Jacks              140    14 C    
17 Froot Loops              140    14 C    
18 Honey Smacks              50    15 A    
19 Crackling Oat Bran       150    16 A    
20 Raisin Bran              340    18 A    

Univariate Visualizations

Planned (not yet completed): plots that show the data arranged by Sodium level and by Sugar level.
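One possible way to carry out this plan is sketched below. It is not the author's finished figures.

- A histogram of Sodium shows how sodium values are spread across the 20 cereals. A histogram fits here because Sodium is a single numeric variable.
- A bar chart of Sugar, with bars ordered from lowest to highest, mirrors the table arranged by Sugar while keeping each cereal's name visible.

The bin width of 50, the colors, the labels, and theme_minimal() are my choices. The data is re-created inline from the printed table.

```r
library(ggplot2)
library(tibble)

# Re-creation of the printed cereal table (stand-in for read_csv("_data/cereal.csv"))
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes",
             "Froot Loops", "Honey Bunches of Oats", "Honey Nut Cheerios",
             "Life", "Rice Krispies", "Honey Smacks", "Special K",
             "Wheaties", "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11)
)

# Univariate plot 1: histogram of Sodium in bins of width 50 starting at 0
sodium_hist <- ggplot(cereal, aes(x = Sodium)) +
  geom_histogram(binwidth = 50, boundary = 0, fill = "steelblue", color = "white") +
  labs(title = "Distribution of sodium across 20 cereals",
       x = "Sodium (units not stated in the data)",
       y = "Number of cereals") +
  theme_minimal()

# Univariate plot 2: one bar per cereal, ordered from lowest to highest Sugar
sugar_bars <- ggplot(cereal, aes(x = reorder(Cereal, Sugar), y = Sugar)) +
  geom_col(fill = "darkorange") +
  coord_flip() +  # horizontal bars keep the cereal names readable
  labs(title = "Sugar level by cereal",
       x = NULL,
       y = "Sugar (units not stated in the data)") +
  theme_minimal()

sodium_hist
sugar_bars
```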

Bivariate Visualization(s)

Observing relations between Sodium, Sugar, and Cereal (brand)

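# Scatter plot of Sugar against Sodium with one color per cereal brand,
# plus a smoothed trend line fitted to all 20 points
brand_cereal <- ggplot(data = cereal, mapping = aes(x = Sodium, y = Sugar)) + 
  geom_point(mapping = aes(color = Cereal)) + 
  geom_smooth()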

I thought this visualization would be interesting because it shows the relation between the sodium and sugar levels of the different cereal brands. I would like to keep working on two things: making the colors of the 20 brands easier to tell apart, and spreading out the increments on the x and y axes.
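Both improvements mentioned above could be approached as in the sketch below. It is one possible fix, not the author's method.

- Instead of matching 20 similar colors against a legend, each point is labeled with its cereal name using geom_text().
- scale_x_continuous() and scale_y_continuous() set explicit breaks every 50 units of Sodium and every 2 units of Sugar.
- The straight-line fit (`method = "lm"`) replaces the default loess smoother. This is an illustrative choice to keep the trend simple to read, not something the original analysis tested.

The data is re-created inline from the printed table.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Re-creation of the printed cereal table (stand-in for read_csv("_data/cereal.csv"))
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes",
             "Froot Loops", "Honey Bunches of Oats", "Honey Nut Cheerios",
             "Life", "Rice Krispies", "Honey Smacks", "Special K",
             "Wheaties", "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11)
)

# One label per plotted position: cereals with identical (Sodium, Sugar)
# values share a combined label instead of printing on top of each other
point_labels <- cereal %>%
  group_by(Sodium, Sugar) %>%
  summarise(label = paste(Cereal, collapse = " / "), .groups = "drop")

brand_labeled <- ggplot(cereal, aes(x = Sodium, y = Sugar)) +
  # Least-squares trend line, drawn first so the points sit on top of it
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey60") +
  geom_point(color = "steelblue", size = 2) +
  # Name each point directly instead of relying on a 20-color legend
  geom_text(data = point_labels, aes(label = label), vjust = -0.8, size = 3) +
  # Explicit axis breaks: every 50 units of Sodium, every 2 units of Sugar
  scale_x_continuous(breaks = seq(0, 350, by = 50)) +
  scale_y_continuous(breaks = seq(0, 18, by = 2)) +
  labs(title = "Sodium vs. sugar for 20 cereals",
       x = "Sodium (units not stated in the data)",
       y = "Sugar (units not stated in the data)") +
  theme_minimal()

brand_labeled
```

In this data, Apple Jacks and Froot Loops have identical values (Sodium 140, Sugar 14). Their points coincide, so they receive the single label "Apple Jacks / Froot Loops".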

Exploration

I found that ggplotly() turns a ggplot2 plot into an interactive visualization. This solves the problem of working out which color corresponds to which cereal, because hovering over a point shows the information for that specific cereal. To make ggplotly() available, I installed the plotly and gapminder packages. Only plotly is needed for ggplotly(); gapminder just provides an example data set.
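The hover text can also be controlled directly. plotly's documentation for ggplotly() describes a `tooltip` argument, and it treats an unofficial `text` aesthetic as something the tooltip can show. The sketch below uses that to display each cereal's name and values in one tidy tooltip. It assumes plotly is installed, and the tooltip format is my own choice. The data is re-created inline from the printed table.

```r
library(ggplot2)
library(plotly)
library(tibble)

# Re-creation of the printed cereal table (stand-in for read_csv("_data/cereal.csv"))
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes",
             "Froot Loops", "Honey Bunches of Oats", "Honey Nut Cheerios",
             "Life", "Rice Krispies", "Honey Smacks", "Special K",
             "Wheaties", "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
             "C", "A", "C", "C", "C", "A", "A", "A", "A", "C")
)

# `text` is not drawn by geom_point(); ggplotly() reads it for the tooltip
type_hover <- ggplot(cereal, aes(x = Sodium, y = Sugar, color = Type,
                                 text = paste0(Cereal,
                                               "<br>Sodium: ", Sodium,
                                               "<br>Sugar: ", Sugar))) +
  geom_point(size = 2) +
  theme_minimal()

# Show only the custom text on hover, not every mapped aesthetic
hover_plot <- ggplotly(type_hover, tooltip = "text")
```

In a Quarto or R Markdown chunk, putting `hover_plot` on its own line at the end of the chunk displays the interactive plot.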

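# Interactive version of brand_cereal: hovering shows a point's Sodium, Sugar, and Cereal
ggplotly(brand_cereal)

# Same scatter plot, but colored by Type (A or C) instead of by brand
type_cereal <- ggplot(data = cereal, mapping = aes(x = Sodium, y = Sugar)) + 
  geom_point(mapping = aes(color = Type)) + 
  geom_smooth() 

ggplotly(type_cereal)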

I thought this visualization would be interesting because it shows the relation between the sodium and sugar levels of cereal types A and C. A visualization like this could be useful in practice. For example, if Type A represented hot cereals and Type C represented cold cereals, it would let us compare the sodium and sugar levels of hot and cold cereals.
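The visual comparison of types A and C can be backed with numbers by summarizing Sodium and Sugar per type, as in the sketch below. The choice of mean, median, and Pearson correlation is mine, not the original analysis's. The data is re-created inline from the printed table. With only 10 cereals per type, these figures describe this sample; they do not establish a general difference between the types.

```r
library(dplyr)
library(tibble)

# Re-creation of the printed cereal table (stand-in for read_csv("_data/cereal.csv"))
cereal <- tibble(
  Cereal = c("Frosted Mini Wheats", "Raisin Bran", "All Bran", "Apple Jacks",
             "Captain Crunch", "Cheerios", "Cinnamon Toast Crunch",
             "Crackling Oat Bran", "Fiber One", "Frosted Flakes",
             "Froot Loops", "Honey Bunches of Oats", "Honey Nut Cheerios",
             "Life", "Rice Krispies", "Honey Smacks", "Special K",
             "Wheaties", "Corn Flakes", "Honeycomb"),
  Sodium = c(0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
             140, 180, 190, 160, 290, 50, 220, 180, 200, 210),
  Sugar  = c(11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
             14, 7, 9, 6, 3, 15, 4, 4, 3, 11),
  Type   = c("A", "A", "A", "C", "C", "C", "C", "A", "A", "C",
             "C", "A", "C", "C", "C", "A", "A", "A", "A", "C")
)

# One summary row per Type
type_summary <- cereal %>%
  group_by(Type) %>%
  summarise(
    n = n(),
    mean_sodium = mean(Sodium),
    median_sodium = median(Sodium),
    mean_sugar = mean(Sugar),
    median_sugar = median(Sugar),
    # Pearson correlation between Sodium and Sugar within each type
    cor_sodium_sugar = cor(Sodium, Sugar),
    .groups = "drop"
  )

type_summary
```

With the 20 printed rows, each type has 10 cereals.

- Type A averages 149 for Sodium and 8.3 for Sugar, with medians of 165 and 6.
- Type C averages 185 for Sodium and 9.2 for Sugar, with medians of 185 and 10.5.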