Challenge 5

challenge_5

cereal

Introduction to Visualization

Author

Rishita Golla

Published

January 12, 2022

 #| label: setup
 #| warning: false
 #| message: false

 library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

 library(ggplot2)

 knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

 data <- read_csv("_data/cereal.csv")
 view(data)

Briefly describe the data

The data set contains 20 cereals along with the sodium and sugar content of each. It also has a column named Type, which takes two values: A and C. On closer inspection, A appears to stand for adults and C for children (based on who typically consumes the cereal).
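A quick way to confirm a description like this is to look at the column types and count the rows per Type value. The sketch below uses a small stand-in tibble, since the real file is not reproduced here. The `Cereal` column name and every value in it are assumptions for illustration only.

```r
library(dplyr)

# Stand-in for the real file: Sodium, Sugar and Type follow the description
# above; the Cereal column name and all values are made up for illustration.
cereal_demo <- tibble(
  Cereal = c("Demo A", "Demo B", "Demo C", "Demo D"),
  Sodium = c(0, 140, 200, 340),
  Sugar  = c(1, 6, 11, 18),
  Type   = c("A", "C", "A", "C")
)

# Column names and types at a glance
glimpse(cereal_demo)

# Number of rows for each value of Type
type_counts <- count(cereal_demo, Type)
type_counts
```

Running the same two calls on `data` would show whether Type really has only the two values A and C.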

Tidy Data (as needed)

The data is already tidy. However, the values A and C in the Type column are not very intuitive, so I will recode them as Adults and Children.

 cereal <- mutate(data, `Type` = recode(`Type`,
     "A" = "Adults",
     "C" = "Children"
 ))
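As a side note, `recode()` is marked as superseded in dplyr 1.1.0 in favour of `case_match()`. The sketch below shows what an equivalent recode might look like, using a made-up one-column tibble rather than the real data.

```r
library(dplyr)

# Made-up stand-in for the Type column of the cereal data
type_demo <- tibble(Type = c("A", "C", "C", "A"))

# case_match() maps each old value on the left of ~ to a new label
relabelled <- type_demo %>%
  mutate(Type = case_match(Type,
    "A" ~ "Adults",
    "C" ~ "Children"
  ))

# Check that only the two new labels remain
count(relabelled, Type)
```

Either function works here; the count at the end is a cheap check that no value was left unmatched.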

Univariate Visualizations

I chose to plot three histograms. The first shows the distribution of sodium content across the cereals. The next two overlay a density curve on a histogram of sodium content and of sugar content, respectively. These plots, however, do not treat outliers in the data separately.

 ggplot(cereal, aes(x = Sodium)) +
   geom_histogram(binwidth = 10)

 ggplot(cereal, aes(Sodium)) +
   # after_stat(density) replaces the deprecated ..density.. notation
   geom_histogram(aes(y = after_stat(density)), colour = 1, fill = "white") +
   geom_density(lwd = 1, colour = 4,
                fill = 4, alpha = 0.25) +
   # labs() must be chained with + so the labels are applied to the plot
   labs(title = "Sodium Content by Cereal Brand", x = "Sodium")

 ggplot(cereal, aes(Sugar)) +
   # after_stat(density) replaces the deprecated ..density.. notation
   geom_histogram(aes(y = after_stat(density)), colour = 1, fill = "white") +
   geom_density(lwd = 1, colour = 4,
                fill = 4, alpha = 0.25) +
   # labs() must be chained with + so the labels are applied to the plot
   labs(title = "Sugar Content by Cereal Brand", x = "Sugar")
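Since the univariate plots above do not pick out outliers, one possible follow-up is to flag them with the common 1.5 × IQR rule and draw a boxplot. This is only a sketch: the tibble below is made up, and the rule is one convention among several, not something the original analysis used.

```r
library(dplyr)
library(ggplot2)

# Made-up sodium values; the last one is deliberately extreme
sodium_demo <- tibble(
  Cereal = paste("Demo", 1:8),
  Sodium = c(100, 120, 130, 140, 150, 160, 170, 400)
)

# Flag values beyond 1.5 * IQR from the quartiles
flagged <- sodium_demo %>%
  mutate(
    q1 = quantile(Sodium, 0.25),
    q3 = quantile(Sodium, 0.75),
    is_outlier = Sodium < q1 - 1.5 * (q3 - q1) |
      Sodium > q3 + 1.5 * (q3 - q1)
  )

# Rows flagged as outliers
filter(flagged, is_outlier)

# geom_boxplot() draws points beyond the whiskers individually
outlier_plot <- ggplot(sodium_demo, aes(y = Sodium)) +
  geom_boxplot()
outlier_plot
```

Applied to `cereal`, the same steps would list which cereals sit far from the rest before deciding whether to keep them in the histograms.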

Bivariate Visualization(s)

Below are two scatter plots of sodium against sugar content. The second colours the points by cereal type. One cereal (Raisin Bran) is high in both sodium and sugar.

 ggplot(cereal, aes(x=Sugar, y=Sodium)) +
   geom_point()

 ggplot(cereal, aes(x = Sugar, y = Sodium, col = Type)) +
   geom_point()

The plots above show no apparent relationship between a cereal's sugar or sodium content and its type (Adults/Children).
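A visual impression like this can be backed up with a few summary numbers per type. The sketch below computes group means and the sugar–sodium correlation within each type. The tibble is made up for illustration, so its numbers say nothing about the real cereals.

```r
library(dplyr)

# Made-up values, four cereals per type
scatter_demo <- tibble(
  Type   = rep(c("Adults", "Children"), each = 4),
  Sugar  = c(2, 4, 6, 8, 10, 12, 14, 16),
  Sodium = c(100, 200, 100, 200, 150, 150, 250, 250)
)

# Mean content and within-group Pearson correlation for each type
by_type <- scatter_demo %>%
  group_by(Type) %>%
  summarise(
    mean_sugar  = mean(Sugar),
    mean_sodium = mean(Sodium),
    r           = cor(Sugar, Sodium),
    n           = n()
  )
by_type
```

With only a handful of cereals per group, any such correlation is a rough guide at best, so it is better read alongside the scatter plots than on its own.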