library(tidyverse)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 5
Challenge Overview
Today’s challenge is to:
- read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
- tidy data (as needed, including sanity checks)
- mutate variables as needed (including sanity checks)
- create at least two univariate visualizations
  - try to make them “publication” ready
  - Explain why you choose the specific graph type
- Create at least one bivariate visualization
  - try to make them “publication” ready
  - Explain why you choose the specific graph type
R Graph Gallery is a good starting point for thinking about what information is conveyed in standard graph types, and includes example R code.
(be sure to only include the category tags for the data you use!)
Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- cereal.csv ⭐
- Total_cost_for_top_15_pathogens_2018.xlsx ⭐
- Australian Marriage ⭐⭐
- AB_NYC_2019.csv ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐
- Public School Characteristics ⭐⭐⭐⭐
- USA Households ⭐⭐⭐⭐⭐
marriage <- read.csv("C:/Github Projects/601_Spring_2023/posts/_data/australian_marriage_tidy.csv")
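The data summarize the 2017 Australian Marriage Law Postal Survey, which asked whether the law should be changed to allow same-sex couples to marry. Each row gives a territory, a response (resp: “yes” or “no”), the number of people who gave that response (count), and its percentage of the territory’s responses (percent). The dataset is small enough to scan by eye, but we will still use code to check which territories, if any, returned a “no” majority.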
glimpse(marriage)
marriage %>%
  count(resp)
df1 <- marriage %>%
  select(resp, territory, percent) %>%
  filter(resp == "no") %>%
  filter(percent > 50)
df1
Filtering for territories where more than 50% of respondents answered “no” returns no rows: every territory returned a “yes” majority. So, we will filter differently.
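Filtering “no” rows above 50% only tells us that no territory rejected the change. A more direct sanity check is to keep each territory’s majority response. The sketch below is self-contained, so it does not read the CSV: it builds a stand-in table from the “yes” percentages in pvar (used later in this post), assuming they follow the survey’s usual territory order (New South Wales, Victoria, Queensland, South Australia, Western Australia, Tasmania, Northern Territory, Australian Capital Territory), and assuming “no” is the remainder, 100 minus “yes”. The territory names are plain; the real file uses labels such as “Australian Capital Territory(c)”.

```r
library(dplyr)

# Stand-in for `marriage` (assumption: pvar's values in the usual territory order;
# "no" assumed to be 100 - "yes").
territories <- c("New South Wales", "Victoria", "Queensland", "South Australia",
                 "Western Australia", "Tasmania", "Northern Territory",
                 "Australian Capital Territory")
yes_pct <- c(57.8, 64.9, 60.7, 62.5, 63.7, 63.6, 60.6, 74.0)

toy <- tibble(
  territory = rep(territories, times = 2),
  resp      = rep(c("yes", "no"), each = length(territories)),
  percent   = c(yes_pct, 100 - yes_pct)
)

# For each territory, keep only the response with the larger share.
majority <- toy %>%
  group_by(territory) %>%
  slice_max(percent, n = 1) %>%
  ungroup()

majority
```

Against the real data, the same group_by() and slice_max() steps would run directly on marriage.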
# ggthemes supplies theme_economist() and scale_color_economist().
# Install it once from the console with install.packages("ggthemes");
# calling install.packages() while knitting fails without a CRAN mirror.
library(ggthemes)
pvar <- c(57.8, 64.9, 60.7, 62.5, 63.7, 63.6, 60.6, 74.0)
order(pvar)
[1] 1 7 3 4 6 5 2 8
pvar[order(pvar)]
[1] 57.8 60.6 60.7 62.5 63.6 63.7 64.9 74.0
pvar
[1] 57.8 64.9 60.7 62.5 63.7 63.6 60.6 74.0
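Because pvar is unnamed, the positions that order() returns are hard to read back as territories. A minimal sketch of one alternative: name the vector, then sort() it so each value keeps its label. The short labels are hypothetical, and the order assumes pvar follows the survey’s usual territory order.

```r
# Hypothetical short labels; assumes pvar is in the usual territory order.
pvar <- c(57.8, 64.9, 60.7, 62.5, 63.7, 63.6, 60.6, 74.0)
names(pvar) <- c("NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT", "ACT")

# sort() returns the same values as pvar[order(pvar)], but the names
# travel with them, so each percentage stays tied to its territory.
sorted <- sort(pvar)
sorted
```

Here names(sorted) runs NSW, NT, QLD, SA, TAS, WA, VIC, ACT, matching the positions 1 7 3 4 6 5 2 8 that order() printed.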
marriage %>%
  filter(resp == "yes") %>%
  ggplot(mapping = aes(x = percent, y = territory)) +
  geom_point(alpha = 0.5) +
  labs(title = "Percentage of Yes Responses by Territory",
       y = "Territory",
       x = "Percentage") +
  theme_economist() +
  scale_color_economist()
The Australian Capital Territory has the highest share of “yes” responses, at 74.0%. New South Wales is at the other end of the spectrum, at 57.8%.
Next, we’ll look at the raw count of “yes” responses in each territory with a dot plot.
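To make the ranking described above readable at a glance, the dots can be sorted by their value. A minimal sketch, using a self-contained stand-in table built from pvar’s values (assuming the usual territory order); reorder() is base R and turns territory into a factor whose levels follow percent.

```r
library(dplyr)
library(ggplot2)

# Stand-in table (assumption: pvar's values in the usual territory order).
yes_share <- tibble(
  territory = c("New South Wales", "Victoria", "Queensland", "South Australia",
                "Western Australia", "Tasmania", "Northern Territory",
                "Australian Capital Territory"),
  percent   = c(57.8, 64.9, 60.7, 62.5, 63.7, 63.6, 60.6, 74.0)
)

# reorder() sets factor levels by percent, so on the discrete y axis the
# lowest share sits at the bottom and the highest at the top.
p <- ggplot(yes_share, aes(x = percent, y = reorder(territory, percent))) +
  geom_point(size = 3, colour = "steelblue") +
  labs(title = "Share of Yes Responses by Territory",
       x = "Percent answering yes", y = NULL) +
  theme_minimal()

p
```

Sorting the axis by value rather than alphabetically lets the reader see the ranking without comparing dot positions row by row.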
marriage %>%
  filter(resp == "yes") %>%
  ggplot(aes(x = count, y = territory, color = count)) +
  geom_point(size = 3, alpha = 0.8) +
  theme_linedraw() +
  labs(title = "Count of Yes Responses by Territory",
       x = "Number of Yes Responses",
       y = "Territory")
Why is it important that we do this? In the percentage plot, the Australian Capital Territory leads and New South Wales trails. In the count plot the order reverses: New South Wales has a much larger count, while the Australian Capital Territory has nearly the smallest. Counts largely reflect how many people live in each territory, whereas percentages normalize for population, so percentages are the fairer basis for comparing territories. Two territories are not enough to conclude that a larger count generally goes with a lower percentage.
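The percent column is each response’s count divided by its territory’s total responses, which is why a territory can have a larger count and a smaller percentage at the same time. A minimal sketch with hypothetical counts for two made-up regions (not survey data); against the real data, the same mutate() could serve as a sanity check on percent, assuming that column is computed over the “yes” and “no” rows only.

```r
library(dplyr)

# Hypothetical counts for two made-up regions (not the survey data).
toy_counts <- tibble(
  territory = c("Region A", "Region A", "Region B", "Region B"),
  resp      = c("yes", "no", "yes", "no"),
  count     = c(6000, 4000, 750, 250)
)

# Recompute each response's share within its own region.
shares <- toy_counts %>%
  group_by(territory) %>%
  mutate(percent = round(100 * count / sum(count), 1)) %>%
  ungroup()

shares
```

Region A has eight times as many “yes” responses as Region B, yet a lower “yes” percentage (60 vs 75), because its total is also much larger.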
Below, we’ll take a closer look at the differences between New South Wales and the Australian Capital Territory.
NSW.compare <- marriage %>%
  select(resp, count, percent, territory) %>%
  arrange(territory) %>%
  filter(territory %in% c("New South Wales", "Australian Capital Territory(c)"))
NSW.compare %>%
  ggplot(mapping = aes(x = count, y = territory)) +
  geom_boxplot(fill = "steelblue") +
  theme_classic() +
  labs(title = "Count by Territory",
       x = "Count",
       y = "Territory")
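With only two territories and two responses each, a small wide table may compare New South Wales and the Australian Capital Territory more directly than a boxplot. A minimal sketch, assuming the “yes” percentages from pvar (57.8 for New South Wales, 74.0 for the Australian Capital Territory) and assuming “no” is 100 minus “yes”. pivot_wider() from tidyr puts each response in its own column; gap is a hypothetical helper column for the margin between them.

```r
library(dplyr)
library(tidyr)

# Stand-in long table (assumption: "no" = 100 - "yes").
compare <- tibble(
  territory = rep(c("New South Wales", "Australian Capital Territory"), times = 2),
  resp      = rep(c("yes", "no"), each = 2),
  percent   = c(57.8, 74.0, 100 - 57.8, 100 - 74.0)
)

# One row per territory, one column per response, plus the yes-no margin.
compare_wide <- compare %>%
  pivot_wider(names_from = resp, values_from = percent) %>%
  mutate(gap = yes - no)

compare_wide
```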