Challenge 5

Introduction to Visualization
Author

Janani Natarajan

Published

May 8, 2023

library(tidyverse)
library(ggplot2)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • cereal.csv ⭐
  • Total_cost_for_top_15_pathogens_2018.xlsx ⭐
  • Australian Marriage ⭐⭐
  • AB_NYC_2019.csv ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐
  • Public School Characteristics ⭐⭐⭐⭐
  • USA Households ⭐⭐⭐⭐⭐
#simply read in the data (untouched)
library(readr)
NYCHousing <- read_csv("_data/AB_NYC_2019.csv")
NYCHousing
# A tibble: 48,895 × 16
      id name      host_id host_…¹ neigh…² neigh…³ latit…⁴ longi…⁵ room_…⁶ price
   <dbl> <chr>       <dbl> <chr>   <chr>   <chr>     <dbl>   <dbl> <chr>   <dbl>
 1  2539 Clean & …    2787 John    Brookl… Kensin…    40.6   -74.0 Privat…   149
 2  2595 Skylit M…    2845 Jennif… Manhat… Midtown    40.8   -74.0 Entire…   225
 3  3647 THE VILL…    4632 Elisab… Manhat… Harlem     40.8   -73.9 Privat…   150
 4  3831 Cozy Ent…    4869 LisaRo… Brookl… Clinto…    40.7   -74.0 Entire…    89
 5  5022 Entire A…    7192 Laura   Manhat… East H…    40.8   -73.9 Entire…    80
 6  5099 Large Co…    7322 Chris   Manhat… Murray…    40.7   -74.0 Entire…   200
 7  5121 BlissArt…    7356 Garon   Brookl… Bedfor…    40.7   -74.0 Privat…    60
 8  5178 Large Fu…    8967 Shunic… Manhat… Hell's…    40.8   -74.0 Privat…    79
 9  5203 Cozy Cle…    7490 MaryEl… Manhat… Upper …    40.8   -74.0 Privat…    79
10  5238 Cute & C…    7549 Ben     Manhat… Chinat…    40.7   -74.0 Entire…   150
# … with 48,885 more rows, 6 more variables: minimum_nights <dbl>,
#   number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
#   calculated_host_listings_count <dbl>, availability_365 <dbl>, and
#   abbreviated variable names ¹​host_name, ²​neighbourhood_group,
#   ³​neighbourhood, ⁴​latitude, ⁵​longitude, ⁶​room_type

Briefly describe the data

This dataset describes Airbnb listings across New York City’s five boroughs in 2019. There are 48,895 observations, each representing one listing, and 16 variables. The variables cover the listing and host names, borough and neighbourhood, geographic coordinates, room type (entire home/apartment, private room, or shared room), price, minimum nights, review activity (date of the last review, total number of reviews, and reviews per month), the host’s listing count, and the number of days the listing is available during the year. The data is already organized with one listing per row, so I won’t need to reshape any variables.
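One quick way to back up the claim that the data is ready to use is to count missing values per column. The sketch below runs on a small made-up table, not the real file; the `toy` tibble and its values are hypothetical stand-ins that borrow a few column names from the listings data.

```r
library(dplyr)

# Hypothetical three-row stand-in for the listings table; values are illustrative only
toy <- tibble(
  room_type = c("Private room", "Entire home/apt", "Private room"),
  price = c(149, 225, NA),
  reviews_per_month = c(0.21, NA, NA)
)

# Count the missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Applying the same `summarise(across(...))` call to `NYCHousing` would show which columns, if any, need attention before plotting.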

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

I don’t believe there are any variables we need to add or recode, because the data is already tidy. We may, however, want to look at variables with outliers, such as price, and remove the extreme values. We’ll return to this later in the analysis.
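As a rough illustration of what looking for outliers could involve, here is a minimal sketch of Tukey's 1.5 × IQR rule. It is not the method used later in this post (which simply filters on fixed price cutoffs), and the `toy` prices are a hypothetical example: the first eight echo prices visible in the preview above, and the 10000 value is invented to act as an outlier.

```r
library(dplyr)

# Hypothetical toy prices standing in for the `price` column; 10000 is an invented outlier
toy <- tibble(price = c(60, 79, 80, 89, 149, 150, 200, 225, 10000))

# Tukey's rule: flag values more than 1.5 * IQR beyond the first or third quartile
bounds <- toy %>%
  summarise(
    q1 = quantile(price, 0.25),
    q3 = quantile(price, 0.75),
    iqr = q3 - q1,
    lower = q1 - 1.5 * iqr,
    upper = q3 + 1.5 * iqr
  )

flagged <- toy %>%
  mutate(is_outlier = price < bounds$lower | price > bounds$upper)

# Show only the rows flagged as outliers
flagged %>% filter(is_outlier)
```

A fixed cutoff, as used in the plots below, is simpler to explain; the IQR rule adapts to the data but can flag many points when the distribution is heavily skewed.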

Univariate Visualizations

# Number of listings per borough (the data was read in as NYCHousing above)
ggplot(NYCHousing, aes(x = neighbourhood_group)) +
  geom_bar(fill = "blue") +
  labs(title = "Number of Airbnb Units in each NYC Borough", x = "BOROUGH", y = "NUMBER OF UNITS") +
  theme_bw()
# Number of listings per room type
ggplot(NYCHousing, aes(x = room_type)) +
  geom_bar(fill = "yellow") +
  labs(title = "Number of Airbnb Units by Room Type", x = "ROOM TYPE", y = "NUMBER OF UNITS") +
  theme_bw()

Bivariate Visualization(s)

# Listings per borough, stacked by room type
ggplot(NYCHousing, aes(x = neighbourhood_group, fill = room_type)) + 
  geom_bar() + 
  labs(title = "Number of Airbnb Units in each NYC Borough by Room Type", x = "Borough", y = "Number of Units") +
  scale_fill_discrete(name = "Room Type") + 
  theme_bw()
listingperprice_NYCHousing <- NYCHousing %>%
  filter(price>0 & price<2500)
listingperprice_NYCHousing %>%
  ggplot(aes(neighbourhood_group,price))+
  geom_boxplot()

# Summary statistics for price across all listings
NYCHousing %>% 
  summarize("mean" = mean(price, na.rm = TRUE), 
            "standard_deviation" = sd(price, na.rm = TRUE),
            "lowest" = min(price, na.rm = TRUE),
            "25th quantile" = quantile(price, probs = .25, na.rm = TRUE), 
            "median" = median(price, na.rm = TRUE), 
            "75th quantile" = quantile(price, probs = .75, na.rm = TRUE),
            "99th quantile" = quantile(price, probs = .99, na.rm = TRUE),
            "highest" = max(price, na.rm = TRUE))
# Average price of entire homes/apartments in each borough
NYCHousing %>% 
  filter(room_type == "Entire home/apt") %>% 
  group_by(neighbourhood_group) %>% 
  summarise(mean = mean(price, na.rm = TRUE)) %>% 
  ggplot(aes(neighbourhood_group, mean)) +
  geom_col(fill = "purple") +
  labs(title = "Average Price of Entire Home/Apartment Airbnb Units by NYC Borough", x = "Borough", y = "Average Price") + 
  theme_bw()
# Keep only listings priced below 1000 before plotting the price distribution
Airbnb_NYC <- filter(NYCHousing, price < 1000)
ggplot(Airbnb_NYC, aes(x = price)) + geom_histogram(fill = "pink") +
  labs(title = "Price of Airbnb Units in NYC", x = "Price of Unit", y = "Number") + 
  theme_bw()
# Airbnb_NYC_rm is never defined above; this assumes it meant the listings with price < 1000
NYCHousing %>% filter(price < 1000, room_type == "Entire home/apt") %>% 
ggplot(aes(x = price)) + geom_histogram(fill = "green") +
  labs(title = "Price of Entire Apartment type Airbnb Units in NYC", x = "Price of Unit", y = "Count") + 
  theme_bw()
# Airbnb_NYC_rm is never defined above; this assumes it meant the listings with price < 1000
NYCHousing %>% 
  filter(price < 1000, neighbourhood_group == "Brooklyn") %>% 
  ggplot(aes(x = price, y = number_of_reviews, color = room_type)) +
  geom_point() +
  labs(title = "Price of Airbnb Units in Brooklyn", 
       x = "Price of Unit", y = "Number of Reviews") +
  scale_colour_discrete("Room Type") +
  theme_bw()