library(tidyverse)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 5
Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- cereal.csv ⭐
- Total_cost_for_top_15_pathogens_2018.xlsx ⭐
- Australian Marriage ⭐⭐
- AB_NYC_2019.csv ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐
- Public School Characteristics ⭐⭐⭐⭐
- USA Households ⭐⭐⭐⭐⭐
#simply read in the data (untouched)
library(readr)
NYCHousing <- read_csv("_data/AB_NYC_2019.csv")
# the plotting code below refers to this data as Airbnb_NYC
Airbnb_NYC <- NYCHousing
NYCHousing
# A tibble: 48,895 × 16
id name host_id host_…¹ neigh…² neigh…³ latit…⁴ longi…⁵ room_…⁶ price
<dbl> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl>
1 2539 Clean & … 2787 John Brookl… Kensin… 40.6 -74.0 Privat… 149
2 2595 Skylit M… 2845 Jennif… Manhat… Midtown 40.8 -74.0 Entire… 225
3 3647 THE VILL… 4632 Elisab… Manhat… Harlem 40.8 -73.9 Privat… 150
4 3831 Cozy Ent… 4869 LisaRo… Brookl… Clinto… 40.7 -74.0 Entire… 89
5 5022 Entire A… 7192 Laura Manhat… East H… 40.8 -73.9 Entire… 80
6 5099 Large Co… 7322 Chris Manhat… Murray… 40.7 -74.0 Entire… 200
7 5121 BlissArt… 7356 Garon Brookl… Bedfor… 40.7 -74.0 Privat… 60
8 5178 Large Fu… 8967 Shunic… Manhat… Hell's… 40.8 -74.0 Privat… 79
9 5203 Cozy Cle… 7490 MaryEl… Manhat… Upper … 40.8 -74.0 Privat… 79
10 5238 Cute & C… 7549 Ben Manhat… Chinat… 40.7 -74.0 Entire… 150
# … with 48,885 more rows, 6 more variables: minimum_nights <dbl>,
# number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
# calculated_host_listings_count <dbl>, availability_365 <dbl>, and
# abbreviated variable names ¹host_name, ²neighbourhood_group,
# ³neighbourhood, ⁴latitude, ⁵longitude, ⁶room_type
Briefly describe the data
This dataset describes the listing activity of Airbnb homes in New York City's five boroughs in 2019. Each of the 48,895 rows is one listing, and the 16 columns describe the host (host_id, host_name), the location (borough in neighbourhood_group, neighbourhood, latitude, and longitude), the room type (entire home/apartment, private room, or shared room), the price, the minimum number of nights, the reviews (date of the most recent review, total number of reviews, and reviews per month), the number of listings the host has, and the number of days the listing is available in the year (availability_365). Because each row is already one observation and each column is already one variable, I won't need to reshape or change any variables.
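The claim that each row is one listing can be checked directly. The sketch below uses a hypothetical toy tibble whose values mirror the first four rows printed above (it is not the full file); the object names listings, one_row_per_listing, and listings_per_borough are illustrative only. Running the same two lines against Airbnb_NYC would confirm the unit of observation on the real data.

```r
library(tibble)
library(dplyr)

# Hypothetical toy subset; only the column names come from AB_NYC_2019.csv
listings <- tibble(
  id = c(2539, 2595, 3647, 3831),
  neighbourhood_group = c("Brooklyn", "Manhattan", "Manhattan", "Brooklyn"),
  room_type = c("Private room", "Entire home/apt", "Private room", "Entire home/apt"),
  price = c(149, 225, 150, 89)
)

# If each row is one listing, every id appears exactly once
one_row_per_listing <- n_distinct(listings$id) == nrow(listings)

# Listings per borough (neighbourhood_group holds the borough name)
listings_per_borough <- count(listings, neighbourhood_group)
```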
Tidy Data (as needed)
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
The data already appears tidy, so I don't expect to add or modify any variables. We may, however, need to investigate variables with outliers, most notably price, and remove extreme values. We will return to this later in the analysis.
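One common way to decide which prices count as outliers is Tukey's rule: flag values more than 1.5 times the interquartile range below the first quartile or above the third. This is a swapped-in technique for illustration; the analysis below uses fixed price cutoffs instead. The toy prices and the helper flag_price_outliers() are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical toy prices, including one extreme value
listings <- tibble(
  id = 1:8,
  price = c(60, 79, 80, 89, 149, 150, 200, 5000)
)

# Tukey's rule: flag prices beyond 1.5 * IQR outside the quartiles
flag_price_outliers <- function(df) {
  q1 <- quantile(df$price, 0.25, na.rm = TRUE, names = FALSE)
  q3 <- quantile(df$price, 0.75, na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q3 - q1)
  df %>% mutate(is_outlier = price < q1 - fence | price > q3 + fence)
}

flagged <- flag_price_outliers(listings)
```

With these toy values the upper fence is about 287, so only the 5000 listing is flagged; on the real data the fence would depend on the actual price distribution.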
Univariate Visualizations
ggplot(Airbnb_NYC, aes(neighbourhood_group)) +
  geom_bar(fill = "blue") +
  labs(title = "Number of Airbnb Units in each NYC Borough", x = "Borough", y = "Number of Units") +
  theme_bw()
ggplot(Airbnb_NYC, aes(x = room_type)) +
  geom_bar(fill = "yellow") +
  labs(title = "Number of Airbnb Units by Room Type", x = "Room Type", y = "Number of Units") +
  theme_bw()
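The bar charts show counts only visually. A small sketch of the same numbers as a table, with each borough's share of all listings added, is below; the toy data and the name borough_counts are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical toy listings
listings <- tibble(
  neighbourhood_group = c("Brooklyn", "Manhattan", "Manhattan", "Brooklyn", "Queens"),
  room_type = c("Private room", "Entire home/apt", "Private room", "Entire home/apt", "Shared room")
)

# The counts geom_bar() draws, plus each borough's share of listings
borough_counts <- listings %>%
  count(neighbourhood_group, sort = TRUE) %>%
  mutate(share = n / sum(n))
```

Swapping neighbourhood_group for room_type gives the table behind the second chart.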
Bivariate Visualization(s)
ggplot(Airbnb_NYC, aes(neighbourhood_group, fill = room_type)) +
  geom_bar(stat = "count") +
  labs(title = "Number of Airbnb Units in each NYC Borough by Room Type", x = "Borough", y = "Number of Units") +
  scale_fill_discrete(name = "Room Type") +
  theme_bw()
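Stacked counts make it hard to compare the room-type mix across boroughs of very different sizes. A possible variant computes the share of each room type within each borough and plots it with geom_bar(position = "fill"), which stretches every bar to a height of 1. The toy data and the names room_mix and room_mix_plot are hypothetical.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical toy listings
listings <- tibble(
  neighbourhood_group = c("Brooklyn", "Brooklyn", "Brooklyn", "Manhattan", "Manhattan"),
  room_type = c("Private room", "Private room", "Entire home/apt", "Entire home/apt", "Entire home/apt")
)

# Share of each room type within each borough (each borough sums to 1)
room_mix <- listings %>%
  count(neighbourhood_group, room_type) %>%
  group_by(neighbourhood_group) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# position = "fill" draws these shares instead of raw counts
room_mix_plot <- ggplot(listings, aes(neighbourhood_group, fill = room_type)) +
  geom_bar(position = "fill") +
  labs(x = "Borough", y = "Share of Units") +
  scale_fill_discrete(name = "Room Type") +
  theme_bw()
```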
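# keep listings priced above 0 and below 2500 before comparing boroughs
listingperprice_NYCHousing <- NYCHousing %>%
  filter(price > 0 & price < 2500)
listingperprice_NYCHousing %>%
  ggplot(aes(neighbourhood_group, price)) +
  geom_boxplot() +
  labs(title = "Price of Airbnb Units by NYC Borough", x = "Borough", y = "Price") +
  theme_bw()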
# overall price distribution before any outlier filtering
Airbnb_NYC %>%
  summarize("mean" = mean(price, na.rm = TRUE),
            "standard_deviation" = sd(price, na.rm = TRUE),
            "lowest" = min(price, na.rm = TRUE),
            "25th percentile" = quantile(price, probs = .25, na.rm = TRUE),
            "median" = median(price, na.rm = TRUE),
            "75th percentile" = quantile(price, probs = .75, na.rm = TRUE),
            "99th percentile" = quantile(price, probs = .99, na.rm = TRUE),
            "highest" = max(price, na.rm = TRUE))
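The summary above pools all boroughs. Adding group_by() before summarize() gives the same kind of statistics per borough, which is useful given how prices differ in the boxplot. The sketch uses hypothetical toy prices and the illustrative name price_by_borough; names = FALSE keeps quantile() from attaching names to the results.

```r
library(tibble)
library(dplyr)

# Hypothetical toy listings
listings <- tibble(
  neighbourhood_group = c("Brooklyn", "Brooklyn", "Brooklyn", "Manhattan", "Manhattan"),
  price = c(60, 89, 149, 150, 225)
)

# One row of price statistics per borough
price_by_borough <- listings %>%
  group_by(neighbourhood_group) %>%
  summarize(
    n = n(),
    mean = mean(price, na.rm = TRUE),
    median = median(price, na.rm = TRUE),
    p25 = quantile(price, probs = 0.25, na.rm = TRUE, names = FALSE),
    p75 = quantile(price, probs = 0.75, na.rm = TRUE, names = FALSE)
  )
```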
# average price of entire homes/apartments in each borough
Airbnb_NYC %>%
  filter(room_type == "Entire home/apt") %>%
  group_by(neighbourhood_group) %>%
  summarise(mean = mean(price, na.rm = TRUE)) %>%
  ggplot(aes(neighbourhood_group, mean)) +
  geom_col(fill = "purple") +
  labs(title = "Average Price of Entire Home/Apt Airbnb Units by NYC Borough", x = "Borough", y = "Average Price") +
  theme_bw()
# drop listings priced at 1000 or more; the plots below use this filtered copy
Airbnb_NYC_rm <- filter(Airbnb_NYC, price < 1000)
ggplot(Airbnb_NYC_rm, aes(x = price)) +
  geom_histogram(fill = "pink") +
  labs(title = "Price of Airbnb Units in NYC", x = "Price of Unit", y = "Number") +
  theme_bw()
Airbnb_NYC_rm %>%
  filter(room_type == "Entire home/apt") %>%
  ggplot(aes(x = price)) +
  geom_histogram(fill = "green") +
  labs(title = "Price of Entire Home/Apt Airbnb Units in NYC", x = "Price of Unit", y = "Count") +
  theme_bw()
Airbnb_NYC_rm %>%
  filter(neighbourhood_group == "Brooklyn") %>%
  ggplot(aes(x = price, y = number_of_reviews, color = room_type)) +
  geom_point() +
  labs(title = "Price of Airbnb Units in Brooklyn",
       x = "Price of Unit", y = "Number of Reviews") +
  scale_colour_discrete(name = "Room Type") +
  theme_bw()
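The scatter plot suggests looking at how price relates to the number of reviews within each room type. One way to put a number on that is a Spearman (rank-based) correlation per room type, which is less sensitive than Pearson correlation to the long tail of expensive listings. This is an illustrative addition, not part of the original analysis; the toy values and the name review_price_cor are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical toy Brooklyn listings; the values are invented for illustration
brooklyn <- tibble(
  room_type = c(rep("Private room", 3), rep("Entire home/apt", 3)),
  price = c(50, 100, 150, 80, 120, 200),
  number_of_reviews = c(30, 20, 10, 5, 15, 25)
)

# Spearman correlation between price and reviews within each room type
review_price_cor <- brooklyn %>%
  group_by(room_type) %>%
  summarize(spearman = cor(price, number_of_reviews, method = "spearman"))
```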