Challenge 7 - Airbnb Listings

Visualizing Multiple Dimensions
Author: Megan Galarneau
Published: April 17, 2023

library(tidyverse)
library(ggplot2)
library(dplyr)
library(lubridate)
library(patchwork)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. Read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
  2. Tidy data & mutate variables as needed (including sanity checks)
  3. Recreate at least two graphs from previous exercises, but introduce at least one additional dimension that you omitted before, using ggplot functionality (color, shape, line, facet, etc.). The goal is not to create unneeded chart ink (Tufte), but to concisely capture variation in additional dimensions that were collapsed in your earlier two- or three-dimensional graphs.
  4. If you haven’t tried in previous weeks, work this week to make your graphs “publication” ready with titles, captions, pretty axis labels, and other viewer-friendly features

Read in data

#read in the data set, raw
library(readr)
raw_airbnb <- read_csv("_data/AB_NYC_2019.csv")
raw_airbnb

Briefly describe the data

I analyzed this data set in Challenge 5. It describes 48,895 Airbnb property listings across the five NYC boroughs for 2019. Each listing includes the host (ID and name), geographical location (borough, neighborhood, latitude/longitude), room type (entire home/apt, private room, or shared room), price, minimum nights, reviews (date of last review, total number, and reviews per month), and the number of days available out of 365. After the summary below, I aggregate the data so I can graph mean price by borough and room type later on.

#summary of data set statistics
print(summarytools::dfSummary(raw_airbnb,
                        varnumbers = FALSE,
                        plain.ascii  = FALSE, 
                        style        = "grid", 
                        graph.magnif = 0.70, 
                        valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

raw_airbnb

Dimensions: 48895 x 16
Duplicates: 0
Columns: Variable, Stats / Values, Freqs (% of Valid), Missing

id [numeric]
  Mean (sd): 19017143 (10983108)
  min ≤ med ≤ max: 2539 ≤ 19677284 ≤ 36487245
  IQR (CV): 19680234 (0.6)
  Distinct values: 48895
  Missing: 0 (0.0%)

name [character]
  1. Hillside Hotel: 18 (0.0%)
  2. Home away from home: 17 (0.0%)
  3. New york Multi-unit build: 16 (0.0%)
  4. Brooklyn Apartment: 12 (0.0%)
  5. Loft Suite @ The Box Hous: 11 (0.0%)
  6. Private Room: 11 (0.0%)
  7. Artsy Private BR in Fort: 10 (0.0%)
  8. Private room: 10 (0.0%)
  9. Beautiful Brooklyn Browns: 8 (0.0%)
  10. Cozy Brooklyn Apartment: 8 (0.0%)
  [ 47884 others ]: 48758 (99.8%)
  Missing: 16 (0.0%)

host_id [numeric]
  Mean (sd): 67620011 (78610967)
  min ≤ med ≤ max: 2438 ≤ 30793816 ≤ 274321313
  IQR (CV): 99612390 (1.2)
  Distinct values: 37457
  Missing: 0 (0.0%)

host_name [character]
  1. Michael: 417 (0.9%)
  2. David: 403 (0.8%)
  3. Sonder (NYC): 327 (0.7%)
  4. John: 294 (0.6%)
  5. Alex: 279 (0.6%)
  6. Blueground: 232 (0.5%)
  7. Sarah: 227 (0.5%)
  8. Daniel: 226 (0.5%)
  9. Jessica: 205 (0.4%)
  10. Maria: 204 (0.4%)
  [ 11442 others ]: 46060 (94.2%)
  Missing: 21 (0.0%)

neighbourhood_group [character]
  1. Bronx: 1091 (2.2%)
  2. Brooklyn: 20104 (41.1%)
  3. Manhattan: 21661 (44.3%)
  4. Queens: 5666 (11.6%)
  5. Staten Island: 373 (0.8%)
  Missing: 0 (0.0%)

neighbourhood [character]
  1. Williamsburg: 3920 (8.0%)
  2. Bedford-Stuyvesant: 3714 (7.6%)
  3. Harlem: 2658 (5.4%)
  4. Bushwick: 2465 (5.0%)
  5. Upper West Side: 1971 (4.0%)
  6. Hell's Kitchen: 1958 (4.0%)
  7. East Village: 1853 (3.8%)
  8. Upper East Side: 1798 (3.7%)
  9. Crown Heights: 1564 (3.2%)
  10. Midtown: 1545 (3.2%)
  [ 211 others ]: 25449 (52.0%)
  Missing: 0 (0.0%)

latitude [numeric]
  Mean (sd): 40.7 (0.1)
  min ≤ med ≤ max: 40.5 ≤ 40.7 ≤ 40.9
  IQR (CV): 0.1 (0)
  Distinct values: 19048
  Missing: 0 (0.0%)

longitude [numeric]
  Mean (sd): -74 (0)
  min ≤ med ≤ max: -74.2 ≤ -74 ≤ -73.7
  IQR (CV): 0 (0)
  Distinct values: 14718
  Missing: 0 (0.0%)

room_type [character]
  1. Entire home/apt: 25409 (52.0%)
  2. Private room: 22326 (45.7%)
  3. Shared room: 1160 (2.4%)
  Missing: 0 (0.0%)

price [numeric]
  Mean (sd): 152.7 (240.2)
  min ≤ med ≤ max: 0 ≤ 106 ≤ 10000
  IQR (CV): 106 (1.6)
  Distinct values: 674
  Missing: 0 (0.0%)

minimum_nights [numeric]
  Mean (sd): 7 (20.5)
  min ≤ med ≤ max: 1 ≤ 3 ≤ 1250
  IQR (CV): 4 (2.9)
  Distinct values: 109
  Missing: 0 (0.0%)

number_of_reviews [numeric]
  Mean (sd): 23.3 (44.6)
  min ≤ med ≤ max: 0 ≤ 5 ≤ 629
  IQR (CV): 23 (1.9)
  Distinct values: 394
  Missing: 0 (0.0%)

last_review [Date]
  min: 2011-03-28
  med: 2019-05-19
  max: 2019-07-08
  range: 8y 3m 10d
  Distinct values: 1764
  Missing: 10052 (20.6%)

reviews_per_month [numeric]
  Mean (sd): 1.4 (1.7)
  min ≤ med ≤ max: 0 ≤ 0.7 ≤ 58.5
  IQR (CV): 1.8 (1.2)
  Distinct values: 937
  Missing: 10052 (20.6%)

calculated_host_listings_count [numeric]
  Mean (sd): 7.1 (33)
  min ≤ med ≤ max: 1 ≤ 1 ≤ 327
  IQR (CV): 1 (4.6)
  Distinct values: 47
  Missing: 0 (0.0%)

availability_365 [numeric]
  Mean (sd): 112.8 (131.6)
  min ≤ med ≤ max: 0 ≤ 45 ≤ 365
  IQR (CV): 227 (1.2)
  Distinct values: 366
  Missing: 0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.2.2)
2023-04-17
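
The summary reports a minimum price of 0 and a maximum minimum_nights of 1250, both of which are worth a quick sanity check before aggregating. The following is a minimal sketch only: `toy_airbnb` is a made-up data frame (hypothetical values, not the real listings) that borrows the column names of `raw_airbnb`, and the 365-night threshold is an arbitrary choice for illustration.

```r
library(dplyr)

# Hypothetical stand-in for raw_airbnb; all values are invented for illustration
toy_airbnb <- data.frame(
  id             = 1:5,
  price          = c(0, 150, 89, 10000, 60),
  minimum_nights = c(1, 3, 1250, 2, 30)
)

# Count listings with a zero price or a minimum stay longer than one year
sanity_counts <- toy_airbnb %>%
  summarise(
    n_zero_price   = sum(price == 0),
    n_long_minimum = sum(minimum_nights > 365)
  )
sanity_counts
```

Swapping `toy_airbnb` for `raw_airbnb` would show how many real listings fall into each category, which helps decide whether to keep or filter them.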

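#created new data table of mean price per borough and room type, used for the price graph
tidy_Airbnb <- raw_airbnb %>%
  #keep only the three known room types (every row in this data set qualifies)
  filter(room_type %in% c("Shared room", "Entire home/apt", "Private room")) %>% 
  group_by(neighbourhood_group, room_type) %>%
  #.groups = "drop" returns an ungrouped table and avoids the regrouping message
  summarise(mean_price = mean(price), .groups = "drop")
tidy_Airbnb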
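
The summary shows a mean price (152.7) well above the median (106) and a maximum of 10000, so a group mean can be pulled upward by a few very expensive listings. As a hedged sketch, the same grouping can report the median and the group size alongside the mean. The data frame `toy_airbnb` and its values are hypothetical and only mimic the column names of `raw_airbnb`.

```r
library(dplyr)

# Hypothetical listings; prices are invented to show the effect of one outlier
toy_airbnb <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"),
  room_type           = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                          "Private room", "Private room"),
  price               = c(100, 200, 3000, 50, 70)
)

# Mean, median, and number of listings per borough and room type
toy_summary <- toy_airbnb %>%
  group_by(neighbourhood_group, room_type) %>%
  summarise(
    mean_price   = mean(price),
    median_price = median(price),
    n_listings   = n(),
    .groups      = "drop"
  )
toy_summary
```

In the toy Manhattan group the single 3000 listing lifts the mean far above the median, which is the pattern to watch for in the real data.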

Visualization with Multiple Dimensions

In my previous challenge, I created univariate and bivariate visualizations of this data that analyzed price, borough, and room type. Today, I will revisit these graphs and create new visualizations that introduce at least one additional dimension.

The first graph below shows the number of property listings by borough, segmented by room type. Key takeaways:

  • Manhattan and Brooklyn have the most listings, while Staten Island and the Bronx have the fewest

  • Entire home/apartment is the most common room type overall, and shared room is the least common

  • Overall, Manhattan has the most listings and the highest ratio of entire home/apartment to other room types
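
The room-type ratio in the last takeaway is hard to read off a stacked count chart, so it can help to compute the share of each room type within a borough directly. This is a sketch under stated assumptions: `toy_airbnb` is a hypothetical data frame with invented rows, and `n_listings` and `share` are illustrative column names.

```r
library(dplyr)

# Hypothetical listings; one row per listing, as in raw_airbnb
toy_airbnb <- data.frame(
  neighbourhood_group = c(rep("Manhattan", 4), rep("Queens", 4)),
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room",
                "Entire home/apt", "Private room", "Private room", "Shared room")
)

# Count listings per borough and room type, then convert to within-borough shares
room_share <- toy_airbnb %>%
  count(neighbourhood_group, room_type, name = "n_listings") %>%
  group_by(neighbourhood_group) %>%
  mutate(share = n_listings / sum(n_listings)) %>%
  ungroup()
room_share
```

Running the same pipeline on `raw_airbnb` would give the real shares to compare across boroughs.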

#bar graph to visualize number of listings by borough and room type
#three-color palette, one color per room type
cbbPalette <- c("#B74F6F", "#ADBDFF", "#3185FC")
bar_Airbnb <- ggplot(raw_airbnb, aes(neighbourhood_group, fill = room_type)) + 
  #geom_bar counts rows per group by default
  geom_bar(colour = "black") + 
  labs(title = "Airbnb Property Listings by NYC Borough & Room Type", x = "NYC Borough", y = "Number of Listings", subtitle = "Data time frame: 2019") +
  theme_bw() +
  theme(legend.position = "left") +
  #a single fill scale sets both the legend title and the palette
  scale_fill_manual(name = "Room Type", values = cbbPalette)
bar_Airbnb

At a high level, it is clear that Manhattan has the most listings, but is it the most expensive? The graph below answers that question: yes! Not only are its listings the most expensive on average, but its entire home/apartment listings also have the highest mean price of all the NYC boroughs. This result makes sense, since Manhattan is a highly sought-after borough.

#bar graph to visualize mean price of listings by borough and room type
price_Airbnb <- tidy_Airbnb %>%
  #the pipe supplies the data, so ggplot() only needs the aesthetic mapping
  ggplot(aes(neighbourhood_group, mean_price, fill = room_type)) +
  #geom_col plots the precomputed means; dodge places room types side by side
  geom_col(position = "dodge", colour = "black") + 
  labs(title = "Mean Airbnb Listing Price by NYC Borough & Room Type", x = "NYC Borough", y = "Mean Price in U.S. Dollars ($)", subtitle = "Data time frame: 2019") +
  theme_bw() +
  theme(legend.position = "left") +
  #a single fill scale sets both the legend title and the palette
  scale_fill_manual(name = "Room Type", values = cbbPalette)
price_Airbnb
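
Faceting is another way to show the borough dimension named in the challenge, with one panel per borough instead of dodged bars. The following is only a sketch: `toy_prices` is a hypothetical table with invented mean prices shaped like `tidy_Airbnb`, and `facet_Airbnb` is an illustrative name.

```r
library(ggplot2)

# Hypothetical mean prices; values are invented and do not come from the data set
toy_prices <- data.frame(
  neighbourhood_group = rep(c("Brooklyn", "Manhattan"), each = 3),
  room_type           = rep(c("Entire home/apt", "Private room", "Shared room"), times = 2),
  mean_price          = c(180, 75, 50, 250, 115, 90)
)

# One panel per borough; room type sits on the x-axis, so the legend is dropped
facet_Airbnb <- ggplot(toy_prices, aes(room_type, mean_price, fill = room_type)) +
  geom_col(colour = "black", show.legend = FALSE) +
  facet_wrap(~ neighbourhood_group) +
  labs(title = "Mean Price by Room Type, Faceted by Borough (toy data)",
       x = "Room Type", y = "Mean Price in U.S. Dollars ($)") +
  theme_bw() +
  scale_fill_manual(values = c("#B74F6F", "#ADBDFF", "#3185FC"))
facet_Airbnb
```

Facets share a common y-axis by default, so bar heights stay comparable across panels. Dodged bars make within-borough comparisons compact, while facets scale better when there are many categories.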