DACSS 601 Data Science Fundamentals - Homework 2
#For this assignment, I’m exploring the New York City Airbnb dataset (AB_NYC_2019.csv), specifically looking at Airbnb rates in Manhattan and Brooklyn.
##1) Read in a dataset & view it.
bookings<-read.csv2(file = "AB_NYC_2019.csv", sep = ",")
dim(bookings)
[1] 48895 16
head(bookings)
id name host_id
1 2539 Clean & quiet apt home by the park 2787
2 2595 Skylit Midtown Castle 2845
3 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632
4 3831 Cozy Entire Floor of Brownstone 4869
5 5022 Entire Apt: Spacious Studio/Loft by central park 7192
6 5099 Large Cozy 1 BR Apartment In Midtown East 7322
host_name neighbourhood_group neighbourhood latitude longitude
1 John Brooklyn Kensington 40.64749 -73.97237
2 Jennifer Manhattan Midtown 40.75362 -73.98377
3 Elisabeth Manhattan Harlem 40.80902 -73.9419
4 LisaRoxanne Brooklyn Clinton Hill 40.68514 -73.95976
5 Laura Manhattan East Harlem 40.79851 -73.94399
6 Chris Manhattan Murray Hill 40.74767 -73.975
        room_type price minimum_nights number_of_reviews last_review
1    Private room   149              1                 9  2018-10-19
2 Entire home/apt   225              1                45  2019-05-21
3    Private room   150              3                 0
4 Entire home/apt    89              1               270  2019-07-05
5 Entire home/apt    80             10                 9  2018-11-19
6 Entire home/apt   200              3                74  2019-06-22
  reviews_per_month calculated_host_listings_count availability_365
1              0.21                              6              365
2              0.38                              2              355
3                                                1              365
4              4.64                              1              194
5              0.10                              1                0
6              0.59                              1              129
##2) Explain variables in dataset.
lapply(bookings,class)
$id
[1] "integer"
$name
[1] "character"
$host_id
[1] "integer"
$host_name
[1] "character"
$neighbourhood_group
[1] "character"
$neighbourhood
[1] "character"
$latitude
[1] "character"
$longitude
[1] "character"
$room_type
[1] "character"
$price
[1] "integer"
$minimum_nights
[1] "integer"
$number_of_reviews
[1] "integer"
$last_review
[1] "character"
$reviews_per_month
[1] "character"
$calculated_host_listings_count
[1] "integer"
$availability_365
[1] "integer"
summary(bookings)
id name host_id
Min. : 2539 Length:48895 Min. : 2438
1st Qu.: 9471945 Class :character 1st Qu.: 7822033
Median :19677284 Mode :character Median : 30793816
Mean :19017143 Mean : 67620011
3rd Qu.:29152178 3rd Qu.:107434423
Max. :36487245 Max. :274321313
host_name neighbourhood_group neighbourhood
Length:48895 Length:48895 Length:48895
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
latitude longitude room_type
Length:48895 Length:48895 Length:48895
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
price minimum_nights number_of_reviews
Min. : 0.0 Min. : 1.00 Min. : 0.00
1st Qu.: 69.0 1st Qu.: 1.00 1st Qu.: 1.00
Median : 106.0 Median : 3.00 Median : 5.00
Mean : 152.7 Mean : 7.03 Mean : 23.27
3rd Qu.: 175.0 3rd Qu.: 5.00 3rd Qu.: 24.00
Max. :10000.0 Max. :1250.00 Max. :629.00
last_review reviews_per_month calculated_host_listings_count
Length:48895 Length:48895 Min. : 1.000
Class :character Class :character 1st Qu.: 1.000
Mode :character Mode :character Median : 1.000
Mean : 7.144
3rd Qu.: 2.000
Max. :327.000
availability_365
Min. : 0.0
1st Qu.: 0.0
Median : 45.0
Mean :112.8
3rd Qu.:227.0
Max. :365.0
This dataset describes Airbnb listings and their prices in the different boroughs of NYC. Along with
descriptive variables such as name, host_id, host_name, neighbourhood_group, neighbourhood, latitude,
longitude, room_type, minimum_nights, and the review-related variables,
there are also variables that may affect the rating of the listings.
As read in here, variables are either character variables, such as name, host_name, and neighbourhood, or
integer variables, such as price and number_of_reviews. There are 48895 entries and 16 columns.
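The class listing above reports latitude, longitude, and reviews_per_month as character, although they hold decimal numbers. A likely cause is the import call: read.csv2() assumes a decimal comma (dec = ","), and overriding only sep leaves that default in place, so a value such as 40.64749 is not recognised as a number. The sketch below illustrates this on a three-row inline sample (values copied from the head() output; the object names csv_text, with_csv2, and with_csv are made up for this illustration) and compares it with read.csv(), which expects a decimal point.

```r
# Three-row inline sample; values copied from the head() output above.
# csv_text, with_csv2 and with_csv are illustrative names, not objects from the post.
csv_text <- "id,latitude,reviews_per_month
2539,40.64749,0.21
2595,40.75362,0.38
3647,40.80902,"

# Same style of call as in the post: comma separator, but decimal comma still assumed.
with_csv2 <- read.csv2(text = csv_text, sep = ",")

# read.csv() expects a comma separator and a decimal point.
with_csv <- read.csv(text = csv_text)

# Compare the column classes produced by the two readers.
sapply(with_csv2, class)
sapply(with_csv, class)

# With read.csv(), the blank reviews_per_month cell becomes NA instead of "".
with_csv$reviews_per_month
```

If the full file behaves like this sample, reading it with read.csv() would presumably give numeric latitude, longitude, and reviews_per_month columns. The outputs shown in this post were produced with read.csv2().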
id name
0 0
host_id host_name
0 0
neighbourhood_group neighbourhood
0 0
latitude longitude
0 0
room_type price
0 0
minimum_nights number_of_reviews
0 0
last_review reviews_per_month
0 0
calculated_host_listings_count availability_365
0 0
##From this, we can see that there are no NAs.
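The call that produced the per-column counts is not shown in the post; colSums(is.na(bookings)) is one common way to get output of this shape and is assumed here. A zero NA count does not mean every cell is filled: listing 3 in the head() output has a blank last_review and reviews_per_month, and blanks in character columns are read in as empty strings rather than NA. A minimal sketch on a small hand-built data frame (values from the first three rows shown earlier; sample_rows is a hypothetical name):

```r
# Hand-built sample; values copied from the first three rows of head(bookings).
# sample_rows is an illustrative name, not an object from the post.
sample_rows <- data.frame(
  id = c(2539L, 2595L, 3647L),
  number_of_reviews = c(9L, 45L, 0L),
  last_review = c("2018-10-19", "2019-05-21", ""),
  reviews_per_month = c("0.21", "0.38", ""),
  stringsAsFactors = FALSE
)

# Assumed form of the NA check: count NA values per column.
na_counts <- colSums(is.na(sample_rows))

# Count empty strings per column; these are missing values that is.na() does not see.
blank_counts <- sapply(sample_rows, function(col) sum(col == ""))

na_counts
blank_counts
```

Run on the full data, the same blank count would show how many listings have no review date or review rate recorded.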
##3)
##Select columns
data_bookings <- dplyr::select(bookings, name, neighbourhood_group, neighbourhood, room_type, price, number_of_reviews)
head(data_bookings)
name
1 Clean & quiet apt home by the park
2 Skylit Midtown Castle
3 THE VILLAGE OF HARLEM....NEW YORK !
4 Cozy Entire Floor of Brownstone
5 Entire Apt: Spacious Studio/Loft by central park
6 Large Cozy 1 BR Apartment In Midtown East
neighbourhood_group neighbourhood room_type price
1 Brooklyn Kensington Private room 149
2 Manhattan Midtown Entire home/apt 225
3 Manhattan Harlem Private room 150
4 Brooklyn Clinton Hill Entire home/apt 89
5 Manhattan East Harlem Entire home/apt 80
6 Manhattan Murray Hill Entire home/apt 200
number_of_reviews
1 9
2 45
3 0
4 270
5 9
6 74
##Filter data based on Manhattan and Brooklyn & arrange by highest price
library(dplyr) # provides %>%, arrange() and desc() used below
# Brooklyn listings, most expensive first
bookings_brooklyn <- data_bookings %>%
  dplyr::filter(neighbourhood_group == "Brooklyn") %>%
  arrange(desc(price))
rmarkdown::paged_table(head(bookings_brooklyn))
# Manhattan listings, most expensive first
bookings_manhattan <- data_bookings %>%
  dplyr::filter(neighbourhood_group == "Manhattan") %>%
  arrange(desc(price))
rmarkdown::paged_table(head(bookings_manhattan))
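The two pipelines above can also be written as a single one that keeps both boroughs and sorts within each. The sketch below does this on the six rows shown in head(data_bookings), so it runs without the CSV file; sample_bookings, both_boroughs, and price_summary are hypothetical names. The grouped summary at the end is not part of the original analysis, only one possible way to put the two boroughs side by side.

```r
library(dplyr)

# Six rows copied from the head(data_bookings) output above.
sample_bookings <- data.frame(
  neighbourhood_group = c("Brooklyn", "Manhattan", "Manhattan",
                          "Brooklyn", "Manhattan", "Manhattan"),
  neighbourhood = c("Kensington", "Midtown", "Harlem",
                    "Clinton Hill", "East Harlem", "Murray Hill"),
  room_type = c("Private room", "Entire home/apt", "Private room",
                "Entire home/apt", "Entire home/apt", "Entire home/apt"),
  price = c(149L, 225L, 150L, 89L, 80L, 200L),
  stringsAsFactors = FALSE
)

# Keep both boroughs, then sort by borough and by descending price within each.
both_boroughs <- sample_bookings %>%
  filter(neighbourhood_group %in% c("Brooklyn", "Manhattan")) %>%
  arrange(neighbourhood_group, desc(price))

# One row per borough: number of listings, median price and highest price.
price_summary <- both_boroughs %>%
  group_by(neighbourhood_group) %>%
  summarise(
    listings = n(),
    median_price = median(price),
    max_price = max(price)
  )

both_boroughs
price_summary
```

With only six rows the numbers say nothing about the boroughs themselves; replacing sample_bookings with data_bookings would apply the same steps to the full selection.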
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hungund (2022, Feb. 13). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw2/
BibTeX citation
@misc{hungund2022hw2,
  author = {Hungund, Apoorva},
  title = {Data Analytics and Computational Social Science: HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomahungundaphhw2/},
  year = {2022}
}