In this dataset there are 32 variables(columns) and 119,390 observations(rows). By looking at the variable names, we can say that the it consists of reservation data of some hotels. 14 of the variables compose of character values, 18 of the variables store numerical values. That being said, some of the character variables and some of the numeric variables are actually indicate dates.
All of the variable names are descriptive enough except adr. “adr” should mean “average daily rate”. Therefore, I changed original column name adr with average_daily_rate.
There are two types of hotels in the dataset: Resort Hotel and City Hotel. Reservations are made in 2015, 2016 and 2017. The data covers reservations from 178 countries. So the data should belong to a big worldwide hotels chain. Both canceled and completed reservations are stored in the dataset as well as no-shows. Thus, each observation includes information about a reservation such as what type of hotel and in which country the reservation is for, number of visitors, dates, daily rates, stay durations and some categorical info about customer and the channel of reservation.
In the dataset, each column repsresent a variable and each row is an observation. Thus, no pivoting is needed.
Code
hotel.bookings <-mutate( hotel.bookings, number_of_guests = adults + children + babies,total_stay = stays_in_weekend_nights + stays_in_week_nights) # adding two new variablesprint(dfSummary(hotel.bookings, varnumbers=FALSE, plain.ascii=FALSE, style="grid", graph.magnif=0.80, valid.col=TRUE),method='render', table.classes='table-condensed')
Data Frame Summary
hotel.bookings
Dimensions: 119390 x 34
Duplicates: 31994
Variable
Stats / Values
Freqs (% of Valid)
Graph
Valid
Missing
hotel
[character]
1. City Hotel
2. Resort Hotel
79330
(
66.4%
)
40060
(
33.6%
)
119390
(100.0%)
0
(0.0%)
is_canceled
[integer]
Min : 0
Mean : 0.4
Max : 1
0
:
75166
(
63.0%
)
1
:
44224
(
37.0%
)
119390
(100.0%)
0
(0.0%)
lead_time
[integer]
Mean (sd) : 104 (106.9)
min ≤ med ≤ max:
0 ≤ 69 ≤ 737
IQR (CV) : 142 (1)
479 distinct values
119390
(100.0%)
0
(0.0%)
arrival_date_year
[integer]
Mean (sd) : 2016.2 (0.7)
min ≤ med ≤ max:
2015 ≤ 2016 ≤ 2017
IQR (CV) : 1 (0)
2015
:
21996
(
18.4%
)
2016
:
56707
(
47.5%
)
2017
:
40687
(
34.1%
)
119390
(100.0%)
0
(0.0%)
arrival_date_month
[character]
1. August
2. July
3. May
4. October
5. April
6. June
7. September
8. March
9. February
10. November
[ 2 others ]
13877
(
11.6%
)
12661
(
10.6%
)
11791
(
9.9%
)
11160
(
9.3%
)
11089
(
9.3%
)
10939
(
9.2%
)
10508
(
8.8%
)
9794
(
8.2%
)
8068
(
6.8%
)
6794
(
5.7%
)
12709
(
10.6%
)
119390
(100.0%)
0
(0.0%)
arrival_date_week_number
[integer]
Mean (sd) : 27.2 (13.6)
min ≤ med ≤ max:
1 ≤ 28 ≤ 53
IQR (CV) : 22 (0.5)
53 distinct values
119390
(100.0%)
0
(0.0%)
arrival_date_day_of_month
[integer]
Mean (sd) : 15.8 (8.8)
min ≤ med ≤ max:
1 ≤ 16 ≤ 31
IQR (CV) : 15 (0.6)
31 distinct values
119390
(100.0%)
0
(0.0%)
stays_in_weekend_nights
[integer]
Mean (sd) : 0.9 (1)
min ≤ med ≤ max:
0 ≤ 1 ≤ 19
IQR (CV) : 2 (1.1)
17 distinct values
119390
(100.0%)
0
(0.0%)
stays_in_week_nights
[integer]
Mean (sd) : 2.5 (1.9)
min ≤ med ≤ max:
0 ≤ 2 ≤ 50
IQR (CV) : 2 (0.8)
35 distinct values
119390
(100.0%)
0
(0.0%)
adults
[integer]
Mean (sd) : 1.9 (0.6)
min ≤ med ≤ max:
0 ≤ 2 ≤ 55
IQR (CV) : 0 (0.3)
14 distinct values
119390
(100.0%)
0
(0.0%)
children
[integer]
Mean (sd) : 0.1 (0.4)
min ≤ med ≤ max:
0 ≤ 0 ≤ 10
IQR (CV) : 0 (3.8)
0
:
110796
(
92.8%
)
1
:
4861
(
4.1%
)
2
:
3652
(
3.1%
)
3
:
76
(
0.1%
)
10
:
1
(
0.0%
)
119386
(100.0%)
4
(0.0%)
babies
[integer]
Mean (sd) : 0 (0.1)
min ≤ med ≤ max:
0 ≤ 0 ≤ 10
IQR (CV) : 0 (12.3)
0
:
118473
(
99.2%
)
1
:
900
(
0.8%
)
2
:
15
(
0.0%
)
9
:
1
(
0.0%
)
10
:
1
(
0.0%
)
119390
(100.0%)
0
(0.0%)
meal
[character]
1. BB
2. FB
3. HB
4. SC
5. Undefined
92310
(
77.3%
)
798
(
0.7%
)
14463
(
12.1%
)
10650
(
8.9%
)
1169
(
1.0%
)
119390
(100.0%)
0
(0.0%)
country
[character]
1. PRT
2. GBR
3. FRA
4. ESP
5. DEU
6. ITA
7. IRL
8. BEL
9. BRA
10. NLD
[ 168 others ]
48590
(
40.7%
)
12129
(
10.2%
)
10415
(
8.7%
)
8568
(
7.2%
)
7287
(
6.1%
)
3766
(
3.2%
)
3375
(
2.8%
)
2342
(
2.0%
)
2224
(
1.9%
)
2104
(
1.8%
)
18590
(
15.6%
)
119390
(100.0%)
0
(0.0%)
market_segment
[character]
1. Aviation
2. Complementary
3. Corporate
4. Direct
5. Groups
6. Offline TA/TO
7. Online TA
8. Undefined
237
(
0.2%
)
743
(
0.6%
)
5295
(
4.4%
)
12606
(
10.6%
)
19811
(
16.6%
)
24219
(
20.3%
)
56477
(
47.3%
)
2
(
0.0%
)
119390
(100.0%)
0
(0.0%)
distribution_channel
[character]
1. Corporate
2. Direct
3. GDS
4. TA/TO
5. Undefined
6677
(
5.6%
)
14645
(
12.3%
)
193
(
0.2%
)
97870
(
82.0%
)
5
(
0.0%
)
119390
(100.0%)
0
(0.0%)
is_repeated_guest
[integer]
Min : 0
Mean : 0
Max : 1
0
:
115580
(
96.8%
)
1
:
3810
(
3.2%
)
119390
(100.0%)
0
(0.0%)
previous_cancellations
[integer]
Mean (sd) : 0.1 (0.8)
min ≤ med ≤ max:
0 ≤ 0 ≤ 26
IQR (CV) : 0 (9.7)
15 distinct values
119390
(100.0%)
0
(0.0%)
previous_bookings_not_canceled
[integer]
Mean (sd) : 0.1 (1.5)
min ≤ med ≤ max:
0 ≤ 0 ≤ 72
IQR (CV) : 0 (10.9)
73 distinct values
119390
(100.0%)
0
(0.0%)
reserved_room_type
[character]
1. A
2. B
3. C
4. D
5. E
6. F
7. G
8. H
9. L
10. P
85994
(
72.0%
)
1118
(
0.9%
)
932
(
0.8%
)
19201
(
16.1%
)
6535
(
5.5%
)
2897
(
2.4%
)
2094
(
1.8%
)
601
(
0.5%
)
6
(
0.0%
)
12
(
0.0%
)
119390
(100.0%)
0
(0.0%)
assigned_room_type
[character]
1. A
2. D
3. E
4. F
5. G
6. C
7. B
8. H
9. I
10. K
[ 2 others ]
74053
(
62.0%
)
25322
(
21.2%
)
7806
(
6.5%
)
3751
(
3.1%
)
2553
(
2.1%
)
2375
(
2.0%
)
2163
(
1.8%
)
712
(
0.6%
)
363
(
0.3%
)
279
(
0.2%
)
13
(
0.0%
)
119390
(100.0%)
0
(0.0%)
booking_changes
[integer]
Mean (sd) : 0.2 (0.7)
min ≤ med ≤ max:
0 ≤ 0 ≤ 21
IQR (CV) : 0 (2.9)
21 distinct values
119390
(100.0%)
0
(0.0%)
deposit_type
[character]
1. No Deposit
2. Non Refund
3. Refundable
104641
(
87.6%
)
14587
(
12.2%
)
162
(
0.1%
)
119390
(100.0%)
0
(0.0%)
agent
[character]
1. 9
2. NULL
3. 240
4. 1
5. 14
6. 7
7. 6
8. 250
9. 241
10. 28
[ 324 others ]
31961
(
26.8%
)
16340
(
13.7%
)
13922
(
11.7%
)
7191
(
6.0%
)
3640
(
3.0%
)
3539
(
3.0%
)
3290
(
2.8%
)
2870
(
2.4%
)
1721
(
1.4%
)
1666
(
1.4%
)
33250
(
27.8%
)
119390
(100.0%)
0
(0.0%)
company
[character]
1. NULL
2. 40
3. 223
4. 67
5. 45
6. 153
7. 174
8. 219
9. 281
10. 154
[ 343 others ]
112593
(
94.3%
)
927
(
0.8%
)
784
(
0.7%
)
267
(
0.2%
)
250
(
0.2%
)
215
(
0.2%
)
149
(
0.1%
)
141
(
0.1%
)
138
(
0.1%
)
133
(
0.1%
)
3793
(
3.2%
)
119390
(100.0%)
0
(0.0%)
days_in_waiting_list
[integer]
Mean (sd) : 2.3 (17.6)
min ≤ med ≤ max:
0 ≤ 0 ≤ 391
IQR (CV) : 0 (7.6)
128 distinct values
119390
(100.0%)
0
(0.0%)
customer_type
[character]
1. Contract
2. Group
3. Transient
4. Transient-Party
4076
(
3.4%
)
577
(
0.5%
)
89613
(
75.1%
)
25124
(
21.0%
)
119390
(100.0%)
0
(0.0%)
average_daily_rate
[numeric]
Mean (sd) : 101.8 (50.5)
min ≤ med ≤ max:
-6.4 ≤ 94.6 ≤ 5400
IQR (CV) : 56.7 (0.5)
8879 distinct values
119390
(100.0%)
0
(0.0%)
required_car_parking_spaces
[integer]
Mean (sd) : 0.1 (0.2)
min ≤ med ≤ max:
0 ≤ 0 ≤ 8
IQR (CV) : 0 (3.9)
0
:
111974
(
93.8%
)
1
:
7383
(
6.2%
)
2
:
28
(
0.0%
)
3
:
3
(
0.0%
)
8
:
2
(
0.0%
)
119390
(100.0%)
0
(0.0%)
total_of_special_requests
[integer]
Mean (sd) : 0.6 (0.8)
min ≤ med ≤ max:
0 ≤ 0 ≤ 5
IQR (CV) : 1 (1.4)
0
:
70318
(
58.9%
)
1
:
33226
(
27.8%
)
2
:
12969
(
10.9%
)
3
:
2497
(
2.1%
)
4
:
340
(
0.3%
)
5
:
40
(
0.0%
)
119390
(100.0%)
0
(0.0%)
reservation_status
[character]
1. Canceled
2. Check-Out
3. No-Show
43017
(
36.0%
)
75166
(
63.0%
)
1207
(
1.0%
)
119390
(100.0%)
0
(0.0%)
reservation_status_date
[character]
1. 2015-10-21
2. 2015-07-06
3. 2016-11-25
4. 2015-01-01
5. 2016-01-18
6. 2015-07-02
7. 2016-12-07
8. 2015-12-18
9. 2016-02-09
10. 2016-04-04
[ 916 others ]
1461
(
1.2%
)
805
(
0.7%
)
790
(
0.7%
)
763
(
0.6%
)
625
(
0.5%
)
469
(
0.4%
)
450
(
0.4%
)
423
(
0.4%
)
412
(
0.3%
)
382
(
0.3%
)
112810
(
94.5%
)
119390
(100.0%)
0
(0.0%)
number_of_guests
[integer]
Mean (sd) : 2 (0.7)
min ≤ med ≤ max:
0 ≤ 2 ≤ 55
IQR (CV) : 0 (0.4)
15 distinct values
119386
(100.0%)
4
(0.0%)
total_stay
[integer]
Mean (sd) : 3.4 (2.6)
min ≤ med ≤ max:
0 ≤ 3 ≤ 69
IQR (CV) : 2 (0.7)
45 distinct values
119390
(100.0%)
0
(0.0%)
Generated by summarytools 1.0.1 (R version 4.2.1) 2022-12-20
In this table, we can see some descriptive statistics for numeric variables in the data. For example, 37% of the reservations are actually canceled. On average, reservations are made 104 days before date of stay. On average, every reservation is made for 1.97 people. Approximately one out of 10 reservations involves a child or a baby. Average stay duration is 3.43 days. 22% of the booking is changed afterwards. Lastly, average daily rate of hotels is $101.
In terms of numerical variables, only 4 value is missing. However, when we examine the summary table above and the dataset itself we can see that there are some “NULL” entries that shows up as string. From summary table, agent and company variables have “NULL” values. As a sanity check, I will check each column separately if they have “NULL” value or not.
0.41% of country data, 13.89% of agent data and 94.31% of company data of reservations are missing.
Some analysis
According to the summary table, daily rate of a city hotel may go up to 5400 dollars while it is only 508 dollars for resort hotels. I suspect it is the true case.
hotel arrival_date_year country agent number_of_guests total_stay
1 Resort Hotel 2017 GBR 273 2 10
2 Resort Hotel 2015 PRT NULL 2 0
3 Resort Hotel 2015 PRT NULL 2 0
4 Resort Hotel 2015 PRT NULL 4 1
5 Resort Hotel 2015 PRT 240 2 0
6 Resort Hotel 2015 PRT 250 1 0
7 Resort Hotel 2015 PRT NULL 2 0
8 Resort Hotel 2015 PRT 240 2 0
9 Resort Hotel 2015 PRT 305 2 2
10 Resort Hotel 2015 PRT 305 1 2
reservation_status average_daily_rate
1 Check-Out -6.38
2 Check-Out 0.00
3 Check-Out 0.00
4 Check-Out 0.00
5 Check-Out 0.00
6 Check-Out 0.00
7 Check-Out 0.00
8 Check-Out 0.00
9 Canceled 0.00
10 Check-Out 0.00
So, it looks that the row with 5,400 dollars average daily rate is a wrong entry. Also there is a row with negative average daily rate. I will remove both of them.
hotel.bookings %>%select(hotel, average_daily_rate) %>%group_by(hotel) %>%summarise_if(is.numeric, list(min = min, max = max, mean = mean, std_dev = sd, median = median), na.rm =TRUE)
# A tibble: 2 × 6
hotel min max mean std_dev median
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 City Hotel 0 510 105. 39.3 99.9
2 Resort Hotel 0 508 95.0 61.4 75
# A tibble: 10 × 2
country average_daily_rate
<chr> <dbl>
1 DJI 273
2 AIA 265
3 AND 203.
4 UMI 200
5 LAO 182.
6 MYT 178.
7 NCL 176.
8 GEO 169.
9 COM 165.
10 FRO 155.
Code
table(hotel.bookings$arrival_date_month)
April August December February January July June March
11089 13877 6780 8068 5929 12661 10939 9792
May November October September
11791 6794 11160 10508
Average daily rate for City Hotels is 11 dollars higher than Resort Hotels. On the other hand, variation in price for Resort Hotels is bigger than the price variation of City Hotels.
The most popular 10 countries in terms of total reservations are Portugal, Great Britain, France, Spain, Germany, Italy, Ireland, Belgium, Brasil and Netherlands. However, we can see that 56% reservations made for Portugal hotels are actually canceled. This ratio is 35% for Italy and 25% for Spain. Nevertheless, among all of them, the country that hosts the highest number of guests is Portugal with total of 37,670 guests in 3 years.
Interestingly, in terms of average daily rate, the most expensive hotels are in Djibouti, Anguilla, Andorra, United States Minor Outlying Islands, Laos and so on. It looks that hotels in small countries that host a small number of guests are much more expensive.
August, July and May, respectively, are the months when hotels are the busiest throughout the year.
Lastly, let me check how many rows with zero daily rate.
# A tibble: 6 × 2
# Groups: country [6]
country n
<chr> <int>
1 PRT 1550
2 ESP 79
3 GBR 73
4 FRA 55
5 DEU 41
6 NULL 21
Wow! Most of the zero values are from Portugal. So, accuracy of data about hotels in Portugal should need further investigation.
Potential research questions
How do hotel room rates change seasonally?
Are the room rates change according to the length of stay or not?
How is the performance of agents in terms of number of reservations and length of stay?
How the preferences of families with children differ from other visitors?
What are the differences between city hotel reservations and resort hotel reservations
Is the cancellation of a reservation predictable? Which variables are more correlated with cancellation?
Source Code
---title: "Homework-2"author: "Said Arslan"date: "10/20/2022"format: html: toc: true code-fold: true code-copy: true code-tools: truecategories: - hw2 - hotel_bookings---```{r setup}library(tidyverse)library(psych)library(summarytools)knitr::opts_chunk$set(echo =TRUE, warning=FALSE, message=FALSE)```## Read Data```{r read_data}hotel.bookings <-read.csv("_data/hotel_bookings.csv")```## Describe Data```{r data_description_1}dim(hotel.bookings)colnames(hotel.bookings)colnames(hotel.bookings)[28] <-"average_daily_rate"head(hotel.bookings)tail(hotel.bookings)str(hotel.bookings)table(sapply(hotel.bookings, function(x) typeof(x)))sapply( hotel.bookings, function(x) n_distinct(x) )unique(hotel.bookings$hotel)unique(hotel.bookings$arrival_date_year)unique(hotel.bookings$reservation_status)unique(hotel.bookings$distribution_channel)unique(hotel.bookings$customer_type)```In this dataset there are 32 variables(columns) and 119,390 observations(rows). By looking at the variable names, we can say that the it consists of reservation data of some hotels. 14 of the variables compose of character values, 18 of the variables store numerical values. That being said, some of the character variables and some of the numeric variables are actually indicate dates. All of the variable names are descriptive enough except `adr`. "adr" should mean "average daily rate". Therefore, I changed original column name `adr` with `average_daily_rate`.There are two types of hotels in the dataset: Resort Hotel and City Hotel. Reservations are made in 2015, 2016 and 2017. The data covers reservations from 178 countries. So the data should belong to a big worldwide hotels chain. Both canceled and completed reservations are stored in the dataset as well as no-shows. Thus, each observation includes information about a reservation such as what type of hotel and in which country the reservation is for, number of visitors, dates, daily rates, stay durations and some categorical info about customer and the channel of reservation.In the dataset, each column repsresent a variable and each row is an observation. Thus, no pivoting is needed.```{r data_description_2}hotel.bookings <-mutate( hotel.bookings, number_of_guests = adults + children + babies,total_stay = stays_in_weekend_nights + stays_in_week_nights) # adding two new variablesprint(dfSummary(hotel.bookings, varnumbers=FALSE, plain.ascii=FALSE, style="grid", graph.magnif=0.80, valid.col=TRUE),method='render', table.classes='table-condensed')```In this table, we can see some descriptive statistics for numeric variables in the data. For example, 37% of the reservations are actually canceled. On average, reservations are made 104 days before date of stay. On average, every reservation is made for 1.97 people. Approximately one out of 10 reservations involves a child or a baby. Average stay duration is 3.43 days. 22% of the booking is changed afterwards. Lastly, average daily rate of hotels is $101.```{r missing values}colSums(is.na(hotel.bookings))```In terms of numerical variables, only 4 value is missing. However, when we examine the summary table above and the dataset itself we can see that there are some "NULL" entries that shows up as string. From summary table, `agent` and `company` variables have "NULL" values. As a sanity check, I will check each column separately if they have "NULL" value or not.```{r}nulls <-sapply(hotel.bookings, function(x) table(grepl("NULL", x)))for (i in1:32) {if (!is.na(nulls[[i]][2])) {print(nulls[i]) }}```So, actully 3 variables, `country`, `agent` and `company` have "NULL" values. ```{r}round(100*prop.table(table(grepl("NULL", hotel.bookings$country))), 2)round(100*prop.table(table(grepl("NULL", hotel.bookings$agent))), 2)round(100*prop.table(table(grepl("NULL", hotel.bookings$company))), 2)```0.41% of `country` data, 13.89% of `agent` data and 94.31% of `company` data of reservations are missing.## Some analysisAccording to the summary table, daily rate of a city hotel may go up to 5400 dollars while it is only 508 dollars for resort hotels. I suspect it is the true case.```{r}hotel.bookings %>%arrange(desc(average_daily_rate)) %>%slice_head(n=10) %>%select(hotel, arrival_date_year, country, agent, number_of_guests, total_stay, reservation_status, average_daily_rate)hotel.bookings %>%arrange(average_daily_rate) %>%slice_head(n=10) %>%select(hotel, arrival_date_year, country, agent, number_of_guests, total_stay, reservation_status, average_daily_rate)```So, it looks that the row with 5,400 dollars average daily rate is a wrong entry. Also there is a row with negative average daily rate. I will remove both of them. ```{r}hotel.bookings <- hotel.bookings %>%filter(average_daily_rate>=0& average_daily_rate<=510)``````{r}hotel.bookings %>%select(hotel, average_daily_rate) %>%group_by(hotel) %>%summarise_if(is.numeric, list(min = min, max = max, mean = mean, std_dev = sd, median = median), na.rm =TRUE)hotel.bookings %>%select(country) %>%group_by(country) %>%count() %>%arrange(desc(n)) %>%head(n=10)hotel.bookings %>%select(country, is_canceled) %>%group_by(country) %>%summarise_if(is.numeric, sum, na.rm =TRUE) %>%arrange(desc(is_canceled)) %>%head(n=10)hotel.bookings %>%filter(country %in%c("PRT", "GBR", "ESP", "FRA", "ITA")) %>%select(country,is_canceled) %>%group_by(country) %>%summarise_if(is.numeric, mean, na.rm =TRUE) %>%arrange(desc(is_canceled))hotel.bookings %>%filter(reservation_status =="Check-Out") %>%select(country, number_of_guests) %>%group_by(country) %>%summarise_if(is.numeric, sum, na.rm =TRUE) %>%arrange(desc(number_of_guests)) %>%head(n=10)hotel.bookings %>%filter(reservation_status =="Check-Out") %>%select(country, number_of_guests) %>%group_by(country) %>%summarise_if(is.numeric, sum, na.rm =TRUE) %>%arrange(desc(number_of_guests)) %>%head(n=10)hotel.bookings %>%select(country, average_daily_rate) %>%group_by(country) %>%summarise_if(is.numeric, mean, na.rm =TRUE) %>%arrange(desc(average_daily_rate)) %>%head(n=10)table(hotel.bookings$arrival_date_month)```Average daily rate for City Hotels is 11 dollars higher than Resort Hotels. On the other hand, variation in price for Resort Hotels is bigger than the price variation of City Hotels.The most popular 10 countries in terms of total reservations are Portugal, Great Britain, France, Spain, Germany, Italy, Ireland, Belgium, Brasil and Netherlands. However, we can see that 56% reservations made for Portugal hotels are actually canceled. This ratio is 35% for Italy and 25% for Spain. Nevertheless, among all of them, the country that hosts the highest number of guests is Portugal with total of 37,670 guests in 3 years.Interestingly, in terms of average daily rate, the most expensive hotels are in Djibouti, Anguilla, Andorra, United States Minor Outlying Islands, Laos and so on. It looks that hotels in small countries that host a small number of guests are much more expensive.August, July and May, respectively, are the months when hotels are the busiest throughout the year.Lastly, let me check how many rows with zero daily rate.```{r}hotel.bookings %>%filter(average_daily_rate ==0) %>%count()```There are 1959 reservations with zero daily rate.```{r}hotel.bookings %>%filter(average_daily_rate ==0) %>%group_by(country) %>%count() %>%arrange(desc(n)) %>%head()```Wow! Most of the zero values are from Portugal. So, accuracy of data about hotels in Portugal should need further investigation.## Potential research questions1. How do hotel room rates change seasonally?2. Are the room rates change according to the length of stay or not?3. How is the performance of agents in terms of number of reservations and length of stay?4. How the preferences of families with children differ from other visitors?5. What are the differences between city hotel reservations and resort hotel reservations6. Is the cancellation of a reservation predictable? Which variables are more correlated with cancellation?