library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Yoshita Varma Annam
January 2, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
As I am already familiar with the hotel booking data from Challenge 2, I chose this dataset for Challenge 4. In Challenge 2 I noticed that some parts of the data were not very clean and could be mutated for better analysis.
Some of this analysis builds on Challenge 2; the relevant parts are repeated below.
# A tibble: 119,390 × 32
hotel is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ arriv…⁶ stays…⁷ stays…⁸ adults
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Resor… 0 342 2015 July 27 1 0 0 2
2 Resor… 0 737 2015 July 27 1 0 0 2
3 Resor… 0 7 2015 July 27 1 0 1 1
4 Resor… 0 13 2015 July 27 1 0 1 1
5 Resor… 0 14 2015 July 27 1 0 2 2
6 Resor… 0 14 2015 July 27 1 0 2 2
7 Resor… 0 0 2015 July 27 1 0 2 2
8 Resor… 0 9 2015 July 27 1 0 2 2
9 Resor… 1 85 2015 July 27 1 0 3 2
10 Resor… 1 75 2015 July 27 1 0 3 2
# … with 119,380 more rows, 22 more variables: children <dbl>, babies <dbl>,
# meal <chr>, country <chr>, market_segment <chr>,
# distribution_channel <chr>, is_repeated_guest <dbl>,
# previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
# reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
# deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
# customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
Just from viewing the data, it contains 119,390 hotel bookings described by 32 features. The features mainly describe each booking in terms of arrival date, cancellation, and timing. They also record the number of adults, children, and babies per booking, for guests from around the world. A separate field flags repeated guests. Understanding the data further requires more operations.
hotel is_canceled lead_time arrival_date_year
Length:119390 Min. :0.0000 Min. : 0 Min. :2015
Class :character 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
Mode :character Median :0.0000 Median : 69 Median :2016
Mean :0.3704 Mean :104 Mean :2016
3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
Max. :1.0000 Max. :737 Max. :2017
arrival_date_month arrival_date_week_number arrival_date_day_of_month
Length:119390 Min. : 1.00 Min. : 1.0
Class :character 1st Qu.:16.00 1st Qu.: 8.0
Mode :character Median :28.00 Median :16.0
Mean :27.17 Mean :15.8
3rd Qu.:38.00 3rd Qu.:23.0
Max. :53.00 Max. :31.0
stays_in_weekend_nights stays_in_week_nights adults
Min. : 0.0000 Min. : 0.0 Min. : 0.000
1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
Median : 1.0000 Median : 2.0 Median : 2.000
Mean : 0.9276 Mean : 2.5 Mean : 1.856
3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
Max. :19.0000 Max. :50.0 Max. :55.000
children babies meal country
Min. : 0.0000 Min. : 0.000000 Length:119390 Length:119390
1st Qu.: 0.0000 1st Qu.: 0.000000 Class :character Class :character
Median : 0.0000 Median : 0.000000 Mode :character Mode :character
Mean : 0.1039 Mean : 0.007949
3rd Qu.: 0.0000 3rd Qu.: 0.000000
Max. :10.0000 Max. :10.000000
NA's :4
market_segment distribution_channel is_repeated_guest
Length:119390 Length:119390 Min. :0.00000
Class :character Class :character 1st Qu.:0.00000
Mode :character Mode :character Median :0.00000
Mean :0.03191
3rd Qu.:0.00000
Max. :1.00000
previous_cancellations previous_bookings_not_canceled reserved_room_type
Min. : 0.00000 Min. : 0.0000 Length:119390
1st Qu.: 0.00000 1st Qu.: 0.0000 Class :character
Median : 0.00000 Median : 0.0000 Mode :character
Mean : 0.08712 Mean : 0.1371
3rd Qu.: 0.00000 3rd Qu.: 0.0000
Max. :26.00000 Max. :72.0000
assigned_room_type booking_changes deposit_type agent
Length:119390 Min. : 0.0000 Length:119390 Length:119390
Class :character 1st Qu.: 0.0000 Class :character Class :character
Mode :character Median : 0.0000 Mode :character Mode :character
Mean : 0.2211
3rd Qu.: 0.0000
Max. :21.0000
company days_in_waiting_list customer_type adr
Length:119390 Min. : 0.000 Length:119390 Min. : -6.38
Class :character 1st Qu.: 0.000 Class :character 1st Qu.: 69.29
Mode :character Median : 0.000 Mode :character Median : 94.58
Mean : 2.321 Mean : 101.83
3rd Qu.: 0.000 3rd Qu.: 126.00
Max. :391.000 Max. :5400.00
required_car_parking_spaces total_of_special_requests reservation_status
Min. :0.00000 Min. :0.0000 Length:119390
1st Qu.:0.00000 1st Qu.:0.0000 Class :character
Median :0.00000 Median :0.0000 Mode :character
Mean :0.06252 Mean :0.5714
3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :8.00000 Max. :5.0000
reservation_status_date
Min. :2014-10-17
1st Qu.:2016-02-01
Median :2016-08-07
Mean :2016-07-30
3rd Qu.:2017-02-08
Max. :2017-09-14
[1] "hotel" "is_canceled"
[3] "lead_time" "arrival_date_year"
[5] "arrival_date_month" "arrival_date_week_number"
[7] "arrival_date_day_of_month" "stays_in_weekend_nights"
[9] "stays_in_week_nights" "adults"
[11] "children" "babies"
[13] "meal" "country"
[15] "market_segment" "distribution_channel"
[17] "is_repeated_guest" "previous_cancellations"
[19] "previous_bookings_not_canceled" "reserved_room_type"
[21] "assigned_room_type" "booking_changes"
[23] "deposit_type" "agent"
[25] "company" "days_in_waiting_list"
[27] "customer_type" "adr"
[29] "required_car_parking_spaces" "total_of_special_requests"
[31] "reservation_status" "reservation_status_date"
[1] "No Deposit" "Refundable" "Non Refund"
[1] 8
[1] "Direct" "Corporate" "Online TA" "Offline TA/TO"
[5] "Complementary" "Groups" "Undefined" "Aviation"
[1] 8
[1] "Direct" "Corporate" "TA/TO" "Undefined" "GDS"
[1] 5
[1] "Resort Hotel" "City Hotel"
[1] 2
[1] "PRT" "GBR" "USA" "ESP" "IRL" "FRA" "NULL" "ROU" "NOR" "OMN"
[11] "ARG" "POL" "DEU" "BEL" "CHE" "CN" "GRC" "ITA" "NLD" "DNK"
[21] "RUS" "SWE" "AUS" "EST" "CZE" "BRA" "FIN" "MOZ" "BWA" "LUX"
[31] "SVN" "ALB" "IND" "CHN" "MEX" "MAR" "UKR" "SMR" "LVA" "PRI"
[41] "SRB" "CHL" "AUT" "BLR" "LTU" "TUR" "ZAF" "AGO" "ISR" "CYM"
[51] "ZMB" "CPV" "ZWE" "DZA" "KOR" "CRI" "HUN" "ARE" "TUN" "JAM"
[61] "HRV" "HKG" "IRN" "GEO" "AND" "GIB" "URY" "JEY" "CAF" "CYP"
[71] "COL" "GGY" "KWT" "NGA" "MDV" "VEN" "SVK" "FJI" "KAZ" "PAK"
[81] "IDN" "LBN" "PHL" "SEN" "SYC" "AZE" "BHR" "NZL" "THA" "DOM"
[91] "MKD" "MYS" "ARM" "JPN" "LKA" "CUB" "CMR" "BIH" "MUS" "COM"
[101] "SUR" "UGA" "BGR" "CIV" "JOR" "SYR" "SGP" "BDI" "SAU" "VNM"
[111] "PLW" "QAT" "EGY" "PER" "MLT" "MWI" "ECU" "MDG" "ISL" "UZB"
[121] "NPL" "BHS" "MAC" "TGO" "TWN" "DJI" "STP" "KNA" "ETH" "IRQ"
[131] "HND" "RWA" "KHM" "MCO" "BGD" "IMN" "TJK" "NIC" "BEN" "VGB"
[141] "TZA" "GAB" "GHA" "TMP" "GLP" "KEN" "LIE" "GNB" "MNE" "UMI"
[151] "MYT" "FRO" "MMR" "PAN" "BFA" "LBY" "MLI" "NAM" "BOL" "PRY"
[161] "BRB" "ABW" "AIA" "SLV" "DMA" "PYF" "GUY" "LCA" "ATA" "GTM"
[171] "ASM" "MRT" "NCL" "KIR" "SDN" "ATF" "SLE" "LAO"
[1] 178
From the output above, the data covers bookings with arrival years from 2015 to 2017 by guests from many countries: the country column has 178 distinct values, one of which is the placeholder “NULL”. The data covers two kinds of hotels, “Resort Hotel” and “City Hotel”. There are 8 market segments, ranging from professional to personal bookings (for example Corporate and Aviation), and 5 distribution channels. The means in the summary indicate roughly 1.86 adults, 0.10 children, and 0.01 babies per booking. Similarly, guests stayed on average about 2.5 week nights and about 0.93 weekend nights. These figures come only from the summaries; firmer conclusions need more analysis.
ABW AGO AIA ALB AND ARE ARG ARM ASM ATA ATF AUS AUT
2 362 1 12 7 51 214 8 1 2 1 426 1263
AZE BDI BEL BEN BFA BGD BGR BHR BHS BIH BLR BOL BRA
17 1 2342 3 1 12 75 5 1 13 26 10 2224
BRB BWA CAF CHE CHL CHN CIV CMR CN COL COM CPV CRI
4 1 5 1730 65 999 6 10 1279 71 2 24 19
CUB CYM CYP CZE DEU DJI DMA DNK DOM DZA ECU EGY ESP
8 1 51 171 7287 1 1 435 14 103 27 32 8568
EST ETH FIN FJI FRA FRO GAB GBR GEO GGY GHA GIB GLP
83 3 447 1 10415 5 4 12129 22 3 4 18 2
GNB GRC GTM GUY HKG HND HRV HUN IDN IMN IND IRL IRN
9 128 4 1 29 1 100 230 35 2 152 3375 83
IRQ ISL ISR ITA JAM JEY JOR JPN KAZ KEN KHM KIR KNA
14 57 669 3766 6 8 21 197 19 6 2 1 2
KOR KWT LAO LBN LBY LCA LIE LKA LTU LUX LVA MAC MAR
133 16 2 31 8 1 3 7 81 287 55 16 259
MCO MDG MDV MEX MKD MLI MLT MMR MNE MOZ MRT MUS MWI
4 1 12 85 10 1 18 1 5 67 1 7 2
MYS MYT NAM NCL NGA NIC NLD NOR NPL NULL NZL OMN PAK
28 2 1 1 34 1 2104 607 1 488 74 18 14
PAN PER PHL PLW POL PRI PRT PRY PYF QAT ROU RUS RWA
9 29 40 1 919 12 48590 4 1 15 500 632 2
SAU SDN SEN SGP SLE SLV SMR SRB STP SUR SVK SVN SWE
48 1 11 39 1 2 1 101 2 5 65 57 1024
SYC SYR TGO THA TJK TMP TUN TUR TWN TZA UGA UKR UMI
2 3 2 59 9 3 39 248 51 5 2 68 1
URY USA UZB VEN VGB VNM ZAF ZMB ZWE
32 2097 4 26 1 8 80 2 4
The country column contains 488 rows with the placeholder value “NULL”. I remove these rows, as bookings without a known country are not relevant to this analysis.
The arrival year, month, and day of month are stored in three separate columns, so they can be combined into a single field named arrival_date. I also drop arrival_date_week_number, since the week number can be derived from the new arrival date field.
# A tibble: 118,902 × 31
hotel is_ca…¹ lead_…² arriv…³ arriv…⁴ arriv…⁵ stays…⁶ stays…⁷ adults child…⁸
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Resor… 0 342 2015 July 1 0 0 2 0
2 Resor… 0 737 2015 July 1 0 0 2 0
3 Resor… 0 7 2015 July 1 0 1 1 0
4 Resor… 0 13 2015 July 1 0 1 1 0
5 Resor… 0 14 2015 July 1 0 2 2 0
6 Resor… 0 14 2015 July 1 0 2 2 0
7 Resor… 0 0 2015 July 1 0 2 2 0
8 Resor… 0 9 2015 July 1 0 2 2 0
9 Resor… 1 85 2015 July 1 0 3 2 0
10 Resor… 1 75 2015 July 1 0 3 2 0
# … with 118,892 more rows, 21 more variables: babies <dbl>, meal <chr>,
# country <chr>, market_segment <chr>, distribution_channel <chr>,
# is_repeated_guest <dbl>, previous_cancellations <dbl>,
# previous_bookings_not_canceled <dbl>, reserved_room_type <chr>,
# assigned_room_type <chr>, booking_changes <dbl>, deposit_type <chr>,
# agent <chr>, company <chr>, days_in_waiting_list <dbl>,
# customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>, …
Any additional comments?
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Now that the data has been tidied as required, I mutate it to have a single arrival date field, as mentioned above.
#Mutating the arrival date into a single field
HotelBookings_csv_mutate <- HotelBookings_csv %>%
  mutate(arrival_date = str_c(arrival_date_day_of_month,
                              arrival_date_month,
                              arrival_date_year, sep = "/"),
         arrival_date = lubridate::dmy(arrival_date)) %>%
  select(-c(arrival_date_day_of_month, arrival_date_month, arrival_date_year))
HotelBookings_csv_mutate
# A tibble: 118,902 × 29
hotel is_ca…¹ lead_…² stays…³ stays…⁴ adults child…⁵ babies meal country
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Resort H… 0 342 0 0 2 0 0 BB PRT
2 Resort H… 0 737 0 0 2 0 0 BB PRT
3 Resort H… 0 7 0 1 1 0 0 BB GBR
4 Resort H… 0 13 0 1 1 0 0 BB GBR
5 Resort H… 0 14 0 2 2 0 0 BB GBR
6 Resort H… 0 14 0 2 2 0 0 BB GBR
7 Resort H… 0 0 0 2 2 0 0 BB PRT
8 Resort H… 0 9 0 2 2 0 0 FB PRT
9 Resort H… 1 85 0 3 2 0 0 BB PRT
10 Resort H… 1 75 0 3 2 0 0 HB PRT
# … with 118,892 more rows, 19 more variables: market_segment <chr>,
# distribution_channel <chr>, is_repeated_guest <dbl>,
# previous_cancellations <dbl>, previous_bookings_not_canceled <dbl>,
# reserved_room_type <chr>, assigned_room_type <chr>, booking_changes <dbl>,
# deposit_type <chr>, agent <chr>, company <chr>, days_in_waiting_list <dbl>,
# customer_type <chr>, adr <dbl>, required_car_parking_spaces <dbl>,
# total_of_special_requests <dbl>, reservation_status <chr>, …
Similarly, the company and agent columns hold numerical values but were read as character because missing values are stored as the string “NULL”. I first replace “NULL” with NA and then convert both columns to numeric.
Now reviewing the summary of the mutated data.
hotel is_canceled lead_time stays_in_weekend_nights
Length:118902 Min. :0.0000 Min. : 0.0 Min. : 0.0000
Class :character 1st Qu.:0.0000 1st Qu.: 18.0 1st Qu.: 0.0000
Mode :character Median :0.0000 Median : 69.0 Median : 1.0000
Mean :0.3714 Mean :104.3 Mean : 0.9289
3rd Qu.:1.0000 3rd Qu.:161.0 3rd Qu.: 2.0000
Max. :1.0000 Max. :737.0 Max. :16.0000
stays_in_week_nights adults children babies
Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.000000
1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.: 0.0000 1st Qu.: 0.000000
Median : 2.000 Median : 2.000 Median : 0.0000 Median : 0.000000
Mean : 2.502 Mean : 1.858 Mean : 0.1042 Mean : 0.007948
3rd Qu.: 3.000 3rd Qu.: 2.000 3rd Qu.: 0.0000 3rd Qu.: 0.000000
Max. :41.000 Max. :55.000 Max. :10.0000 Max. :10.000000
NA's :4
meal country market_segment distribution_channel
Length:118902 Length:118902 Length:118902 Length:118902
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
is_repeated_guest previous_cancellations previous_bookings_not_canceled
Min. :0.00000 Min. : 0.00000 Min. : 0.0000
1st Qu.:0.00000 1st Qu.: 0.00000 1st Qu.: 0.0000
Median :0.00000 Median : 0.00000 Median : 0.0000
Mean :0.03201 Mean : 0.08714 Mean : 0.1316
3rd Qu.:0.00000 3rd Qu.: 0.00000 3rd Qu.: 0.0000
Max. :1.00000 Max. :26.00000 Max. :72.0000
reserved_room_type assigned_room_type booking_changes deposit_type
Length:118902 Length:118902 Min. : 0.0000 Length:118902
Class :character Class :character 1st Qu.: 0.0000 Class :character
Mode :character Mode :character Median : 0.0000 Mode :character
Mean : 0.2212
3rd Qu.: 0.0000
Max. :21.0000
agent company days_in_waiting_list customer_type
Min. : 1.00 Min. : 6.0 Min. : 0.000 Length:118902
1st Qu.: 9.00 1st Qu.: 62.0 1st Qu.: 0.000 Class :character
Median : 14.00 Median :179.0 Median : 0.000 Mode :character
Mean : 86.54 Mean :189.6 Mean : 2.331
3rd Qu.:229.00 3rd Qu.:270.0 3rd Qu.: 0.000
Max. :535.00 Max. :543.0 Max. :391.000
NA's :16006 NA's :112279
adr required_car_parking_spaces total_of_special_requests
Min. : -6.38 Min. :0.00000 Min. :0.0000
1st Qu.: 70.00 1st Qu.:0.00000 1st Qu.:0.0000
Median : 95.00 Median :0.00000 Median :0.0000
Mean : 102.00 Mean :0.06188 Mean :0.5717
3rd Qu.: 126.00 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :5400.00 Max. :8.00000 Max. :5.0000
reservation_status reservation_status_date arrival_date
Length:118902 Min. :2014-10-17 Min. :2015-07-01
Class :character 1st Qu.:2016-02-02 1st Qu.:2016-03-14
Mode :character Median :2016-08-08 Median :2016-09-07
Mean :2016-07-30 Mean :2016-08-29
3rd Qu.:2017-02-09 3rd Qu.:2017-03-19
Max. :2017-09-14 Max. :2017-08-31
As the summary shows, there is now a single arrival_date field, and the agent and company columns are now numeric.
---
title: "Challenge 4"
author: "Yoshita Varma Annam"
description: "More data wrangling: mutate"
date: "1/2/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.csv⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions ⭐⭐⭐⭐⭐
```{r}
library(readr)
HotelBookings_csv <- read_csv("_data/hotel_bookings.csv")
```
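The hotel bookings file encodes missing values as the literal string "NULL" (this shows up later in the country, agent, and company columns). Below is a minimal sketch, assuming a small inline CSV in place of the real file, of how the `na` argument of `read_csv()` could turn those strings into `NA` at import time. The inline data and the name `toy_csv` are made up for illustration.

```r
library(readr)

# Hypothetical three-row extract standing in for _data/hotel_bookings.csv
toy_csv <- I("hotel,country,agent,company
Resort Hotel,PRT,NULL,NULL
Resort Hotel,GBR,304,NULL
City Hotel,NULL,240,110
")

# Treat the literal string "NULL" (plus the usual defaults) as missing
toy <- read_csv(toy_csv, na = c("", "NA", "NULL"), show_col_types = FALSE)

toy
sapply(toy, class)
```

With this option, agent and company would be parsed as numeric directly. The rest of this post keeps the default import and handles "NULL" explicitly.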
As I am already familiar with the hotel booking data from Challenge 2, I chose this dataset for Challenge 4. In Challenge 2 I noticed that some parts of the data were not very clean and could be mutated for better analysis.
### Briefly describe the data
Some of this analysis builds on Challenge 2; the relevant parts are repeated below.
```{r}
#| label: understanding the data
HotelBookings_csv
```
Just from viewing the data, it contains 119,390 hotel bookings described by 32 features. The features mainly describe each booking in terms of arrival date, cancellation, and timing. They also record the number of adults, children, and babies per booking, for guests from around the world. A separate field flags repeated guests. Understanding the data further requires more operations.
```{r}
#| label: summary of the data
summary(HotelBookings_csv)
```
```{r}
#| label: column names the data
colnames(HotelBookings_csv)
```
```{r}
#| label: finding unique values of the data
unique(HotelBookings_csv$deposit_type)
length(unique(HotelBookings_csv$deposit_type))
unique(HotelBookings_csv$market_segment)
length(unique(HotelBookings_csv$market_segment))
unique(HotelBookings_csv$distribution_channel)
length(unique(HotelBookings_csv$distribution_channel))
unique(HotelBookings_csv$hotel)
length(unique(HotelBookings_csv$hotel))
unique(HotelBookings_csv$country)
length(unique(HotelBookings_csv$country))
```
From the output above, the data covers bookings with arrival years from 2015 to 2017 by guests from many countries: the country column has 178 distinct values, one of which is the placeholder "NULL". The data covers two kinds of hotels, "Resort Hotel" and "City Hotel". There are 8 market segments, ranging from professional to personal bookings (for example Corporate and Aviation), and 5 distribution channels. The means in the summary indicate roughly 1.86 adults, 0.10 children, and 0.01 babies per booking. Similarly, guests stayed on average about 2.5 week nights and about 0.93 weekend nights. These figures come only from the summaries; firmer conclusions need more analysis.
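The distinct-value counts and per-booking means discussed here can also be gathered in one call each, instead of repeating `unique()` and `length()` per column. Below is a sketch on a made-up five-row tibble; `toy` and its values are hypothetical and not taken from the file.

```r
library(dplyr)

toy <- tibble(
  hotel          = c("Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel", "City Hotel"),
  market_segment = c("Direct", "Online TA", "Groups", "Direct", "Corporate"),
  adults         = c(2, 2, 1, 2, 3),
  children       = c(0, 1, NA, 0, 0)
)

# Number of distinct values in each categorical column
distinct_counts <- toy %>%
  summarise(across(c(hotel, market_segment), n_distinct))

# Mean guests per booking; na.rm = TRUE skips the missing children value
guest_means <- toy %>%
  summarise(across(c(adults, children), ~ mean(.x, na.rm = TRUE)))

distinct_counts
guest_means
```

Passing `na.rm = TRUE` matters for the real data as well, because the children column has 4 missing values.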
## Tidy Data (as needed)
```{r}
#Null values in country column
table(HotelBookings_csv$country)
```
The country column contains 488 rows with the placeholder value "NULL". I remove these rows, as bookings without a known country are not relevant to this analysis.
```{r}
# Removing Null values
HotelBookings_csv <- HotelBookings_csv %>%
filter(!(country == "NULL"))
```
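A quick sanity check for a filter like this is to compare the number of rows dropped with the number of "NULL" entries counted beforehand. Below is a sketch on a made-up tibble; `toy` and its values are hypothetical.

```r
library(dplyr)

toy <- tibble(
  country = c("PRT", "NULL", "GBR", "NULL", "ESP"),
  adr     = c(75, 80, 98, 60, 107)
)

n_before <- nrow(toy)
n_null   <- sum(toy$country == "NULL")

toy_clean <- toy %>%
  filter(country != "NULL")

# The rows removed should equal the "NULL" rows counted up front
stopifnot(n_before - nrow(toy_clean) == n_null)
nrow(toy_clean)
```

On the real data, removing 488 "NULL" rows from 119,390 should leave 118,902.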
The arrival year, month, and day of month are stored in three separate columns, so they can be combined into a single field named arrival_date. I also drop arrival_date_week_number, since the week number can be derived from the new arrival date field.
```{r}
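# Remove the week-number column by name rather than by position,
# so that re-running the chunk cannot drop a different column
HotelBookings_csv <- HotelBookings_csv %>%
  select(-arrival_date_week_number)
HotelBookings_csv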
```
Any additional comments?
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Now that the data has been tidied as required, I mutate it to have a single arrival date field, as mentioned above.
```{r}
#Mutating the arrival date into a single field
HotelBookings_csv_mutate <- HotelBookings_csv %>%
  mutate(arrival_date = str_c(arrival_date_day_of_month,
                              arrival_date_month,
                              arrival_date_year, sep = "/"),
         arrival_date = lubridate::dmy(arrival_date)) %>%
  select(-c(arrival_date_day_of_month, arrival_date_month, arrival_date_year))
HotelBookings_csv_mutate
```
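To sanity check this mutation, it helps to confirm that every date parsed and that a week number can still be recovered from the new column. Below is a sketch on made-up rows. `lubridate::isoweek()` is used as one possible week definition; whether it matches the convention behind the dataset's own arrival_date_week_number is an assumption.

```r
library(dplyr)
library(stringr)
library(lubridate)

toy <- tibble(
  arrival_date_year         = c(2015, 2016, 2017),
  arrival_date_month        = c("July", "February", "August"),
  arrival_date_day_of_month = c(1, 29, 31)
)

toy_dates <- toy %>%
  mutate(
    arrival_date = dmy(str_c(arrival_date_day_of_month,
                             arrival_date_month,
                             arrival_date_year, sep = "/")),
    # ISO 8601 week number derived from the combined date
    arrival_week = isoweek(arrival_date)
  )

# dmy() returns NA for strings it cannot parse, so this catches bad rows
stopifnot(!any(is.na(toy_dates$arrival_date)))
toy_dates
```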
Similarly, the company and agent columns hold numerical values but were read as character because missing values are stored as the string "NULL". I first replace "NULL" with NA and then convert both columns to numeric.
```{r}
# Mutating the agent and company fields from character to numeric:
# replace the "NULL" placeholder with NA, then convert
HotelBookings_csv_mutate <- HotelBookings_csv_mutate %>%
  mutate(across(c(agent, company), ~ as.numeric(na_if(.x, "NULL"))))
```
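One way to sanity check a character-to-numeric conversion is to confirm that the only new NA values come from the "NULL" placeholders. Below is a sketch on a made-up tibble; `toy` and its values are hypothetical.

```r
library(dplyr)

toy <- tibble(
  agent   = c("NULL", "304", "240", "NULL"),
  company = c("NULL", "NULL", "110", "NULL")
)

# Count the placeholders before converting
null_counts <- toy %>%
  summarise(across(c(agent, company), ~ sum(.x == "NULL")))

toy_num <- toy %>%
  mutate(across(c(agent, company), ~ as.numeric(na_if(.x, "NULL"))))

# Count the missing values after converting
na_counts <- toy_num %>%
  summarise(across(c(agent, company), ~ sum(is.na(.x))))

# If these differ, some non-"NULL" string failed to parse as a number
stopifnot(all(null_counts == na_counts))
na_counts
```

The summary of the mutated data reports 16,006 NA values for agent and 112,279 for company; those are the figures such a check would compare against.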
Now reviewing the summary of the mutated data.
```{r}
summary(HotelBookings_csv_mutate)
```
As the summary shows, there is now a single arrival_date field, and the agent and company columns are now numeric.