---
title: "Challenge 4: Mutating hotel bookings dataset"
author: "Saksham Kumar"
description: "More data wrangling: mutate"
date: "04/12/2023"
format:
  html:
    df-print: paged
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
  - hotel_bookings
  - Saksham Kumar
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(lubridate) # provides dmy(); only attached automatically by tidyverse >= 2.0.0
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
We read in the `hotel_bookings.csv` dataset with `read_csv()` from readr, which is loaded as part of the tidyverse.
```{r}
bookings <- read_csv("_data/hotel_bookings.csv", show_col_types = FALSE)
bookings
```
### Briefly describe the data
The data describes hotel bookings. There are 119390 rows and 32 variables. The 32 variables and their data types are listed below.
```{r}
spec(bookings)
```
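As a quick cross-check on the description above, the sketch below counts rows, columns, and the number of columns of each type. It runs on a small made-up tibble (`toy_bookings`, a hypothetical stand-in with invented values) so that it is self-contained; the same calls should apply to `bookings`.

```{r}
library(tidyverse)

# Hypothetical three-row stand-in for `bookings`; values are made up
toy_bookings <- tibble(
  hotel = c("Resort Hotel", "City Hotel", "City Hotel"),
  lead_time = c(342, 7, 85),
  arrival_date_month = c("July", "July", "August")
)

# Number of rows and columns
dim(toy_bookings)

# Number of columns of each type
toy_type_counts <- tibble(
  variable = names(toy_bookings),
  type = map_chr(toy_bookings, ~ class(.x)[1])
) %>%
  count(type)
toy_type_counts
```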
## Tidy Data (as needed)
The data is already in tidy form for an initial analysis: each row is a single booking and each column a single variable, so no reshaping is needed.
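One way to back up that judgment is to count missing values per column. The sketch below does this on a small made-up tibble (`toy_bookings`, hypothetical values). It also counts the literal string `"NULL"`, which `read_csv()` does not treat as missing by default. Whether `bookings` actually contains such placeholder strings is an assumption to check, not something established above.

```{r}
library(tidyverse)

# Hypothetical stand-in for a few columns of `bookings`; values are made up
toy_bookings <- tibble(
  children = c(0, NA, 2),
  agent = c("9", "NULL", "240"),
  company = c("NULL", "NULL", "40")
)

# Count true missing values (NA) in every column
toy_na_counts <- toy_bookings %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
toy_na_counts

# Count the placeholder string "NULL" in character columns
toy_null_counts <- toy_bookings %>%
  summarise(across(where(is.character), ~ sum(.x == "NULL", na.rm = TRUE)))
toy_null_counts
```

Running the same two `summarise()` calls on `bookings` would show whether any column needs recoding before analysis.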
## Identify variables that need to be mutated
Three variables can be combined into one: `arrival_date_day_of_month`, `arrival_date_month`, and `arrival_date_year`.
Using `mutate()`, we first paste these three fields into a single day/month/year string with `str_c()`, then parse that string into a date with `dmy()`.
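```{r}
# Step 1: paste day, month name, and year into a "day/month/year" string
bookings_mutate_date_arrival <- bookings %>%
  mutate(date_arrival = str_c(arrival_date_day_of_month, arrival_date_month, arrival_date_year, sep = "/"))
# Step 2: parse that string into a Date with lubridate's dmy()
bookings_mutate_strToDate <- bookings_mutate_date_arrival %>%
  mutate(date_arrival = dmy(date_arrival))
# Show the source columns next to the new date column
bookings_mutate_strToDate[ , c("arrival_date_day_of_month", "arrival_date_month", "arrival_date_year", "date_arrival")]
# Sanity check: range of the parsed dates
summary(bookings_mutate_strToDate$date_arrival)
```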
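A stricter sanity check on this mutation is to compare each component of the parsed date with its source column and to count rows that failed to parse. The sketch below does so on a made-up tibble (`toy_arrivals`, hypothetical values); `toy_check` and its column names are illustrative, not part of the original analysis.

```{r}
library(tidyverse)
library(lubridate)

# Hypothetical stand-in for the arrival columns of `bookings`
toy_arrivals <- tibble(
  arrival_date_day_of_month = c(1, 29, 31),
  arrival_date_month = c("July", "February", "August"),
  arrival_date_year = c(2015, 2016, 2017)
) %>%
  mutate(
    date_arrival = dmy(str_c(arrival_date_day_of_month, arrival_date_month,
                             arrival_date_year, sep = "/"))
  )

# Count failed parses and mismatches between the date and its source columns
toy_check <- toy_arrivals %>%
  summarise(
    n_failed_parse = sum(is.na(date_arrival)),
    n_day_mismatch = sum(day(date_arrival) != arrival_date_day_of_month, na.rm = TRUE),
    n_month_mismatch = sum(month.name[month(date_arrival)] != arrival_date_month, na.rm = TRUE),
    n_year_mismatch = sum(year(date_arrival) != arrival_date_year, na.rm = TRUE)
  )
toy_check
```

If the same summary is run on `bookings_mutate_strToDate`, a non-zero count in any column would point to rows where the string-to-date conversion went wrong.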
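The `summary()` of `date_arrival` gives a range of 2015-07-01 to 2017-08-31.
We can now clean the data by removing the now-redundant variables `arrival_date_day_of_month`, `arrival_date_month`, and `arrival_date_year`. Note that `select(-starts_with("arrival"))` also drops `arrival_date_week_number`, which is likewise determined by the arrival date.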
```{r}
bookings_final <- bookings_mutate_strToDate %>%
  select(-starts_with("arrival"))
bookings_final
```
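To confirm exactly which columns a `starts_with()` selection removes, the column names before and after can be compared. The sketch below uses a one-row made-up tibble (`toy_before`, hypothetical values) with the same arrival column names as `bookings`.

```{r}
library(tidyverse)

# Hypothetical stand-in with the four arrival_* columns plus the new date
toy_before <- tibble(
  hotel = "City Hotel",
  arrival_date_year = 2016,
  arrival_date_month = "March",
  arrival_date_week_number = 11,
  arrival_date_day_of_month = 13,
  date_arrival = as.Date("2016-03-13")
)

toy_after <- toy_before %>%
  select(-starts_with("arrival"))

# Columns removed by the select() call
toy_dropped <- setdiff(names(toy_before), names(toy_after))
toy_dropped
```

The new `date_arrival` column is kept because its name does not start with "arrival". If the week number should be kept as well, naming the three columns explicitly, as in `select(-c(arrival_date_day_of_month, arrival_date_month, arrival_date_year))`, is one alternative.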