Code
library(tidyverse)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Jinxia Niu
March 25, 2023
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
[1] 119390 32
hotel is_canceled lead_time arrival_date_year arrival_date_month
1 Resort Hotel 0 342 2015 July
2 Resort Hotel 0 737 2015 July
3 Resort Hotel 0 7 2015 July
4 Resort Hotel 0 13 2015 July
5 Resort Hotel 0 14 2015 July
6 Resort Hotel 0 14 2015 July
7 Resort Hotel 0 0 2015 July
8 Resort Hotel 0 9 2015 July
9 Resort Hotel 1 85 2015 July
10 Resort Hotel 1 75 2015 July
arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights
1 27 1 0
2 27 1 0
3 27 1 0
4 27 1 0
5 27 1 0
6 27 1 0
7 27 1 0
8 27 1 0
9 27 1 0
10 27 1 0
stays_in_week_nights adults children babies meal country market_segment
1 0 2 0 0 BB PRT Direct
2 0 2 0 0 BB PRT Direct
3 1 1 0 0 BB GBR Direct
4 1 1 0 0 BB GBR Corporate
5 2 2 0 0 BB GBR Online TA
6 2 2 0 0 BB GBR Online TA
7 2 2 0 0 BB PRT Direct
8 2 2 0 0 FB PRT Direct
9 3 2 0 0 BB PRT Online TA
10 3 2 0 0 HB PRT Offline TA/TO
distribution_channel is_repeated_guest previous_cancellations
1 Direct 0 0
2 Direct 0 0
3 Direct 0 0
4 Corporate 0 0
5 TA/TO 0 0
6 TA/TO 0 0
7 Direct 0 0
8 Direct 0 0
9 TA/TO 0 0
10 TA/TO 0 0
previous_bookings_not_canceled reserved_room_type assigned_room_type
1 0 C C
2 0 C C
3 0 A C
4 0 A A
5 0 A A
6 0 A A
7 0 C C
8 0 C C
9 0 A A
10 0 D D
booking_changes deposit_type agent company days_in_waiting_list
1 3 No Deposit NULL NULL 0
2 4 No Deposit NULL NULL 0
3 0 No Deposit NULL NULL 0
4 0 No Deposit 304 NULL 0
5 0 No Deposit 240 NULL 0
6 0 No Deposit 240 NULL 0
7 0 No Deposit NULL NULL 0
8 0 No Deposit 303 NULL 0
9 0 No Deposit 240 NULL 0
10 0 No Deposit 15 NULL 0
customer_type adr required_car_parking_spaces total_of_special_requests
1 Transient 0.0 0 0
2 Transient 0.0 0 0
3 Transient 75.0 0 0
4 Transient 75.0 0 0
5 Transient 98.0 0 1
6 Transient 98.0 0 1
7 Transient 107.0 0 0
8 Transient 103.0 0 1
9 Transient 82.0 0 1
10 Transient 105.5 0 0
reservation_status reservation_status_date
1 Check-Out 2015-07-01
2 Check-Out 2015-07-01
3 Check-Out 2015-07-02
4 Check-Out 2015-07-02
5 Check-Out 2015-07-03
6 Check-Out 2015-07-03
7 Check-Out 2015-07-03
8 Check-Out 2015-07-03
9 Canceled 2015-05-06
10 Canceled 2015-04-22
It’s a hotel booking data set of 32 variables,119390 observations.There are several redundant variables, especially the arrival date_year,arrival_date_month and arrival_date_day_of_the_month ones.
We need to use mutate() and make_date() from the lubridate package to create just one date format. In addition, because the arrival-month is currently shown as characters(eg.July, August) instead of numbers, we also need to use case_when() from the dplyr package to convert them first into numbers.
##Note:The case_when() function from the dplyr package in R can be used to create new variables from existing variables.
This function uses the following basic syntax:
library(dplyr)
df %>% mutate(new_var = case_when(var1 < 15 ~ ‘low’, var2 < 25 ~ ‘med’, TRUE ~ ‘high’)) Note that TRUE is equivalent to an “else” statement.
library(dplyr)
library(lubridate)
hotel_bookings <- hotel_bookings %>%
mutate(month = case_when(
arrival_date_month == "January" ~ 1,
arrival_date_month == "February" ~ 2,
arrival_date_month == "March" ~ 3,
arrival_date_month == "April" ~ 4,
arrival_date_month == "May" ~ 5,
arrival_date_month == "June" ~ 6,
arrival_date_month == "July" ~ 7,
arrival_date_month == "August" ~ 8,
arrival_date_month == "September" ~ 9,
arrival_date_month == "October" ~ 10,
arrival_date_month == "November" ~ 11,
arrival_date_month == "December" ~ 12,
)) %>%
mutate(arrival_date = make_date(arrival_date_year, month, arrival_date_day_of_month)) %>%
select(-c(arrival_date_year, arrival_date_month, arrival_date_day_of_month, month, arrival_date_week_number)) %>%
select(hotel, arrival_date, everything())
dim(hotel_bookings)
[1] 119390 29
---
title: "Challenge 4 by Jinxia Niu"
author: "Jinxia Niu"
description: "More data wrangling: pivoting"
date: "03/25/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_4
- hotel_bookings
- Jinxia Niu
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
```{r}
library(tidyverse)
hotel_bookings<- read.csv("_data/hotel_bookings.csv")
dim(hotel_bookings)
head(hotel_bookings,10)
```
### Briefly describe the data
It's a hotel booking data set of 32 variables,119390 observations.There are several redundant variables, especially the arrival date_year,arrival_date_month and arrival_date_day_of_the_month ones.
## Identify variables that need to be mutated
We need to use mutate() and make_date() from the lubridate package to create just one date format. In addition, because the arrival-month is currently shown as characters(eg.July, August) instead of numbers, we also need to use case_when() from the dplyr package to convert them first into numbers.
##Note:The case_when() function from the dplyr package in R can be used to create new variables from existing variables.
This function uses the following basic syntax:
library(dplyr)
df %>%
mutate(new_var = case_when(var1 < 15 ~ 'low',
var2 < 25 ~ 'med',
TRUE ~ 'high'))
Note that TRUE is equivalent to an “else” statement.
```{r}
library(dplyr)
library(lubridate)
hotel_bookings <- hotel_bookings %>%
mutate(month = case_when(
arrival_date_month == "January" ~ 1,
arrival_date_month == "February" ~ 2,
arrival_date_month == "March" ~ 3,
arrival_date_month == "April" ~ 4,
arrival_date_month == "May" ~ 5,
arrival_date_month == "June" ~ 6,
arrival_date_month == "July" ~ 7,
arrival_date_month == "August" ~ 8,
arrival_date_month == "September" ~ 9,
arrival_date_month == "October" ~ 10,
arrival_date_month == "November" ~ 11,
arrival_date_month == "December" ~ 12,
)) %>%
mutate(arrival_date = make_date(arrival_date_year, month, arrival_date_day_of_month)) %>%
select(-c(arrival_date_year, arrival_date_month, arrival_date_day_of_month, month, arrival_date_week_number)) %>%
select(hotel, arrival_date, everything())
dim(hotel_bookings)
```