library(tidyverse)
library(readr)
library(xlsx)
library(readxl)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Linda Humphrey
August 22, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 904 × 10
Year Month Day Federal F…¹ Feder…² Feder…³ Effec…⁴ Real …⁵ Unemp…⁶ Infla…⁷
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1954 7 1 NA NA NA 0.8 4.6 5.8 NA
2 1954 8 1 NA NA NA 1.22 NA 6 NA
3 1954 9 1 NA NA NA 1.06 NA 6.1 NA
4 1954 10 1 NA NA NA 0.85 8 5.7 NA
5 1954 11 1 NA NA NA 0.83 NA 5.3 NA
6 1954 12 1 NA NA NA 1.28 NA 5 NA
7 1955 1 1 NA NA NA 1.39 11.9 4.9 NA
8 1955 2 1 NA NA NA 1.29 NA 4.7 NA
9 1955 3 1 NA NA NA 1.35 NA 4.6 NA
10 1955 4 1 NA NA NA 1.43 6.7 4.7 NA
# … with 894 more rows, and abbreviated variable names
# ¹`Federal Funds Target Rate`, ²`Federal Funds Upper Target`,
# ³`Federal Funds Lower Target`, ⁴`Effective Federal Funds Rate`,
# ⁵`Real GDP (Percent Change)`, ⁶`Unemployment Rate`, ⁷`Inflation Rate`
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
# Tidy in the data
library(readr)
FedFundsRate <- read_csv("~/Desktop/601_Spring_2023/posts/_data/FedFundsRate.csv")
FedFundsRate %>% gather('1954','1955','1956','1957','1958','1959','1960','1961','1962','1963','1964','1965','1966','1967','1968','1969','1970','1971','1972','1973','1974','1975','1976','1977','1978','1979','1980','1981','1982','1983','1984','1985','1986','1987','1988','1989','1990','1991','1992','1993','1994','1995','1996','1997','1998','1999','2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021','2022','2023', key = "Year")
Error in `gather()`:
! Can't subset columns that don't exist.
✖ Column `1955` doesn't exist.
Any additional comments?
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
Any additional comments?
---
title: "Challenge 4"
author: "Linda Humphrey"
description: "More data wrangling: pivoting"
date: "08/22/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
  - hotel_bookings
  - Linda Humphrey
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readr)
library(tibble)
library(readxl)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
### Briefly describe the data
The data below is a collection of hotel booking records, read in from `hotel_bookings.csv`.
```{r}
# Reading in the data
library(readr)
hotel_bookings <- read_csv("~/Desktop/601_Spring_2023/posts/_data/hotel_bookings.csv")
hotel_bookings
```
## Tidy Data (as needed)
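In tidyr, `pivot_longer()` takes the `hotel_bookings` dataset from wide to long: the column names move into a `variable` column and the cell contents into a `value` column. Because the columns have different types (text, numbers, dates), the values are converted to character so they can share a single column.
```{r}
# Tidy the data
# Pivot every column into name/value pairs:
# column names go to "variable", cell contents go to "value".
library(readr)
hotel_bookings <- read_csv("~/Desktop/601_Spring_2023/posts/_data/hotel_bookings.csv")
gathered <- hotel_bookings %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value",
               # mixed column types cannot be stacked, so convert all values to character
               values_transform = list(value = as.character),
               values_drop_na = TRUE)
gathered
```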
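As a sanity check on the pivot, the long table should have one row per cell of the wide table, minus any missing cells dropped by `values_drop_na`. The sketch below illustrates this on a made-up three-row tibble; `toy` and its values are hypothetical and are not taken from `hotel_bookings.csv`.

```{r}
library(tidyverse)

# Hypothetical toy data standing in for hotel_bookings (mixed column types, one NA)
toy <- tibble(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel"),
  adults = c(2, 1, 2),
  children = c(0, NA, 1)
)

# Convert every value to character so text and numbers can share one column
toy_long <- toy %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value",
               values_transform = list(value = as.character),
               values_drop_na = TRUE)

# Expected rows: every cell of the wide table, minus the cells that were NA
expected_rows <- nrow(toy) * ncol(toy) - sum(is.na(toy))
nrow(toy_long) == expected_rows
```

The same comparison can be run on `gathered` and `hotel_bookings` to confirm that the pivot produced the anticipated number of rows.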
## Exploring data
Here we observe the first few rows of our data.
```{r}
library(readr)
hotel_bookings <- read_csv("~/Desktop/601_Spring_2023/posts/_data/hotel_bookings.csv")
head(hotel_bookings)
```
Here we observe the summary statistics of each variable in our dataset.
```{r}
summary(hotel_bookings)
```
Here we observe the structure of our dataset.
```{r}
str(hotel_bookings)
```
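## Analyzing the data set with filter()
```{r}
library(lubridate)
# Store the 2015 subset under its own name so the full data set stays available
hotel_bookings_2015 <- hotel_bookings %>%
  filter(arrival_date_year == 2015)
hotel_bookings_2015
```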
## Checking for missing values
```{r}
sum(is.na(hotel_bookings))
```
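A single grand total does not show where the missing values are. The sketch below counts them per column on a made-up tibble; the `toy` data is hypothetical, and the idea that some columns might use the literal string `"NULL"` as a placeholder is an assumption to verify against the actual file.

```{r}
library(tibble)

# Hypothetical toy data: one true NA and one "NULL" placeholder string
toy <- tibble(
  country = c("PRT", "NULL", "GBR"),
  children = c(0, NA, 1)
)

# Number of NA values in each column, rather than one total for the whole table
na_per_column <- colSums(is.na(toy))
na_per_column

# Number of "NULL" placeholder strings in each column (assumed convention, see above)
null_per_column <- sapply(toy, function(col) sum(col == "NULL", na.rm = TRUE))
null_per_column
```

Placeholder strings are not counted by `is.na()`, so a total of zero from `sum(is.na(...))` does not by itself prove that the data are complete.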
## Checking for duplicate values
```{r}
sum(duplicated(hotel_bookings))
```
## Checking for outliers or extreme values
```{r}
boxplot(hotel_bookings$arrival_date_day_of_month)
```
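A boxplot shows extreme values visually; to list them, one common convention is to flag anything more than 1.5 times the interquartile range beyond the quartiles, which is similar to the default whisker rule in `boxplot()`. A minimal sketch on made-up numbers (the vector `x` is hypothetical):

```{r}
# Hypothetical values; the last one is deliberately extreme
x <- c(10, 12, 11, 13, 12, 11, 95)

# First and third quartiles and the interquartile range
q <- quantile(x, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences at 1.5 * IQR beyond the quartiles
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers <- x[x < lower | x > upper]
outliers
```

Replacing `x` with a numeric column of `hotel_bookings` applies the same rule to the real data.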
## Preparing dates for a line graph of hotel_bookings
```{r}
# Combine the year, month name, and day columns into a single arrival date
hotel_bookings <- hotel_bookings %>%
  mutate(arrival_date = as.Date(
    paste(arrival_date_year, match(arrival_date_month, month.name),
          arrival_date_day_of_month, sep = "-"),
    format = "%Y-%m-%d"))
# Keep bookings with arrival years between 2015 and 2017
hotel_bookings <- hotel_bookings %>%
  filter(arrival_date_year >= 2015 & arrival_date_year <= 2017)
```
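One way to draw a line graph of bookings over time is to count bookings per arrival month and plot the counts. The sketch below uses a made-up tibble; `toy`, `arrival_month_start`, and `bookings` are hypothetical names, and it assumes the month is stored as an English month name in `arrival_date_month`.

```{r}
library(dplyr)
library(ggplot2)

# Hypothetical toy data with the same arrival columns as hotel_bookings
toy <- tibble::tibble(
  arrival_date_year = c(2015, 2015, 2015, 2016, 2016),
  arrival_date_month = c("July", "July", "August", "January", "January"),
  arrival_date_day_of_month = c(1, 15, 3, 2, 20)
)

# Map each booking to the first day of its arrival month, then count per month
monthly <- toy %>%
  mutate(arrival_month_start = as.Date(
    paste(arrival_date_year, match(arrival_date_month, month.name), 1, sep = "-"),
    format = "%Y-%m-%d")) %>%
  count(arrival_month_start, name = "bookings")

# Line graph of monthly booking counts
ggplot(monthly, aes(x = arrival_month_start, y = bookings)) +
  geom_line() +
  labs(x = "Arrival month", y = "Number of bookings")
```

Swapping `toy` for `hotel_bookings` should give the corresponding graph for the real data, provided the column names match.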
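## Wrangle hotel_bookings
```{r}
# Re-read the raw file so the mutations start from the original columns
hotel_bookings <- read_csv("~/Desktop/601_Spring_2023/posts/_data/hotel_bookings.csv")
# Mutate data: combine the arrival year, month name, and day into one Date column
hotel_bookings <- hotel_bookings %>%
  mutate(arrival_date = as.Date(
    paste(arrival_date_year, match(arrival_date_month, month.name),
          arrival_date_day_of_month, sep = "-"),
    format = "%Y-%m-%d"))
# Store the month as an ordered factor so tables and plots follow calendar order
hotel_bookings <- hotel_bookings %>%
  mutate(arrival_date_month = factor(arrival_date_month, levels = month.name, ordered = TRUE))
# Sanity check: count arrival dates that failed to parse
sum(is.na(hotel_bookings$arrival_date))
```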
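To sanity check a date mutation, it helps to look for rows where the new column is missing even though the source columns were filled in. A minimal sketch on made-up data, where one month name is misspelled on purpose (`toy` and `failed` are hypothetical names):

```{r}
library(dplyr)

# Hypothetical toy data; the last month name is misspelled deliberately
toy <- tibble::tibble(
  arrival_date_year = c(2015, 2016, 2017),
  arrival_date_month = c("July", "February", "Febuary"),
  arrival_date_day_of_month = c(1, 29, 10)
)

# Build the date the same way as for the real data
toy <- toy %>%
  mutate(arrival_date = as.Date(
    paste(arrival_date_year, match(arrival_date_month, month.name),
          arrival_date_day_of_month, sep = "-"),
    format = "%Y-%m-%d"))

# Rows where the date could not be built point to values that need cleaning
failed <- toy %>%
  filter(is.na(arrival_date))
failed
```

An empty `failed` table on the real data would suggest that every arrival date was built successfully.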