library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Xiaoyan Hu
October 4, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
Error in file(file, "rt"): cannot open the connection
Error in file(file, "rt"): cannot open the connection
The hotel booking data (data1) comes from hotel_bookings.csv, and the Fed funds rate data (data2) comes from FedFundsRate.csv.
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
- data1 is not tidy
Error in head(data1): object 'data1' not found
Error in head(data2): object 'data2' not found
Error in eval(expr, envir, enclos): object 'data1' not found
Error in is.data.frame(x): object 'data1' not found
Error in summary(data1): object 'data1' not found
Error in eval(expr, envir, enclos): object 'data2' not found
Error in is.data.frame(x): object 'data2' not found
Error in summarize(data2, max("year"), min("year"), mean("Unemlpoyment.Rate")): object 'data2' not found
Error in summarize(data2, max(as.numeric("year")), min(as.numeric("year")), : object 'data2' not found
Any additional comments?
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
Error in mutate(data1, arrival_date_year = "year"): object 'data1' not found
Error in head(data1): object 'data1' not found
Any additional comments?
---
title: "Challenge 4 Xiaoyan Hu"
author: "Xiaoyan Hu"
description: "More data wrangling: pivoting"
date: "10/04/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_4
- abc_poll
- eggs
- fed_rates
- hotel_bookings
- debt
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
- FedFundsRate.csv⭐⭐⭐
- hotel_bookings.csv⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
```{r}
# Read in the data.
# data1 = hotel bookings
# data2 = Fed funds rate
data1 <- read.csv("/Users/cassie199/Desktop/22fall/DACSS601/601_Fall_2022/posts/_data/hotel_bookings.csv")
data2 <- read.csv("/Users/cassie199/Desktop/22fall/DACSS601/601_Fall_2022/posts/_data/FedFundsRate.csv")
```
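The absolute paths above only resolve on one machine, which is one likely way to end up with a "cannot open the connection" error when the post is rendered elsewhere. A minimal sketch of a more defensive read follows. The helper `read_if_exists()` is hypothetical (it is not part of the original post), and a temporary CSV stands in for the real file so the example runs anywhere.

```r
library(readr)

# Hypothetical helper (an assumption, not from the original post): stop with
# a clear message when the file is missing instead of failing inside read_csv().
read_if_exists <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  read_csv(path, show_col_types = FALSE)
}

# Toy file standing in for a real dataset; the values are made up.
tmp <- tempfile(fileext = ".csv")
write_csv(data.frame(Year = c(1954, 1955), Month = c(7, 8)), tmp)

toy <- read_if_exists(tmp)
dim(toy)
```

With the real data, the same helper could be pointed at a path relative to the project, but which relative path is correct depends on where the post is rendered from.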
### Briefly describe the data
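The hotel booking data (`data1`) comes from hotel_bookings.csv, and the Fed funds rate data (`data2`) comes from FedFundsRate.csv.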
## Tidy Data (as needed)
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
- data1 is not tidy
```{r}
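# Preview the first rows of each dataset
head(data1)
head(data2)

# Dimensions, column names, and summary statistics for the hotel bookings
dim(data1)
colnames(data1)
summary(data1)

# Dimensions and column names for the Fed funds rate data
dim(data2)
colnames(data2)

# Range of years and mean unemployment rate. Column names must be unquoted;
# quoting them summarizes the literal strings instead of the columns.
# Assumes read.csv() produced columns named Year and Unemployment.Rate;
# check colnames(data2) above and adjust if they differ.
summarize(
  data2,
  max_year = max(as.numeric(Year)),
  min_year = min(as.numeric(Year)),
  mean_unemployment = mean(as.numeric(Unemployment.Rate), na.rm = TRUE)
)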
```
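Since this challenge is about pivoting, one way to anticipate the end result is to compute the expected dimensions before pivoting and compare them afterwards. The sketch below uses made-up data. The column names only mimic what `read.csv()` might produce for FedFundsRate.csv and are assumptions, as are the names `measure` and `rate`.

```r
library(dplyr)
library(tidyr)

# Made-up rows; column names and values are illustrative only.
toy <- tibble(
  Year = c(2000, 2001, 2002),
  Unemployment.Rate = c(4.0, 4.7, 5.8),
  Inflation.Rate = c(2.4, 2.6, 2.4)
)

# Expected shape after pivoting the two rate columns into name/value pairs:
# each original row yields one row per pivoted column, and the pivoted
# columns are replaced by one name column and one value column.
n_pivoted <- 2
expected_rows <- nrow(toy) * n_pivoted
expected_cols <- ncol(toy) - n_pivoted + 2

toy_long <- pivot_longer(
  toy,
  cols = c(Unemployment.Rate, Inflation.Rate),
  names_to = "measure",
  values_to = "rate"
)

# Sanity check: the result matches the anticipated dimensions.
stopifnot(nrow(toy_long) == expected_rows, ncol(toy_long) == expected_cols)
```

The same arithmetic applies to the real data once the set of columns to pivot has been decided.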
Any additional comments?
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
```{r}
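# Assumed intent: copy arrival_date_year into a new column named year (the quoted "year" stored a literal string)
data1 <- mutate(data1, year = arrival_date_year)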
head(data1)
```
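The prompts above ask about dates and factors. As a sketch of what such mutations could look like, the example below builds a single date column and a calendar-ordered month factor on made-up rows. The column names mimic hotel_bookings.csv and are assumptions; the new names `arrival_month` and `arrival_date` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Made-up rows; the column names mimic hotel_bookings.csv and are assumptions.
toy <- tibble(
  arrival_date_year = c(2015, 2016),
  arrival_date_month = c("July", "January"),
  arrival_date_day_of_month = c(1, 15)
)

toy <- toy %>%
  mutate(
    # Month names as a factor in calendar order, so plots sort January to December.
    arrival_month = factor(arrival_date_month, levels = month.name),
    # Combine year, month number, and day into a single Date column.
    arrival_date = make_date(
      arrival_date_year,
      as.integer(arrival_month),
      arrival_date_day_of_month
    )
  )

# Sanity check: every row produced a valid date.
stopifnot(!anyNA(toy$arrival_date))
```

Using `month.name` as the factor levels means `as.integer()` on the factor returns the month number directly, which avoids a separate lookup table.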
Any additional comments?