DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 4 Xiaoyan Hu

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
Author

Xiaoyan Hu

Published

October 4, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
Code
# Read in data.
# hotel bookings = data1
# Fed funds rate = data2
# Paths are relative to the post's folder so the document renders on any machine
# (assumes the CSVs live in posts/_data/, as in the original absolute paths).
data1 <- read.csv("_data/hotel_bookings.csv")
data2 <- read.csv("_data/FedFundsRate.csv")

Briefly describe the data

Two datasets are read in: the hotel booking data (data1) and the Fed funds rate data (data2).

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

  • data1 is not tidy

Code
# Preview each dataset, then get its dimensions and column names
head(data1)
head(data2)
dim(data1)
colnames(data1)
summary(data1)

dim(data2)
colnames(data2)

# Use bare column names: a quoted name such as "Year" is a string, not the column.
# Column names are assumed to be Year and Unemployment.Rate; confirm with colnames(data2).
summarize(
  data2,
  max_year = max(Year),
  min_year = min(Year),
  mean_unemployment = mean(Unemployment.Rate, na.rm = TRUE)
)

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

Code
# Assumed intent: rename arrival_date_year to year.
# mutate(data1, "arrival_date_year" = "year") would instead overwrite every
# value in the column with the text "year".
data1 <- rename(data1, year = arrival_date_year)
head(data1)

Any additional comments?

Source Code
---
title: "Challenge 4 Xiaoyan Hu"
author: "Xiaoyan Hu"
description: "More data wrangling: pivoting"
date: "10/04/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
  - abc_poll
  - eggs
  - fed_rates
  - hotel_bookings
  - debt
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to:

1)  read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2)  tidy data (as needed, including sanity checks)
3)  identify variables that need to be mutated
4)  mutate variables and sanity check all mutations

## Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

-   abc_poll.csv ⭐
-   poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
-   FedFundsRate.csv⭐⭐⭐
-   hotel_bookings.csv⭐⭐⭐⭐
-   debt_in_trillions.xlsx ⭐⭐⭐⭐⭐

```{r}
# Read in data.
# hotel bookings = data1
# Fed funds rate = data2
# Paths are relative to the post's folder so the document renders on any machine
# (assumes the CSVs live in posts/_data/, as in the original absolute paths).
data1 <- read.csv("_data/hotel_bookings.csv")
data2 <- read.csv("_data/FedFundsRate.csv")
```
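
`read.csv()` stops with a connection error when the path does not exist on the machine that renders the post. The block below is a minimal sketch of a guard around the read step. The helper name `read_csv_checked` is hypothetical, and a temporary file with made-up rows stands in for the real CSV so the example runs anywhere.

```r
library(readr)

# Hypothetical helper (not part of the original post): read a CSV only if the
# file exists, and otherwise stop with a message that shows where R looked.
read_csv_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  read_csv(path, show_col_types = FALSE)
}

# Stand-in file with made-up rows, so the sketch does not depend on the course data
tmp <- tempfile(fileext = ".csv")
write_csv(data.frame(hotel = c("Resort Hotel", "City Hotel"), adults = c(2, 1)), tmp)

demo <- read_csv_checked(tmp)
dim(demo)
```

With the course files, the same helper could be called on a path relative to the post, which keeps the document independent of any one person's desktop folder.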

### Briefly describe the data
Two datasets are read in: the hotel booking data (`data1`) and the Fed funds rate data (`data2`).

## Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

-   data1 is not tidy
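
One way to "anticipate the end result" is to compute the expected dimensions before pivoting and compare them afterwards. The sketch below uses a toy wide table with made-up values and hypothetical column names (`rate_a`, `rate_b`, `rate_c`). It is not the challenge data, only an illustration of the check.

```r
library(dplyr)
library(tidyr)

# Toy wide table (made-up values, hypothetical column names)
wide <- tibble(
  year   = c(2015, 2016),
  rate_a = c(1.0, 1.5),
  rate_b = c(2.0, 2.5),
  rate_c = c(3.0, 3.5)
)

cols_to_pivot <- c("rate_a", "rate_b", "rate_c")

# Anticipated shape: one row per (original row, pivoted column),
# and the pivoted columns collapse into a names column plus a values column.
expected_rows <- nrow(wide) * length(cols_to_pivot)
expected_cols <- ncol(wide) - length(cols_to_pivot) + 2

long <- pivot_longer(
  wide,
  cols = all_of(cols_to_pivot),
  names_to = "series",
  values_to = "rate"
)

# Sanity check: stop if the result does not match the anticipated shape
stopifnot(nrow(long) == expected_rows, ncol(long) == expected_cols)
dim(long)
```

Writing the expectation down first means a wrong `cols` selection shows up as a failed check instead of a silently misshapen table.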

```{r}
# Preview each dataset, then get its dimensions and column names
head(data1)
head(data2)
dim(data1)
colnames(data1)
summary(data1)

dim(data2)
colnames(data2)

# Use bare column names: a quoted name such as "Year" is a string, not the column.
# Column names are assumed to be Year and Unemployment.Rate; confirm with colnames(data2).
summarize(
  data2,
  max_year = max(Year),
  min_year = min(Year),
  mean_unemployment = mean(Unemployment.Rate, na.rm = TRUE)
)
```
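
Inside `summarize()`, a quoted name is a literal string, while a bare name refers to the column. The sketch below contrasts the two on a toy table. The values are made up, and the column names `Year` and `Unemployment.Rate` are assumptions about how `read.csv()` names the Fed funds columns.

```r
library(dplyr)

# Toy table with made-up values; column names are assumed, not confirmed
rates <- tibble(
  Year = c(1954, 1990, 2017),
  Unemployment.Rate = c(5.8, 5.4, NA)
)

# Quoted: max() receives the single string "Year", so the result is that string
quoted <- summarize(rates, max_year = max("Year"))

# Bare names: the functions receive the column values
bare <- summarize(
  rates,
  max_year = max(Year),
  min_year = min(Year),
  mean_unemployment = mean(Unemployment.Rate, na.rm = TRUE)
)

quoted
bare
```

The `na.rm = TRUE` argument matters whenever a column has missing values, since `mean()` otherwise returns `NA`.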

Any additional comments?

## Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

```{r}
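# Assumed intent: rename arrival_date_year to year.
# mutate(data1, "arrival_date_year" = "year") would instead overwrite every
# value in the column with the text "year".
data1 <- rename(data1, year = arrival_date_year)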
head(data1)
```
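
The questions above about dates and factors can be illustrated with a short sketch. It assumes the booking data stores the arrival date in three columns named `arrival_date_year`, `arrival_date_month`, and `arrival_date_day_of_month`, with months spelled out in English. Only the first of those names appears in this post, and the rows below are made up.

```r
library(dplyr)

# Toy rows with made-up values; column names are assumed to mirror hotel_bookings.csv
bookings <- tibble(
  arrival_date_year = c(2015, 2016, 2016),
  arrival_date_month = c("July", "January", "March"),
  arrival_date_day_of_month = c(1, 15, 31)
)

bookings <- bookings %>%
  mutate(
    # Month names become a factor in calendar order rather than alphabetical order
    arrival_month = factor(arrival_date_month, levels = month.name),
    # Build an ISO date string (YYYY-MM-DD), then parse it as a Date
    arrival_date = as.Date(sprintf(
      "%d-%02d-%02d",
      as.integer(arrival_date_year),
      as.integer(arrival_month),
      as.integer(arrival_date_day_of_month)
    ))
  )

# Sanity checks: no failed date parses, and no month name outside month.name
stopifnot(!anyNA(bookings$arrival_date), !anyNA(bookings$arrival_month))

arrange(bookings, arrival_date)
```

Keeping the original columns alongside the new ones makes it easy to compare them row by row before dropping anything.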

Any additional comments?