DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 4 Instructions

challenge_4
abc_poll
eggs
fed_rates
hotel_bookings
debt
Author

Meredith Rolfe

Published

August 18, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe it using both words and supporting information (e.g., tables)
  2. tidy data (as needed, including sanity checks)
  3. identify variables that need to be mutated
  4. mutate variables and sanity check all mutations
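
Step 4 asks for a sanity check on every mutation. A minimal sketch of one way to do that for a recoded variable, assuming a toy tibble whose `eggType` values mirror the value-column names of the eggs data used later on this page (the names `toy` and `checked` are illustrative, not part of the challenge):

```r
library(dplyr)
library(tibble)

# Toy input: one row per original column name (illustrative, not the real data)
toy <- tibble(eggType = c("large_half_dozen", "large_dozen",
                          "extra_large_half_dozen", "extra_large_dozen"))

# Recode the long names into a coarser size category
checked <- toy %>%
  mutate(size = case_when(
    startsWith(eggType, "extra") ~ "extra_large",
    startsWith(eggType, "large") ~ "large"
  ))

# Cross-tabulate old and new values: each original value should map to one size
count(checked, eggType, size)

# case_when() returns NA for rows that match no condition, so this should be 0
sum(is.na(checked$size))
```

Cross-tabulating the old and new columns makes a wrong or missing mapping visible at a glance, which is harder to spot by scrolling through the mutated table.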

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • abc_poll.csv ⭐
  • poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
  • FedFundsRate.csv⭐⭐⭐
  • hotel_bookings.csv⭐⭐⭐⭐
  • debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
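
Which package is "correct" depends on the file type: `readr::read_csv()` reads `.csv` files, and `readxl::read_excel()` reads both `.xls` and `.xlsx`. A minimal sketch, assuming a hypothetical helper `read_challenge_file()` (not part of the course materials) and a throwaway demo file standing in for the real data:

```r
library(readr)
library(readxl)

# Hypothetical helper: choose a reader based on the file extension
read_challenge_file <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = read_csv(path, show_col_types = FALSE, ...),
    xls  = ,                         # falls through to the xlsx branch
    xlsx = read_excel(path, ...),
    stop("Unsupported file type: ", ext)
  )
}

# Demo on a temporary CSV so the sketch runs without the course data
demo_path <- tempfile(fileext = ".csv")
write_csv(data.frame(month = "January", year = 2004), demo_path)
demo <- read_challenge_file(demo_path)
demo
```

Excel files may need extra arguments such as `sheet`, `skip`, or `range` to get past title rows; in this sketch they pass through `...` to `read_excel()`.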
Code
library(readr)
eggs <- read_csv("_data/eggs_tidy.csv")
eggs
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹​extra_large_dozen
Code
# Pivot the value columns to long format so that eggs_long exists before it is mutated
eggs_long <- eggs %>%
  pivot_longer(cols = -c(month, year), names_to = "eggType") %>%
  mutate(size = case_when(startsWith(eggType, "extra") ~ "extra_large",
                          startsWith(eggType, "large") ~ "large"))
eggs_long
Code
# Add each year's average to every row for the two half-dozen columns
eggs %>%
  select(month, year, large_half_dozen, extra_large_half_dozen) %>%
  group_by(year) %>%
  mutate(average_extra_large_half_dozen = mean(extra_large_half_dozen),
         average_large_half_dozen = mean(large_half_dozen))
# A tibble: 120 × 6
# Groups:   year [10]
   month      year large_half_dozen extra_large_half_dozen average_ext…¹ avera…²
   <chr>     <dbl>            <dbl>                  <dbl>         <dbl>   <dbl>
 1 January    2004             126                    132           136.    130.
 2 February   2004             128.                   134.          136.    130.
 3 March      2004             131                    137           136.    130.
 4 April      2004             131                    137           136.    130.
 5 May        2004             131                    137           136.    130.
 6 June       2004             134.                   137           136.    130.
 7 July       2004             134.                   137           136.    130.
 8 August     2004             134.                   137           136.    130.
 9 September  2004             130.                   136.          136.    130.
10 October    2004             128.                   136.          136.    130.
# … with 110 more rows, and abbreviated variable names
#   ¹​average_extra_large_half_dozen, ²​average_large_half_dozen

Briefly describe the data

Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
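
One way to anticipate the end result of a pivot is to compute the expected dimensions before reshaping. A sketch, assuming a two-row subset of the eggs table printed earlier on this page (the January and March 2004 rows) and the illustrative names `eggs_toy`, `eggType`, and `value`:

```r
library(tidyr)
library(tibble)

# Two rows copied from the printed eggs table (January and March 2004)
eggs_toy <- tribble(
  ~month,    ~year, ~large_half_dozen, ~large_dozen, ~extra_large_half_dozen, ~extra_large_dozen,
  "January",  2004,               126,          230,                     132,                230,
  "March",    2004,               131,          225,                     137,                230
)

id_cols <- c("month", "year")
n_value_cols <- ncol(eggs_toy) - length(id_cols)

# Expected shape after pivoting: one row per month-year-type combination
expected_rows <- nrow(eggs_toy) * n_value_cols   # 2 * 4 = 8
expected_cols <- length(id_cols) + 2             # ids + name column + value column

eggs_toy_long <- pivot_longer(eggs_toy, cols = -all_of(id_cols),
                              names_to = "eggType", values_to = "value")

# Both comparisons should be TRUE
nrow(eggs_toy_long) == expected_rows
ncol(eggs_toy_long) == expected_cols
```

On the full 120 × 6 table the same arithmetic gives 120 × 4 = 480 rows and 4 columns.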

Any additional comments?

Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
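
For a table that stores `month` as a name and `year` as a number, as the eggs data does, the date and factor questions can be handled together. A sketch, assuming a toy tibble and the illustrative column names `month_fct` and `date`; it relies on base R's `month.name` constant, so it assumes English month names:

```r
library(dplyr)
library(tibble)
library(lubridate)

# Toy input, deliberately out of calendar order
toy <- tibble(month = c("March", "January", "February"),
              year  = c(2004, 2004, 2005))

toy_dated <- toy %>%
  mutate(
    # Calendar-ordered levels so plots and tables sort by month, not alphabetically
    month_fct = factor(month, levels = month.name),
    # First day of each month as a proper Date
    date = make_date(year = year, month = match(month, month.name), day = 1)
  )

# Sanity checks: no month failed to match, and rows sort chronologically by date
sum(is.na(toy_dated$date))
arrange(toy_dated, date)
```

A misspelled or abbreviated month name would produce `NA` in both new columns, so the `NA` count doubles as a check on the string cleaning.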

Document your work here.

Any additional comments?

Source Code
---
title: "Challenge 4 Instructions"
author: "Meredith Rolfe"
description: "More data wrangling: pivoting"
date: "08/18/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_4
  - abc_poll
  - eggs
  - fed_rates
  - hotel_bookings
  - debt
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to:

1)  read in a data set, and describe it using both words and supporting information (e.g., tables)
2)  tidy data (as needed, including sanity checks)
3)  identify variables that need to be mutated
4)  mutate variables and sanity check all mutations

## Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

-   abc_poll.csv ⭐
-   poultry_tidy.xlsx or organiceggpoultry.xls⭐⭐
-   FedFundsRate.csv⭐⭐⭐
-   hotel_bookings.csv⭐⭐⭐⭐
-   debt_in_trillions.xlsx ⭐⭐⭐⭐⭐

```{r}
library(readr)
eggs <- read_csv("_data/eggs_tidy.csv")
eggs

# Pivot the value columns to long format so that eggs_long exists before it is mutated
eggs_long <- eggs %>%
  pivot_longer(cols = -c(month, year), names_to = "eggType") %>%
  mutate(size = case_when(startsWith(eggType, "extra") ~ "extra_large",
                          startsWith(eggType, "large") ~ "large"))
eggs_long

# Add each year's average to every row for the two half-dozen columns
eggs %>%
  select(month, year, large_half_dozen, extra_large_half_dozen) %>%
  group_by(year) %>%
  mutate(average_extra_large_half_dozen = mean(extra_large_half_dozen),
         average_large_half_dozen = mean(large_half_dozen))
```

### Briefly describe the data

## Tidy Data (as needed)

Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.

```{r}
```

Any additional comments?

## Identify variables that need to be mutated

Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?

Document your work here.

```{r}
```

Any additional comments?