library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Meredith Rolfe
August 18, 2022
Today’s challenge is to:
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 600 × 4
Product Year Month Price_Dollar
<chr> <dbl> <chr> <dbl>
1 Whole 2013 January 2.38
2 Whole 2013 February 2.38
3 Whole 2013 March 2.38
4 Whole 2013 April 2.38
5 Whole 2013 May 2.38
6 Whole 2013 June 2.38
7 Whole 2013 July 2.38
8 Whole 2013 August 2.38
9 Whole 2013 September 2.38
10 Whole 2013 October 2.38
# … with 590 more rows
The dataset records monthly prices, in US dollars, for poultry products. It has 600 rows and 4 columns: Product, Year, Month, and Price_Dollar. The years range from 2004 to 2013, and the prices range from 1.935 to 7.037 US dollars. The Price_Dollar column may contain missing values, recorded as NA.
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
The data are already tidy; NA values in the Price_Dollar column do not change that. Tidy data requires each variable in its own column, each observation in its own row, and each value in its own cell. Here each row is one observation (the price of one product in one month of one year) and each column is one variable, so no further tidying is needed.
Any additional comments?
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
# A tibble: 600 × 5
Product Year Month Price_Dollar Date
<chr> <dbl> <chr> <dbl> <date>
1 Whole 2013 January 2.38 2013-01-01
2 Whole 2013 February 2.38 2013-02-01
3 Whole 2013 March 2.38 2013-03-01
4 Whole 2013 April 2.38 2013-04-01
5 Whole 2013 May 2.38 2013-05-01
6 Whole 2013 June 2.38 2013-06-01
7 Whole 2013 July 2.38 2013-07-01
8 Whole 2013 August 2.38 2013-08-01
9 Whole 2013 September 2.38 2013-09-01
10 Whole 2013 October 2.38 2013-10-01
# … with 590 more rows
Any additional comments?
Creating a “Date” variable by combining the “Year” and “Month” variables has two benefits. First, it makes time-series analysis and visualization easier: with a single date column, we can plot prices directly against time and look for trends and patterns. Second, it simplifies the data structure: rather than handling two time-related variables (“Year” and “Month”), later steps can work with a single “Date” variable. Each date is set to the first day of its month, since the data carry no day information.
---
title: "Challenge 4_PriyankaThatikonda"
author: "Meredith Rolfe"
description: "More data wrangling: pivoting"
date: "08/18/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_4
- abc_poll
- eggs
- fed_rates
- hotel_bookings
- debt
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1) read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2) tidy data (as needed, including sanity checks)
3) identify variables that need to be mutated
4) mutate variables and sanity check all mutations
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- abc_poll.csv ⭐
- poultry_tidy.xlsx or organiceggpoultry.xls ⭐⭐
- FedFundsRate.csv ⭐⭐⭐
- hotel_bookings.csv ⭐⭐⭐⭐
- debt_in_trillions.xlsx ⭐⭐⭐⭐⭐
```{r}
# Read the already-tidy poultry price workbook with readxl
data <- readxl::read_excel("_data/poultry_tidy.xlsx")
data
```
### Briefly describe the data
The dataset records monthly prices, in US dollars, for poultry products. It has 600 rows and 4 columns: Product, Year, Month, and Price_Dollar. The years range from 2004 to 2013, and the prices range from 1.935 to 7.037 US dollars. The Price_Dollar column may contain missing values, recorded as NA.
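The figures quoted above can be reproduced with a few summary calls. The sketch below is not run against the real file: `toy` is a hypothetical stand-in with the same four column names and made-up values, so its results are illustrative only.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the poultry data; all values are made up.
toy <- tribble(
  ~Product, ~Year, ~Month,     ~Price_Dollar,
  "Whole",  2013,  "January",  2.38,
  "Whole",  2013,  "February", 2.38,
  "Thighs", 2004,  "January",  NA,
  "Thighs", 2004,  "February", 2.03
)

# Number of rows and columns
dim(toy)

# Year range, price range (ignoring NA), and count of missing prices
overview <- toy %>%
  summarise(
    min_year  = min(Year),
    max_year  = max(Year),
    min_price = min(Price_Dollar, na.rm = TRUE),
    max_price = max(Price_Dollar, na.rm = TRUE),
    n_missing = sum(is.na(Price_Dollar))
  )
overview
```

Replacing `toy` with the tibble read from the workbook would give the actual ranges and the actual number of missing prices.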
## Tidy Data (as needed)
Is your data already tidy, or is there work to be done? Be sure to anticipate your end result to provide a sanity check, and document your work here.
The data are already tidy; NA values in the Price_Dollar column do not change that. Tidy data requires each variable in its own column, each observation in its own row, and each value in its own cell. Here each row is one observation (the price of one product in one month of one year) and each column is one variable, so no further tidying is needed.
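One way to back this up with a sanity check is to confirm that every Product/Year/Month combination appears exactly once. If the table is also a complete grid, 600 rows over 10 years of 12 months would correspond to 5 products; that count is an inference from the row total, not something read from the file. The sketch below uses a hypothetical `toy` tibble with made-up values rather than the real data.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the poultry data; all values are made up.
toy <- tribble(
  ~Product, ~Year, ~Month,     ~Price_Dollar,
  "Whole",  2013,  "January",  2.38,
  "Whole",  2013,  "February", 2.38,
  "Thighs", 2013,  "January",  NA,
  "Thighs", 2013,  "February", 2.03
)

# A tidy table has at most one row per Product/Year/Month combination
dupes <- toy %>%
  count(Product, Year, Month) %>%
  filter(n > 1)

# For a complete grid, rows = products x years x months
expected_rows <- n_distinct(toy$Product) *
  n_distinct(toy$Year) *
  n_distinct(toy$Month)

# Missing prices do not break tidiness, but it helps to know where they are
missing <- toy %>% filter(is.na(Price_Dollar))

nrow(dupes) == 0
expected_rows == nrow(toy)
missing
```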
Any additional comments?
## Identify variables that need to be mutated
Are there any variables that require mutation to be usable in your analysis stream? For example, are all time variables correctly coded as dates? Are all string variables reduced and cleaned to sensible categories? Do you need to turn any variables into factors and reorder for ease of graphics and visualization?
Document your work here.
```{r}
# Combine Year and Month into a Date, using the first day of each month
data1 <- data %>%
  mutate(Date = lubridate::ymd(paste(Year, Month, "01", sep = "-")))
data1
```
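The challenge also asks for a sanity check on each mutation. A minimal sketch of one is shown here, assuming English month names and an English locale for date parsing; `toy` and `toy_dated` are hypothetical names and the values are made up. It checks that no date failed to parse, that the year and month recovered from `Date` match the original columns, and that the row count is unchanged.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical stand-in for the poultry data; all values are made up.
toy <- tribble(
  ~Year, ~Month,     ~Price_Dollar,
  2013,  "January",  2.38,
  2013,  "February", 2.38,
  2004,  "December", 2.03
)

# Same construction as in the chunk above: first day of each month
toy_dated <- toy %>%
  mutate(Date = ymd(paste(Year, Month, "01", sep = "-")))

checks <- toy_dated %>%
  summarise(
    n_failed     = sum(is.na(Date)),                      # dates that did not parse
    year_ok      = all(year(Date) == Year),               # year round-trips
    month_ok     = all(month.name[month(Date)] == Month), # month round-trips
    row_count_ok = n() == nrow(toy)                       # no rows added or dropped
  )
checks
```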
Any additional comments?
Creating a "Date" variable by combining the "Year" and "Month" variables has two benefits. First, it makes time-series analysis and visualization easier: with a single date column, we can plot prices directly against time and look for trends and patterns. Second, it simplifies the data structure: rather than handling two time-related variables ("Year" and "Month"), later steps can work with a single "Date" variable. Each date is set to the first day of its month, since the data carry no day information.
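To illustrate the plotting benefit, here is a hedged sketch of a price-over-time line chart built on the `Date` column. The tibble `toy` and the plot object `p` are hypothetical, the values are made up, and the date parsing again assumes English month names.

```r
library(dplyr)
library(tibble)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for the poultry data; all values are made up.
toy <- tribble(
  ~Product, ~Year, ~Month,     ~Price_Dollar,
  "Whole",  2013,  "January",  2.38,
  "Whole",  2013,  "February", 2.38,
  "Whole",  2013,  "March",    2.38,
  "Thighs", 2013,  "January",  2.16,
  "Thighs", 2013,  "February", 2.16,
  "Thighs", 2013,  "March",    2.16
) %>%
  mutate(Date = ymd(paste(Year, Month, "01", sep = "-")))

# One line per product, with the single Date column on the x-axis
p <- ggplot(toy, aes(x = Date, y = Price_Dollar, colour = Product)) +
  geom_line() +
  labs(x = "Date", y = "Price (US dollars)")
p
```

With separate "Year" and "Month" columns, the same chart would first need the months put in calendar order and combined with the year, which is the step the "Date" variable removes.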