Challenge 3 Instructions

challenge_3

Author

Kim Darkenwald

Published

August 17, 2022

Code

library(tidyverse)
library(readr)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

read in a data set, and describe the data set using both words and any supporting information (e.g., tables)
identify what needs to be done to tidy the current data
anticipate the shape of pivoted data
pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

animal_weights.csv ⭐
eggs_tidy.csv ⭐⭐ or organicpoultry.xls ⭐⭐⭐
australian_marriage*.xlsx ⭐⭐⭐
USA Households*.xlsx ⭐⭐⭐⭐
sce_labor_chart_data_public.csv 🌟🌟🌟🌟🌟

Code

animal_weight<-read_csv("_data/animal_weight.csv",
                        show_col_types = FALSE)
view(animal_weight)
dim(animal_weight)

[1]  9 17

Briefly describe the data

As indicated in our data, for the most part, animals share similar weights across regions of the globe. However, buffalo, cattle, and swine show distinct differences in weight: animals in these categories appear to be much heavier in the Northern American and European regions, while those in the Middle East, Africa, and the Indian Subcontinent weigh significantly less.

I’m not sure why or how I would pivot this.

Example: find current and future data dimensions

Code

df<-tibble(country = rep(c("Mexico", "USA", "France"),2),
           year = rep(c(1980,1990), 3), 
           trade = rep(c("NAFTA", "NAFTA", "EU"),2),
           outgoing = rnorm(6, mean=1000, sd=500),
           incoming = rlogis(6, location=1000, 
                             scale = 400))
df

# A tibble: 6 × 5
  country  year trade outgoing incoming
  <chr>   <dbl> <chr>    <dbl>    <dbl>
1 Mexico   1980 NAFTA    1243.    -134.
2 USA      1990 NAFTA    1271.     666.
3 France   1980 EU        695.    1086.
4 Mexico   1990 NAFTA    1135.    1424.
5 USA      1980 NAFTA    1021.    1545.
6 France   1990 EU       1042.     900.

Code

#existing rows/cases
nrow(df)

[1] 6

Code

#existing columns/variables
ncol(df)

[1] 5

Code

#expected rows/cases
nrow(df) * (ncol(df)-3)

[1] 12

Code

# expected columns 
3 + 2

[1] 5

Our simple example has \(n = 6\) rows and \(k - 3 = 2\) variables being pivoted, so we expect the new dataframe to have \(n \times 2 = 12\) rows and \(3 + 2 = 5\) columns.
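The arithmetic above can be wrapped in a small helper so it is easy to reuse on other data. This is only a sketch: `expected_long_dims` is a hypothetical name, not a tidyr function, and it assumes all pivoted columns are stacked into a single names column and a single values column.

```r
# Expected shape after pivot_longer(), given the current shape and the
# number of identifier columns that are NOT pivoted.
# `expected_long_dims` is a hypothetical helper, not part of tidyr.
expected_long_dims <- function(n_rows, n_cols, n_id_cols) {
  n_pivoted <- n_cols - n_id_cols   # columns that get stacked
  c(rows = n_rows * n_pivoted,      # one row per original row per pivoted column
    cols = n_id_cols + 2)           # id columns + names_to + values_to
}

# The trade example: 6 rows, 5 columns, 3 identifier columns
expected_long_dims(n_rows = 6, n_cols = 5, n_id_cols = 3)
```

For the trade example this gives 12 rows and 5 columns, matching the hand calculation.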

Challenge: Describe the final dimensions

There are 9 rows and 17 columns; therefore, n = 9 and k = 17. I do not see how I would pivot this. Is it because some of the columns have the same number of animals, so you would eliminate them?
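One possible reading, offered as a sketch rather than the intended answer: if the first column identifies the region and the other 16 columns each hold the weight for one animal type, then those 16 columns would be pivoted, giving \(9 \times 16 = 144\) rows and \(1 + 2 = 3\) columns. The code below checks that arithmetic on a stand-in tibble with the same 9 x 17 shape; the column names (`region`, `animal_1`, ...) and the values are placeholders, not the real data.

```r
library(tidyverse)

# Stand-in with the same shape as animal_weight (9 x 17).
# Column names and values are placeholders, not the real data.
n_regions <- 9
n_animals <- 16
weights <- matrix(seq_len(n_regions * n_animals), nrow = n_regions,
                  dimnames = list(NULL, paste0("animal_", seq_len(n_animals))))
stand_in <- bind_cols(tibble(region = paste0("region_", seq_len(n_regions))),
                      as_tibble(weights))

# Assumption: `region` is the only identifier column; all others are pivoted.
stand_in_long <- pivot_longer(stand_in,
                              cols = -region,
                              names_to = "animal",
                              values_to = "weight")
dim(stand_in_long)
```

Under that assumption, each new case would be one region-animal combination with a single weight value.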

Code

df<-tibble(animal = rep(c("Cattle_Dairy", "Cattle_Nondairy", "Swine_Market",
                          "Swine_Breeding"),9))
df

# A tibble: 36 × 1
   animal         
   <chr>          
 1 Cattle_Dairy   
 2 Cattle_Nondairy
 3 Swine_Market   
 4 Swine_Breeding 
 5 Cattle_Dairy   
 6 Cattle_Nondairy
 7 Swine_Market   
 8 Swine_Breeding 
 9 Cattle_Dairy   
10 Cattle_Nondairy
# … with 26 more rows
# ℹ Use `print(n = ...)` to see more rows

Code

nrow(df)

[1] 36

Code

ncol(df)

[1] 1

Code

nrow(df) * (ncol(df)-3)

[1] -72

Example

Code

df<-pivot_longer(df, cols = c(outgoing, incoming),
                 names_to="No Idea",
                 values_to = "Not Sure")

Error in `chr_as_locations()`:
! Can't subset columns that don't exist.
✖ Column `outgoing` doesn't exist.

Code

df

# A tibble: 36 × 1
   animal         
   <chr>          
 1 Cattle_Dairy   
 2 Cattle_Nondairy
 3 Swine_Market   
 4 Swine_Breeding 
 5 Cattle_Dairy   
 6 Cattle_Nondairy
 7 Swine_Market   
 8 Swine_Breeding 
 9 Cattle_Dairy   
10 Cattle_Nondairy
# … with 26 more rows
# ℹ Use `print(n = ...)` to see more rows

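Once the example data are pivoted long, the result should be \(12 \times 5\), exactly what we expected. The call above did not get that far because `df` had been overwritten with the one-column animal tibble, so `outgoing` and `incoming` no longer existed.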
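To see the pivot succeed, here is a minimal sketch that rebuilds the trade example under a separate name. `trade_df` and `trade_long` are hypothetical names, the values are the rounded numbers printed earlier rather than fresh random draws, and the `names_to` and `values_to` labels are illustrative choices.

```r
library(tidyverse)

# Trade example rebuilt with the values printed earlier (rounded), under a
# separate name so later chunks cannot overwrite it.
trade_df <- tibble(country  = rep(c("Mexico", "USA", "France"), 2),
                   year     = rep(c(1980, 1990), 3),
                   trade    = rep(c("NAFTA", "NAFTA", "EU"), 2),
                   outgoing = c(1243, 1271, 695, 1135, 1021, 1042),
                   incoming = c(-134, 666, 1086, 1424, 1545, 900))

# Stack the two value columns; "direction" and "amount" are illustrative names.
trade_long <- pivot_longer(trade_df,
                           cols = c(outgoing, incoming),
                           names_to = "direction",
                           values_to = "amount")
dim(trade_long)
```

Each row of `trade_long` is one country-year-direction combination, which is what makes the long form tidy: one observation per row and one variable per column.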

Challenge: Pivot the Chosen Data

Document your work here. What will a new “case” be once you have pivoted the data? How does it meet requirements for tidy data?

Any additional comments?