DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 3 Naughton

Categories: challenge_3, animal_weights, eggs, Courtney Naughton

Author: Courtney Naughton

Published: September 25, 2022

Code
library(tidyverse)
library(dplyr)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Challenge #3

  • Animal Weights
  • Eggs

Read in one (or more) of the following datasets, using the correct R package and command.

  • animal_weights.csv ⭐
Code
animal_weights<-read_csv("_data/animal_weight.csv")
animal_weights
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹​`IPCC Area`, ²​`Cattle - dairy`, ³​`Cattle - non-dairy`, ⁴​Buffaloes,
#   ⁵​`Swine - market`, ⁶​`Swine - breeding`, ⁷​`Chicken - Broilers`,
#   ⁸​`Chicken - Layers`

Briefly describe the data

This dataset has 17 columns and only 9 rows. The columns are IPCC Area, Cattle - dairy, Cattle - non-dairy, Buffaloes, Swine - market, Swine - breeding, Chicken - Broilers, Chicken - Layers, Ducks, Turkeys, Sheep, Goats, Horses, Asses, Mules, Camels, and Llamas. Each row is one IPCC area, and each of the 16 animal columns holds that animal's weight in that area. It would make more sense to have only 3 columns: Area, Animal Type, and Weight.

Anticipate the End Result

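Our original data set was 9 rows by 17 variables. One variable (IPCC Area) identifies the case, and the other 16 are animal types that will be stacked into name-value pairs. Our new data will only have 3 variables, so we expect 9*(17-1) = 144 rows by 3 columns.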

Challenge: Describe the final dimensions

We expect a new dataframe to have \(9*16 = 144\) rows x \(3\) columns.

Code
animal_weights<-tibble(animal_weights)
animal_weights
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹​`IPCC Area`, ²​`Cattle - dairy`, ³​`Cattle - non-dairy`, ⁴​Buffaloes,
#   ⁵​`Swine - market`, ⁶​`Swine - breeding`, ⁷​`Chicken - Broilers`,
#   ⁸​`Chicken - Layers`
Code
#existing rows/cases
nrow(animal_weights)
[1] 9
Code
#existing columns/variables
ncol(animal_weights)
[1] 17
Code
#expected rows/cases
nrow(animal_weights) * (ncol(animal_weights)-1)
[1] 144
Code
# expected columns: IPCC Area, Animal_type, Weight
3
[1] 3

Challenge: Pivot the Chosen Data

With the pivoted data, each case is one animal type in one IPCC area, together with its weight.

Code
# Pivot every column except the area identifier (the 16 animal columns)
animal_weights <- pivot_longer(animal_weights,
                 cols = -`IPCC Area`,
                 names_to = "Animal_type",
                 values_to = "Weight")
animal_weights
# A tibble: 144 × 3
   `IPCC Area`         Animal_type        Weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows
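
The dimension check can also sit right next to the pivot so that a mismatch stops the script. The following is a minimal sketch on a made-up tibble: `toy`, its area labels, and its values are illustrative assumptions, not rows from `animal_weight.csv`.

```r
library(tidyr)
library(tibble)

# Hypothetical miniature of the wide layout:
# one identifier column plus three animal columns.
toy <- tibble(
  `IPCC Area` = c("Area A", "Area B"),
  Ducks       = c(2.7, 2.7),
  Turkeys     = c(6.8, 6.8),
  Sheep       = c(28, 48.5)
)

# Anticipated shape: rows * (columns - 1 identifier),
# and identifier + name column + value column.
expected_rows <- nrow(toy) * (ncol(toy) - 1)
expected_cols <- 3

toy_long <- pivot_longer(
  toy,
  cols      = -`IPCC Area`,
  names_to  = "Animal_type",
  values_to = "Weight"
)

# Fail loudly if the pivot did not produce the anticipated dimensions.
stopifnot(
  nrow(toy_long) == expected_rows,
  ncol(toy_long) == expected_cols
)
dim(toy_long)
```

With 2 rows and 3 animal columns, the long version should have 2*3 = 6 rows and 3 columns, which is the same arithmetic used for the full data set.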

Read in one (or more) of the following datasets, using the correct R package and command.

  • eggs_tidy.csv ⭐⭐
Code
eggs<-read_csv("_data/eggs_tidy.csv")

Briefly describe the data

This dataset has 6 columns and 120 rows. The columns are month, year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen. There is one row for every month from 2004 to 2013. I believe the values are the average monthly price, in cents, of each carton type. For example, the May 2004 value for large_half_dozen is 131, which I read as $1.31 for a half-dozen carton of large eggs. Rather than having one price column per carton type, it would make more sense for each entry to have the month, the year, the carton category, and the cost.

Anticipate the End Result

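Our original data set was 120 rows by 6 columns. Two columns (month and year) identify the case, and the other 4 are price columns that will be stacked into name-value pairs. Our new data will have 4 variables, so we expect 120*(6-2) = 480 rows by 4 columns.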

Challenge: Describe the final dimensions

We expect a new dataframe to have \(120*(6-2) = 480\) rows x \(4\) columns.

Code
eggs<-tibble(eggs)
eggs
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹​extra_large_dozen
Code
#existing rows/cases
nrow(eggs)
[1] 120
Code
#existing columns/variables
ncol(eggs)
[1] 6
Code
#expected rows/cases
nrow(eggs) * (ncol(eggs)-2)
[1] 480
Code
# expected columns: month, year, Category, Cost100
4
[1] 4

Challenge: Pivot the Chosen Data

With the pivoted data, each case is the cost of one carton category (Large Half Dozen, Large Dozen, Extra Large Half Dozen, Extra Large Dozen) in a specific month of a year from 2004 to 2013.

Code
#Renaming the column names
eggs2 <- rename(eggs,
  "Large Half Dozen"       = large_half_dozen,
  "Large Dozen"            = large_dozen,
  "Extra Large Half Dozen" = extra_large_half_dozen,
  "Extra Large Dozen"      = extra_large_dozen
)

eggs2 %>%
  pivot_longer(
    cols      = ends_with("Dozen"),
    names_to  = "Category",
    values_to = "Cost100"
  )
# A tibble: 480 × 4
   month     year Category               Cost100
   <chr>    <dbl> <chr>                    <dbl>
 1 January   2004 Large Half Dozen          126 
 2 January   2004 Large Dozen               230 
 3 January   2004 Extra Large Half Dozen    132 
 4 January   2004 Extra Large Dozen         230 
 5 February  2004 Large Half Dozen          128.
 6 February  2004 Large Dozen               226.
 7 February  2004 Extra Large Half Dozen    134.
 8 February  2004 Extra Large Dozen         230 
 9 March     2004 Large Half Dozen          131 
10 March     2004 Large Dozen               225 
# … with 470 more rows
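
Each original column name encodes two things: the egg size and the carton quantity. As an optional variation, a sketch like the following splits them into separate columns with `names_pattern`. The `toy_eggs` tibble copies the January and March 2004 rows printed above; the names `toy_eggs`, `size`, `quantity`, and `cost_cents` are my own choices, and the cents interpretation is an assumption.

```r
library(tidyr)
library(tibble)

# Two rows copied from the printed data, using the original snake_case names.
toy_eggs <- tibble(
  month = c("January", "March"),
  year  = c(2004, 2004),
  large_half_dozen       = c(126, 131),
  large_dozen            = c(230, 225),
  extra_large_half_dozen = c(132, 137),
  extra_large_dozen      = c(230, 230)
)

# The two capture groups in names_pattern feed the two names in names_to:
# group 1 is the egg size, group 2 is the carton quantity.
toy_eggs_long <- pivot_longer(
  toy_eggs,
  cols          = -c(month, year),
  names_to      = c("size", "quantity"),
  names_pattern = "^(large|extra_large)_(half_dozen|dozen)$",
  values_to     = "cost_cents"
)
toy_eggs_long
```

This gives 5 columns instead of 4 while the row count still follows the rows * 4 rule, and it makes it easy to filter or group by size and quantity separately.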
Code
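#The values look like cents, so dividing by 100 would give the cost in dollars.
#The pivot above is printed but not saved, so eggs2 has no Cost100 column yet;
#the pivoted result has to be assigned before mutate() can use it.
#eggs_long <- eggs2 %>%
#  pivot_longer(cols = ends_with("Dozen"), names_to = "Category", values_to = "Cost100") %>%
#  mutate(Cost = Cost100 / 100)
#eggs_long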
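
The cents-to-dollars step only works once the pivoted result has been saved to an object, since `Cost100` exists only in the long data. Here is a minimal sketch of that order of operations on a hypothetical one-month tibble. `toy_wide` and `toy_cost` are illustrative names, the two values are copied from the January 2004 row above, and treating them as cents is my reading of the data rather than something the file documents.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical one-month example with two of the renamed price columns.
toy_wide <- tibble(
  month = "January",
  year  = 2004,
  `Large Half Dozen` = 126,
  `Large Dozen`      = 230
)

# Pivot first, then derive the dollar amount, and assign the whole pipeline.
toy_cost <- toy_wide %>%
  pivot_longer(
    cols      = ends_with("Dozen"),
    names_to  = "Category",
    values_to = "Cost100"
  ) %>%
  mutate(Cost = Cost100 / 100)  # assumes Cost100 is in cents

toy_cost
```

Keeping `Cost100` next to `Cost` makes the conversion easy to audit; it can be dropped afterwards with `select(-Cost100)` if only dollars are needed.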