library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Courtney Naughton
September 25, 2022
Challenge #3
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
#   ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
#   ⁸`Chicken - Layers`
This dataset has 17 columns and only 9 rows. The columns are IPCC Area, Cattle - dairy, Cattle - non-dairy, Buffaloes, Swine - market, Swine - breeding, Chicken - Broilers, Chicken - Layers, Ducks, Turkeys, Sheep, Goats, Horses, Asses, Mules, Camels, and Llamas. Each row is one IPCC area, and each of the other 16 columns holds the weight of one animal type in that area. It would make more sense to have only 3 columns: Area, Animal Type, and Weight.
Our original data set has 9 rows and 17 variables. One variable (IPCC Area) identifies the row and the other 16 will be stacked into a name column and a value column, so we expect 9*(17-1) = 144 rows by 3 columns.
We expect a new dataframe to have \(9*16 = 144\) rows x \(3\) columns.
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹`IPCC Area`, ²`Cattle - dairy`, ³`Cattle - non-dairy`, ⁴Buffaloes,
#   ⁵`Swine - market`, ⁶`Swine - breeding`, ⁷`Chicken - Broilers`,
#   ⁸`Chicken - Layers`
[1] 9
[1] 17
[1] 144
[1] 144
With the pivoted data, each case is an observation of the type of animal, the area it comes from, and its weight.
animal_weights <- pivot_longer(
  animal_weights,
  cols = c("Cattle - dairy", "Cattle - non-dairy", "Buffaloes", "Swine - market",
           "Swine - breeding", "Chicken - Broilers", "Chicken - Layers", "Ducks",
           "Turkeys", "Sheep", "Goats", "Horses", "Asses", "Mules", "Camels", "Llamas"),
  names_to = "Animal_type",
  values_to = "Weight"
)
animal_weights
# A tibble: 144 × 3
   `IPCC Area`         Animal_type        Weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows
Read in one (or more) of the following datasets, using the correct R package and command.
This dataset has 6 columns and 120 rows. The columns are month, year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen. The data cover every month from 2004 to 2013. I believe they track the average monthly price of a carton of eggs by size and quantity, recorded in cents. For example, in May 2004 a half-dozen carton of large eggs cost $1.31 (recorded as 131). Rather than one column per carton type, it would make more sense for each row to hold the month, the year, the carton category, and the cost.
Our original data set has 120 rows and 6 columns. Month and year stay as identifiers and the other 4 columns are stacked, so we expect 120*(6-2) = 480 rows by 4 columns.
We expect a new dataframe to have \(120*(6-2) = 480\) rows x \(4\) columns.
# A tibble: 120 × 6
   month      year large_half_dozen large_dozen extra_large_half_dozen extra_l…¹
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>     <dbl>
 1 January    2004             126         230                    132       230 
 2 February   2004             128.        226.                   134.      230 
 3 March      2004             131         225                    137       230 
 4 April      2004             131         225                    137       234.
 5 May        2004             131         225                    137       236 
 6 June       2004             134.        231.                   137       241 
 7 July       2004             134.        234.                   137       241 
 8 August     2004             134.        234.                   137       241 
 9 September  2004             130.        234.                   136.      241 
10 October    2004             128.        234.                   136.      241 
# … with 110 more rows, and abbreviated variable name ¹extra_large_dozen
[1] 120
[1] 6
[1] 480
[1] 480
With the pivoted data, each case is an observation of the cost of eggs given a quantity category (Large Half Dozen, Large Dozen, Extra Large Half Dozen, Extra Large Dozen) in a specific month of a year from 2004 to 2013.
#Renaming the column names 
eggs2 <- rename(eggs,
  "Large Half Dozen" = large_half_dozen,
  "Large Dozen" = large_dozen,
  "Extra Large Half Dozen" = extra_large_half_dozen,
  "Extra Large Dozen" = extra_large_dozen
)
eggs2 %>%
  pivot_longer(
    cols = ends_with("Dozen"),
    names_to = "Category",
    values_to = "Cost100"
  )
# A tibble: 480 × 4
   month     year Category               Cost100
   <chr>    <dbl> <chr>                    <dbl>
 1 January   2004 Large Half Dozen          126 
 2 January   2004 Large Dozen               230 
 3 January   2004 Extra Large Half Dozen    132 
 4 January   2004 Extra Large Dozen         230 
 5 February  2004 Large Half Dozen          128.
 6 February  2004 Large Dozen               226.
 7 February  2004 Extra Large Half Dozen    134.
 8 February  2004 Extra Large Dozen         230 
 9 March     2004 Large Half Dozen          131 
10 March     2004 Large Dozen               225 
# … with 470 more rows
---
title: "Challenge 3 Naughton"
author: "Courtney Naughton"
description: "Tidy Data: Pivoting"
date: "09/25/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_3
  - animal_weights
  - eggs
  - Courtney Naughton
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Challenge #3
::: panel-tabset
## Animal Weights
Read in one (or more) of the following datasets, using the correct R package and command.
-   animal_weight.csv ⭐
```{r}
animal_weights<-read_csv("_data/animal_weight.csv")
animal_weights
```
### Briefly describe the data
This dataset has 17 columns and only 9 rows. The columns are IPCC Area, Cattle - dairy, Cattle - non-dairy, Buffaloes, Swine - market, Swine - breeding, Chicken - Broilers, Chicken - Layers, Ducks, Turkeys, Sheep, Goats, Horses, Asses, Mules, Camels, and Llamas. Each row is one IPCC area, and each of the other 16 columns holds the weight of one animal type in that area. It would make more sense to have only 3 columns: Area, Animal Type, and Weight.
### Anticipate the End Result
Our original data set has 9 rows and 17 variables. One variable (IPCC Area) identifies the row and the other 16 will be stacked into a name column and a value column, so we expect 9*(17-1) = 144 rows by 3 columns.
### Challenge: Describe the final dimensions
We expect a new dataframe to have $9*16 = 144$ rows x $3$ columns.
```{r}
#| tbl-cap: Animal
animal_weights<-tibble(animal_weights)
animal_weights
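# existing rows/cases
nrow(animal_weights)
# existing columns/variables
ncol(animal_weights)
# expected rows/cases after pivoting (every column except IPCC Area is stacked)
nrow(animal_weights) * (ncol(animal_weights) - 1)
# expected rows written out as a literal; the expected column count is 3
144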
```
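The dimension check above follows a general rule: each original row becomes one row per stacked column, and the result keeps the identifier columns plus one name column and one value column. A minimal sketch of that arithmetic, assuming a hypothetical helper `expected_long_dims()` that is not part of tidyr or of the original analysis:

```r
# Hypothetical helper: predicted dimensions after pivoting all
# non-identifier columns into one name column and one value column.
expected_long_dims <- function(n_rows, n_cols, n_id) {
  n_pivoted <- n_cols - n_id      # columns that will be stacked
  c(rows = n_rows * n_pivoted,    # one row per original row and stacked column
    cols = n_id + 2)              # identifiers + name column + value column
}

# Animal weights: 9 rows, 17 columns, 1 identifier column (IPCC Area)
expected_long_dims(n_rows = 9, n_cols = 17, n_id = 1)
```

For the animal weights table this gives 144 rows and 3 columns, matching the expectation stated above.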
### Challenge: Pivot the Chosen Data
With the pivoted data, each case is an observation of the type of animal, the area it comes from, and its weight.
```{r}
#| tbl-cap: Pivoted Example
animal_weights <- pivot_longer(
  animal_weights,
  cols = c("Cattle - dairy", "Cattle - non-dairy", "Buffaloes", "Swine - market",
           "Swine - breeding", "Chicken - Broilers", "Chicken - Layers", "Ducks",
           "Turkeys", "Sheep", "Goats", "Horses", "Asses", "Mules", "Camels", "Llamas"),
  names_to = "Animal_type",
  values_to = "Weight"
)
animal_weights
```
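Listing all sixteen animal columns by name works, but the same selection can also be written as "every column except the area column". A small sketch on made-up data; the tibble `toy_weights` and its values are invented for illustration and are not taken from the dataset file:

```r
library(tidyr)
library(tibble)

# Invented two-row stand-in for the wide animal weights table
toy_weights <- tibble(
  `IPCC Area` = c("Area A", "Area B"),
  Ducks       = c(2.7, 2.7),
  Turkeys     = c(6.8, 6.8),
  Sheep       = c(28, 30)
)

# Pivot every column except the identifier column
toy_long <- pivot_longer(
  toy_weights,
  cols      = -`IPCC Area`,
  names_to  = "Animal_type",
  values_to = "Weight"
)

dim(toy_long)  # 2 rows * 3 animal columns = 6 rows, and 3 columns
```

The negative selection keeps working if animal columns are added or renamed, at the cost of being less explicit about which columns are pivoted.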
## Eggs
Read in one (or more) of the following datasets, using the correct R package and command.
-   eggs_tidy.csv ⭐⭐
```{r}
eggs<-read_csv("_data/eggs_tidy.csv")
```
### Briefly describe the data
This dataset has 6 columns and 120 rows. The columns are month, year, large_half_dozen, large_dozen, extra_large_half_dozen, and extra_large_dozen. The data cover every month from 2004 to 2013. I believe they track the average monthly price of a carton of eggs by size and quantity, recorded in cents. For example, in May 2004 a half-dozen carton of large eggs cost $1.31 (recorded as 131). Rather than one column per carton type, it would make more sense for each row to hold the month, the year, the carton category, and the cost.
### Anticipate the End Result
Our original data set has 120 rows and 6 columns. Month and year stay as identifiers and the other 4 columns are stacked, so we expect 120*(6-2) = 480 rows by 4 columns.
### Challenge: Describe the final dimensions
We expect a new dataframe to have $120*(6-2) = 480$ rows x $4$ columns.
```{r}
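#| tbl-cap: Eggs
eggs<-tibble(eggs)
eggs
# existing rows/cases
nrow(eggs)
# existing columns/variables
ncol(eggs)
# expected rows/cases after pivoting (month and year stay as identifiers)
nrow(eggs) * (ncol(eggs) - 2)
# expected rows written out as a literal; the expected column count is 4
480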
```
### Challenge: Pivot the Chosen Data
With the pivoted data, each case is an observation of the cost of eggs given a quantity category (Large Half Dozen, Large Dozen, Extra Large Half Dozen, Extra Large Dozen) in a specific month of a year from 2004 to 2013.
```{r}
#| tbl-cap: Pivoted Example2
# Rename the columns so the category labels read cleanly after pivoting
eggs2 <- rename(eggs,
  "Large Half Dozen" = large_half_dozen,
  "Large Dozen" = large_dozen,
  "Extra Large Half Dozen" = extra_large_half_dozen,
  "Extra Large Dozen" = extra_large_dozen
)
eggs2 %>%
  pivot_longer(
    cols = ends_with("Dozen"),
    names_to = "Category",
    values_to = "Cost100"
  )
# Not yet working: divide Cost100 by 100 to get the cost in dollars.
# The pivoted result above is printed but not saved, so eggs2 has no
# Cost100 column for mutate() to use.
#eggs2 <- mutate(eggs2, Cost = Cost100 / 100)
#eggs2
```
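Converting the pivoted `Cost100` values to dollars requires keeping the pivoted result, so that `mutate()` has a `Cost100` column to work with. One possible way to chain the two steps, sketched on invented data; `toy_eggs` is hypothetical, and treating the values as cents is the reading of the data given above rather than something the source file confirms:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented stand-in for the wide egg price table (values assumed to be cents)
toy_eggs <- tibble(
  month            = c("January", "February"),
  year             = c(2004, 2004),
  large_half_dozen = c(126, 128.5),
  large_dozen      = c(230, 226.25)
)

toy_eggs_long <- toy_eggs %>%
  pivot_longer(
    cols      = ends_with("dozen"),
    names_to  = "Category",
    values_to = "Cost100"
  ) %>%
  # Derive a dollar amount from the pivoted column, then save the result
  mutate(Cost = Cost100 / 100)

toy_eggs_long
```

Assigning the whole pipeline to a new object keeps the wide table intact and makes the long table, including the new `Cost` column, available for later steps.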
:::