DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 3

challenge_3
animal_weights
eggs
australian_marriage
usa_households
sce_labor
Author

Gayan udugama

Published

August 17, 2022

Code
library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • animal_weights.csv ⭐
  • eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
  • australian_marriage*.xls ⭐⭐⭐
  • USA Households*.xlsx ⭐⭐⭐⭐
  • sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
Code
animal_weights <-  read_csv("_data/animal_weight.csv")
animal_weights
# A tibble: 9 × 17
  IPCC A…¹ Cattl…² Cattl…³ Buffa…⁴ Swine…⁵ Swine…⁶ Chick…⁷ Chick…⁸ Ducks Turkeys
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 Indian …     275     110     295      28      28     0.9     1.8   2.7     6.8
2 Eastern…     550     391     380      50     180     0.9     1.8   2.7     6.8
3 Africa       275     173     380      28      28     0.9     1.8   2.7     6.8
4 Oceania      500     330     380      45     180     0.9     1.8   2.7     6.8
5 Western…     600     420     380      50     198     0.9     1.8   2.7     6.8
6 Latin A…     400     305     380      28      28     0.9     1.8   2.7     6.8
7 Asia         350     391     380      50     180     0.9     1.8   2.7     6.8
8 Middle …     275     173     380      28      28     0.9     1.8   2.7     6.8
9 Norther…     604     389     380      46     198     0.9     1.8   2.7     6.8
# … with 7 more variables: Sheep <dbl>, Goats <dbl>, Horses <dbl>, Asses <dbl>,
#   Mules <dbl>, Camels <dbl>, Llamas <dbl>, and abbreviated variable names
#   ¹​`IPCC Area`, ²​`Cattle - dairy`, ³​`Cattle - non-dairy`, ⁴​Buffaloes,
#   ⁵​`Swine - market`, ⁶​`Swine - breeding`, ⁷​`Chicken - Broilers`,
#   ⁸​`Chicken - Layers`
Code
animal_weights %>% 
  summary()
  IPCC Area         Cattle - dairy  Cattle - non-dairy   Buffaloes    
 Length:9           Min.   :275.0   Min.   :110        Min.   :295.0  
 Class :character   1st Qu.:275.0   1st Qu.:173        1st Qu.:380.0  
 Mode  :character   Median :400.0   Median :330        Median :380.0  
                    Mean   :425.4   Mean   :298        Mean   :370.6  
                    3rd Qu.:550.0   3rd Qu.:391        3rd Qu.:380.0  
                    Max.   :604.0   Max.   :420        Max.   :380.0  
 Swine - market  Swine - breeding Chicken - Broilers Chicken - Layers
 Min.   :28.00   Min.   : 28.0    Min.   :0.9        Min.   :1.8     
 1st Qu.:28.00   1st Qu.: 28.0    1st Qu.:0.9        1st Qu.:1.8     
 Median :45.00   Median :180.0    Median :0.9        Median :1.8     
 Mean   :39.22   Mean   :116.4    Mean   :0.9        Mean   :1.8     
 3rd Qu.:50.00   3rd Qu.:180.0    3rd Qu.:0.9        3rd Qu.:1.8     
 Max.   :50.00   Max.   :198.0    Max.   :0.9        Max.   :1.8     
     Ducks        Turkeys        Sheep           Goats           Horses     
 Min.   :2.7   Min.   :6.8   Min.   :28.00   Min.   :30.00   Min.   :238.0  
 1st Qu.:2.7   1st Qu.:6.8   1st Qu.:28.00   1st Qu.:30.00   1st Qu.:238.0  
 Median :2.7   Median :6.8   Median :48.50   Median :38.50   Median :377.0  
 Mean   :2.7   Mean   :6.8   Mean   :39.39   Mean   :34.72   Mean   :315.2  
 3rd Qu.:2.7   3rd Qu.:6.8   3rd Qu.:48.50   3rd Qu.:38.50   3rd Qu.:377.0  
 Max.   :2.7   Max.   :6.8   Max.   :48.50   Max.   :38.50   Max.   :377.0  
     Asses         Mules         Camels        Llamas   
 Min.   :130   Min.   :130   Min.   :217   Min.   :217  
 1st Qu.:130   1st Qu.:130   1st Qu.:217   1st Qu.:217  
 Median :130   Median :130   Median :217   Median :217  
 Mean   :130   Mean   :130   Mean   :217   Mean   :217  
 3rd Qu.:130   3rd Qu.:130   3rd Qu.:217   3rd Qu.:217  
 Max.   :130   Max.   :130   Max.   :217   Max.   :217  
Code
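# read the railroad employment sheet, skipping the first 4 rows;
# the two placeholder "delete" columns are dropped after import
railroad <- read_excel("_data/StateCounty2012.xls",
                       skip = 4,
                       col_names = c("state", "delete", "county",
                                     "delete", "employees")) %>%
  select(!contains("delete")) %>%
  # drop rows whose state field contains "Total"
  filter(!str_detect(state, "Total"))

# drop the last 2 rows, then fill in the county for the CANADA row
railroad <- head(railroad, -2) %>%
  mutate(county = ifelse(state == "CANADA", "CANADA", county))

railroad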
# A tibble: 2,931 × 3
   state county               employees
   <chr> <chr>                    <dbl>
 1 AE    APO                          2
 2 AK    ANCHORAGE                    7
 3 AK    FAIRBANKS NORTH STAR         2
 4 AK    JUNEAU                       3
 5 AK    MATANUSKA-SUSITNA            2
 6 AK    SITKA                        1
 7 AK    SKAGWAY MUNICIPALITY        88
 8 AL    AUTAUGA                    102
 9 AL    BALDWIN                    143
10 AL    BARBOUR                      1
# … with 2,921 more rows

Briefly describe the data

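This data set gives a weight for each of 16 types of farm animal (for example dairy cattle, market swine, ducks, and llamas) in 9 IPCC areas of the world. Each row is one area and each animal type is its own column, so the table is wide rather than tidy: the animal type is stored in the column names instead of in a variable. Several animals (chickens, ducks, turkeys, asses, mules, camels, and llamas) have the same weight in every area.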
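
A description like this can be backed up by counting distinct values per column. The following is a minimal sketch in base R, assuming a small stand-in table (`animal_subset`, a hypothetical name) built from three of the areas and four of the animal columns printed above; the same `sapply()` call should work on the full `animal_weights` table.

```r
# Stand-in for animal_weights: three areas and four animal columns,
# with values copied from the printed table above.
animal_subset <- data.frame(
  `IPCC Area` = c("Africa", "Oceania", "Asia"),
  `Cattle - dairy` = c(275, 500, 350),
  `Swine - market` = c(28, 45, 50),
  Ducks = c(2.7, 2.7, 2.7),
  Turkeys = c(6.8, 6.8, 6.8),
  check.names = FALSE  # keep the spaces and dashes in the column names
)

# Number of distinct weights in each animal column (identifier column dropped).
distinct_counts <- sapply(animal_subset[-1], function(x) length(unique(x)))
distinct_counts

# Columns with a single distinct value do not vary by area.
constant_cols <- names(distinct_counts)[distinct_counts == 1]
constant_cols
```

In this subset, `Ducks` and `Turkeys` come back as constant, which matches the identical minimum and maximum values that `summary()` reports for those columns.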

Anticipate the End Result

Code
#existing rows/cases
nrow(animal_weights)
[1] 9
Code
#existing columns/variables
ncol(animal_weights)
[1] 17

The table has 9 rows (one per IPCC area) and 17 columns: the IPCC Area identifier plus 16 animal columns. Pivoting the 16 animal columns should therefore give 9 × 16 = 144 rows and 3 columns: IPCC Area, farm_animal, and weight.
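
The arithmetic can be written as a small helper so the expected shape is computed rather than worked out by hand. This is only a sketch: `expected_long_dims()` is a hypothetical function name, and it assumes every non-identifier column is pivoted into a single names column and a single values column.

```r
# Expected dimensions after pivoting all non-identifier columns to long form.
# n_rows, n_cols: dimensions of the wide table
# n_id_cols: number of identifier columns that are kept as they are
expected_long_dims <- function(n_rows, n_cols, n_id_cols = 1) {
  n_pivoted <- n_cols - n_id_cols          # columns that become rows
  c(rows = n_rows * n_pivoted,             # one row per area/animal pair
    cols = n_id_cols + 2)                  # ids + names column + values column
}

# Dimensions reported by nrow() and ncol() above: 9 rows, 17 columns.
expected_long_dims(n_rows = 9, n_cols = 17)
```

With the real data loaded, the same call could be written as `expected_long_dims(nrow(animal_weights), ncol(animal_weights))`.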

Pivot the Data

Code
# pivot every column except `IPCC Area` into name/value pairs
animal_weight_long <- pivot_longer(animal_weights, cols = -`IPCC Area`,
                                   names_to = "farm_animal",
                                   values_to = "weight")
animal_weight_long
# A tibble: 144 × 3
   `IPCC Area`         farm_animal        weight
   <chr>               <chr>               <dbl>
 1 Indian Subcontinent Cattle - dairy      275  
 2 Indian Subcontinent Cattle - non-dairy  110  
 3 Indian Subcontinent Buffaloes           295  
 4 Indian Subcontinent Swine - market       28  
 5 Indian Subcontinent Swine - breeding     28  
 6 Indian Subcontinent Chicken - Broilers    0.9
 7 Indian Subcontinent Chicken - Layers      1.8
 8 Indian Subcontinent Ducks                 2.7
 9 Indian Subcontinent Turkeys               6.8
10 Indian Subcontinent Sheep                28  
# … with 134 more rows

The pivoted tibble has 144 rows and 3 columns, which matches the dimensions anticipated above.
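
The comparison can also be made in code instead of by reading the printed header. The sketch below assumes the tidyr package is installed and again uses a small stand-in table (`animal_subset`, a hypothetical name) with values copied from the printed output. It checks the dimensions of the pivoted result and then pivots back to wide form to confirm that no values were lost.

```r
library(tidyr)

# Stand-in for animal_weights: three areas and four animal columns.
animal_subset <- data.frame(
  `IPCC Area` = c("Africa", "Oceania", "Asia"),
  `Cattle - dairy` = c(275, 500, 350),
  `Swine - market` = c(28, 45, 50),
  Ducks = c(2.7, 2.7, 2.7),
  Turkeys = c(6.8, 6.8, 6.8),
  check.names = FALSE  # keep the spaces and dashes in the column names
)

animal_subset_long <- pivot_longer(
  animal_subset,
  cols = -`IPCC Area`,
  names_to = "farm_animal",
  values_to = "weight"
)

# Rows: areas x animal columns. Columns: identifier + names + values.
stopifnot(
  nrow(animal_subset_long) == nrow(animal_subset) * (ncol(animal_subset) - 1),
  ncol(animal_subset_long) == 3
)

# Round trip: pivoting back to wide form should recover the original table.
animal_subset_wide <- pivot_wider(
  animal_subset_long,
  names_from = farm_animal,
  values_from = weight
)
stopifnot(isTRUE(all.equal(as.data.frame(animal_subset_wide), animal_subset)))
```

`stopifnot()` is silent when every condition holds and raises an error otherwise, so a mismatch would stop the document from rendering rather than pass unnoticed.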
