Code
library(tidyverse)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Rahul Somu
April 10, 2022
Read excel and load it to a dataframe
df_ausmarriage <- read_excel("_data/australian_marriage_law_postal_survey_2017_-_response_final.xls",, sheet= "Table 2", skip=7,col_names = c("State", "Yes Count", "Yes %", "No Count", "No %", "Clear Total Count", "Clear Count %", "Blank", "Clear Count", "Clear %", "Not clear Count", "Not clear %", "Non-response Count", "Non-response %", "Total_Count", "Total %"))
Create a column with Division for the respective state
df_ausmarriage_with_div <- df_ausmarriage %>%
select(!c("Blank", "Clear Count", "Clear Total Count") & !contains("%")) %>%
filter(!grepl("(Total)", State)) %>%
mutate(Division = case_when(str_ends(State, "Divisions") ~ State,
TRUE ~ NA_character_ )) %>%
fill(Division, .direction = "down") %>%
filter(!is.na(`Total_Count`))
df_ausmarriage_with_div
# A tibble: 150 × 7
State `Yes Count` `No Count` `Not clear Count` Non-resp…¹ Total…² Divis…³
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Banks 37736 46343 247 20928 105254 New So…
2 Barton 37153 47984 226 24008 109371 New So…
3 Bennelong 42943 43215 244 19973 106375 New So…
4 Berowra 48471 40369 212 16038 105090 New So…
5 Blaxland 20406 57926 220 25883 104435 New So…
6 Bradfield 53681 34927 202 17261 106071 New So…
7 Calare 54091 35779 285 25342 115497 New So…
8 Chifley 32871 46702 263 28180 108016 New So…
9 Cook 47505 38804 229 18713 105251 New So…
10 Cowper 57493 38317 315 25197 121322 New So…
# … with 140 more rows, and abbreviated variable names ¹`Non-response Count`,
# ²Total_Count, ³Division
Document your work here.
Pivot the data to help fliter based on Divison/state with yes/No count.
# A tibble: 300 × 6
State Total_Count Division Response Count Percent
<chr> <dbl> <chr> <chr> <dbl> <dbl>
1 Banks 105254 New South Wales Divisions Yes Count 37736 0.359
2 Banks 105254 New South Wales Divisions No Count 46343 0.440
3 Barton 109371 New South Wales Divisions Yes Count 37153 0.340
4 Barton 109371 New South Wales Divisions No Count 47984 0.439
5 Bennelong 106375 New South Wales Divisions Yes Count 42943 0.404
6 Bennelong 106375 New South Wales Divisions No Count 43215 0.406
7 Berowra 105090 New South Wales Divisions Yes Count 48471 0.461
8 Berowra 105090 New South Wales Divisions No Count 40369 0.384
9 Blaxland 104435 New South Wales Divisions Yes Count 20406 0.195
10 Blaxland 104435 New South Wales Divisions No Count 57926 0.555
# … with 290 more rows
The data has been transformed based on the yes/no count to help filter for each category and help in analysis.
---
title: "Challenge 3"
author: "Rahul Somu"
description: "Tidy Data: Pivoting"
date: "04/10/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3
- animal_weights
- eggs
- australian_marriage
- usa_households
- sce_labor
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Read excel and load it to a dataframe
```{r}
df_ausmarriage <- read_excel("_data/australian_marriage_law_postal_survey_2017_-_response_final.xls",, sheet= "Table 2", skip=7,col_names = c("State", "Yes Count", "Yes %", "No Count", "No %", "Clear Total Count", "Clear Count %", "Blank", "Clear Count", "Clear %", "Not clear Count", "Not clear %", "Non-response Count", "Non-response %", "Total_Count", "Total %"))
```
Create a column with Division for the respective state
```{r}
#| tbl-cap: Example
df_ausmarriage_with_div <- df_ausmarriage %>%
select(!c("Blank", "Clear Count", "Clear Total Count") & !contains("%")) %>%
filter(!grepl("(Total)", State)) %>%
mutate(Division = case_when(str_ends(State, "Divisions") ~ State,
TRUE ~ NA_character_ )) %>%
fill(Division, .direction = "down") %>%
filter(!is.na(`Total_Count`))
df_ausmarriage_with_div
df_select <- select(df_ausmarriage_with_div, "State", "Yes Count", "No Count", "Total_Count","Division")
```
Document your work here.
Pivot the data to help fliter based on Divison/state with yes/No count.
```{r}
df_extend <- df_select %>%
pivot_longer(col = c("Yes Count", "No Count"),
names_to = "Response",
values_to = "Count") %>%
mutate("Percent" = Count/Total_Count)
print(df_extend)
```
The data has been transformed based on the yes/no count to help filter for each category and help in analysis.