Code
library(tidyverse)
library(readr)
library(xlsx)
library(readxl)
library(dplyr)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Linda Humphrey
March 6, 2023
Today’s challenge is to:
pivot_longer
Read in one (or more) of the following datasets, using the correct R package and command.
# A tibble: 9 × 7
...1 no....2 `%...3` no....4 `%...5` no....6 `%...7`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 New South Wales 2374362 57.8 1736838 42.2 4.11e6 100
2 Victoria 2145629 64.9 1161098 35.1 3.31e6 100
3 Queensland 1487060 60.7 961015 39.3 2.45e6 100
4 South Australia 592528 62.5 356247 37.5 9.49e5 100
5 Western Australia 801575 63.7 455924 36.3 1.26e6 100
6 Tasmania 191948 63.6 109655 36.4 3.02e5 100
7 Northern Territory(b) 48686 60.6 31690 39.4 8.04e4 100
8 Australian Capital Territory(… 175459 74 61520 26 2.37e5 100
9 Australia 7817247 61.6 4873987 38.4 1.27e7 100
I renamed all columns to be clear, created a response variable with ‘yes’ and ‘no’ values, and recalculated the percent variable and pivot the data longer.
# A tibble: 18 × 5
territory total response number percent
<chr> <dbl> <chr> <dbl> <dbl>
1 New South Wales 4111200 yes 2374362 57.8
2 New South Wales 4111200 no 1736838 42.2
3 Victoria 3306727 yes 2145629 64.9
4 Victoria 3306727 no 1161098 35.1
5 Queensland 2448075 yes 1487060 60.7
6 Queensland 2448075 no 961015 39.3
7 South Australia 948775 yes 592528 62.5
8 South Australia 948775 no 356247 37.5
9 Western Australia 1257499 yes 801575 63.7
10 Western Australia 1257499 no 455924 36.3
11 Tasmania 301603 yes 191948 63.6
12 Tasmania 301603 no 109655 36.4
13 Northern Territory(b) 80376 yes 48686 60.6
14 Northern Territory(b) 80376 no 31690 39.4
15 Australian Capital Territory(c) 236979 yes 175459 74.0
16 Australian Capital Territory(c) 236979 no 61520 26.0
17 Australia 12691234 yes 7817247 61.6
18 Australia 12691234 no 4873987 38.4
I chose to delete the percentage columns from the original data and recalculate them later to avoid messiness.
# A tibble: 18 × 5
territory response number total percent
<chr> <chr> <dbl> <dbl> <dbl>
1 New South Wales yes 2374362 4111200 57.8
2 New South Wales no 1736838 4111200 42.2
3 Victoria yes 2145629 3306727 64.9
4 Victoria no 1161098 3306727 35.1
5 Queensland yes 1487060 2448075 60.7
6 Queensland no 961015 2448075 39.3
7 South Australia yes 592528 948775 62.5
8 South Australia no 356247 948775 37.5
9 Western Australia yes 801575 1257499 63.7
10 Western Australia no 455924 1257499 36.3
11 Tasmania yes 191948 301603 63.6
12 Tasmania no 109655 301603 36.4
13 Northern Territory yes 48686 80376 60.6
14 Northern Territory no 31690 80376 39.4
15 Australian Capital Territory yes 175459 236979 74.0
16 Australian Capital Territory no 61520 236979 26.0
17 Australia yes 7817247 12691234 61.6
18 Australia no 4873987 12691234 38.4
The code arranges the votes in order of greatest to least percent of yes votes using filter, select, and arrange.
# A tibble: 9 × 3
territory percent total
<chr> <dbl> <dbl>
1 Australian Capital Territory 74.0 236979
2 Victoria 64.9 3306727
3 Western Australia 63.7 1257499
4 Tasmania 63.6 301603
5 South Australia 62.5 948775
6 Australia 61.6 12691234
7 Queensland 60.7 2448075
8 Northern Territory 60.6 80376
9 New South Wales 57.8 4111200
The code arranges the votes in order of greatest to least percent of no votes using filter, select, and arrange.
# A tibble: 9 × 3
territory percent total
<chr> <dbl> <dbl>
1 New South Wales 42.2 4111200
2 Northern Territory 39.4 80376
3 Queensland 39.3 2448075
4 Australia 38.4 12691234
5 South Australia 37.5 948775
6 Tasmania 36.4 301603
7 Western Australia 36.3 1257499
8 Victoria 35.1 3306727
9 Australian Capital Territory 26.0 236979
Here I find the current and future data frame of australian marrige
territory response number
1 New South Wales yes 2374362
2 Victoria no 2145629
3 Queensland yes 1487060
4 Western Australia no 801575
territory response percent
1 Victoria yes 57.8
2 Queensland no 60.7
Here I categorize the data by using count() by territory
# A tibble: 9 × 2
territory n
<chr> <int>
1 Australia 2
2 Australian Capital Territory 2
3 New South Wales 2
4 Northern Territory 2
5 Queensland 2
6 South Australia 2
7 Tasmania 2
8 Victoria 2
9 Western Australia 2
Here I categorize the data by using count() by response
Here I pivot the dataset wider showing numbers of yes and no response.
# A tibble: 2 × 5
response `New South Wales` Victoria Queensland `Western Australia`
<chr> <dbl> <dbl> <dbl> <dbl>
1 yes 2374362 NA 1487060 NA
2 no NA 2145629 NA 801575
---
title: "Challenge 3 Instructions"
author: "Linda Humphrey"
description: "Tidy Data: Pivoting"
date: "03/06/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_3: Australian Marriage Law Postal Survey 2017 - Response
- my name: Linda Humphrey
- marriage_data: australian_marriage_law_postal_survey_2017_-_response_final.xls
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readr)
library(xlsx)
library(readxl)
library(dplyr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to:
1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
2. identify what needs to be done to tidy the current data
3. anticipate the shape of pivoted data
4. pivot the data into tidy format using `pivot_longer`
## Read in data
Read in one (or more) of the following datasets, using the correct R package and command.
- animal_weights.csv ⭐
- eggs_tidy.csv ⭐⭐ or organiceggpoultry.xls ⭐⭐⭐
- australian_marriage\*.xls ⭐⭐⭐
- USA Households\*.xlsx ⭐⭐⭐⭐
- sce_labor_chart_data_public.xlsx 🌟🌟🌟🌟🌟
```{r}
# Reading in the data
library(readr)
marriage_data <- readxl::read_excel("~/Desktop/601_Spring_2023/posts/_data/australian_marriage_law_postal_survey_2017_-_response_final.xls", sheet = 2, range = "A7:G16")
marriage_data
```
### Briefly describe the data
I renamed all columns to be clear, created a response variable with 'yes' and 'no' values, and recalculated the percent variable and pivot the data longer.
```{r}
tidy_marriage_data <- marriage_data %>%
select(1, 2, 4, 6) %>%
rename("territory" = `...1`,
"yes" = no....2,
"no" = no....4,
"total" = no....6) %>%
pivot_longer(c(yes, no), names_to = "response", values_to = "number") %>%
mutate("percent" = number / total * 100)
tidy_marriage_data
```
## Anticipate the End Result
I chose to delete the percentage columns from the original data and recalculate them later to avoid messiness.
```{r}
tidy_marriage_data <- tidy_marriage_data %>%
relocate("total", .after = "number") %>%
mutate(territory = recode(territory, "Northern Territory(b)" = "Northern Territory",
"Australian Capital Territory(c)" = "Australian Capital Territory"))
tidy_marriage_data
```
The code arranges the votes in order of greatest to least percent of yes votes using filter, select, and arrange.
```{r}
yes_votes <- tidy_marriage_data %>%
filter(response == "yes") %>%
select(territory, percent, total) %>%
arrange(desc(percent))
yes_votes
```
The code arranges the votes in order of greatest to least percent of no votes using filter, select, and arrange.
```{r}
no_votes <- tidy_marriage_data %>%
filter(response == "no") %>%
select(territory, percent, total) %>%
arrange(desc(percent))
no_votes
```
Here I find the current and future data frame of australian marrige
```{r}
# create a data frame with some data
df <- data.frame(
territory = c("New South Wales", "Victoria", "Queensland", "Western Australia"),
response = c("yes", "no","yes", "no"),
number = c(2374362, 2145629, 1487060, 801575),
stringsAsFactors = FALSE
)
# print the current data frame
print(df)
# create a future data frame with some data
future_df <- data.frame(
territory = c("Victoria", "Queensland"),
response = c("yes", "no"),
percent = c(57.8, 60.7),
stringsAsFactors = FALSE
)
# print the future data frame
print(future_df)
```
### Challenge: Describe the final dimensions
Here I categorize the data by using count() by territory
```{r}
tidy_marriage_data %>% count(territory)
```
Here I categorize the data by using count() by response
```{r}
tidy_marriage_data %>% count(response)
```
### Challenge: Pivot the Chosen Data
Here I pivot the dataset wider showing numbers of yes and no response.
```{r}
library(tidyverse)
df_wide <- df %>% pivot_wider(
names_from = territory,
values_from = number
)
# view the pivoted data frame
df_wide
```