Challenge 3 - Australian Marriage Survey

challenge_3

Justine Shakespeare

australian_marriage

Tidy Data: Pivoting

Author

Justine Shakespeare

Published

March 8, 2023

Code

library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in data

First we’ll read in the data. We’ll be focusing on the not-yet-tidy dataset on the Australian Marriage Law Survey. After inspecting the data I can see that this dataset includes some additional rows at the top of the sheet and the column names are in multiple rows. For ease of analysis, I’ll skip the first few lines and rename the rows when reading in this data.

Code

aussieMarriage <- read_excel("_data/australian_marriage_law_postal_survey_2017_-_response_final.xls", sheet = 2, skip = 7, col_names = c("State_territory", "Yes", "Yes %", "No", "No %", "Total", "Total %", "delete", "Response clear", "Response clear %", "Response not clear", "Response not clear %", "Non-responding", "Non-responding %", "Resp Total", "Resp Total %"))

aussieMarriage

# A tibble: 15 × 16
   State…¹     Yes `Yes %`      No `No %`   Total Total…² delete Respo…³ Respo…⁴
   <chr>     <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl> <lgl>    <dbl>   <dbl>
 1 New So… 2374362    57.8 1736838   42.2  4.11e6     100 NA      4.11e6    79.2
 2 Victor… 2145629    64.9 1161098   35.1  3.31e6     100 NA      3.31e6    81.4
 3 Queens… 1487060    60.7  961015   39.3  2.45e6     100 NA      2.45e6    77.7
 4 South …  592528    62.5  356247   37.5  9.49e5     100 NA      9.49e5    79.5
 5 Wester…  801575    63.7  455924   36.3  1.26e6     100 NA      1.26e6    78.3
 6 Tasman…  191948    63.6  109655   36.4  3.02e5     100 NA      3.02e5    79.5
 7 Northe…   48686    60.6   31690   39.4  8.04e4     100 NA      8.04e4    58.2
 8 Austra…  175459    74     61520   26    2.37e5     100 NA      2.37e5    82.3
 9 Austra… 7817247    61.6 4873987   38.4  1.27e7     100 NA      1.27e7    79.3
10 <NA>         NA    NA        NA   NA   NA           NA NA     NA         NA  
11 (a) In…      NA    NA        NA   NA   NA           NA NA     NA         NA  
12 (b) In…      NA    NA        NA   NA   NA           NA NA     NA         NA  
13 (c) In…      NA    NA        NA   NA   NA           NA NA     NA         NA  
14 <NA>         NA    NA        NA   NA   NA           NA NA     NA         NA  
15 © Comm…      NA    NA        NA   NA   NA           NA NA     NA         NA  
# … with 6 more variables: `Response not clear` <dbl>,
#   `Response not clear %` <dbl>, `Non-responding` <dbl>,
#   `Non-responding %` <dbl>, `Resp Total` <dbl>, `Resp Total %` <dbl>, and
#   abbreviated variable names ¹State_territory, ²`Total %`, ³`Response clear`,
#   ⁴`Response clear %`

Data Description

This data shows the results of the Australian Marriage Law Postal Survey from 2017, which was administered to eligible participants throughout Australia. Surprisingly, I could not find the question on the survey in the actual dataset. A quick google search revealed that the question on the survey was, “Should the law be changed to allow same-sex couples to marry?”

This data shows the responses of eligible participants organized by region (state or territory).

Data Cleaning

Before we attempt any pivoting, we should clean up this dataset a bit more. Assuming we’re interested in the results of the survey and not so much in how many responses were clear, let’s select just the relevant columns. We can also leave off the percentage columns, since we can calculate those ourselves with the raw counts. We’ll also drop the last few lines since they are endnotes.

Code

aussieMarriage1 <- select(aussieMarriage, "State_territory", "Yes", "No", "Total")
aussieMarriage2 <- slice(aussieMarriage1, 1:(nrow(aussieMarriage) - 6))
aussieMarriage2

# A tibble: 9 × 4
  State_territory                     Yes      No    Total
  <chr>                             <dbl>   <dbl>    <dbl>
1 New South Wales                 2374362 1736838  4111200
2 Victoria                        2145629 1161098  3306727
3 Queensland                      1487060  961015  2448075
4 South Australia                  592528  356247   948775
5 Western Australia                801575  455924  1257499
6 Tasmania                         191948  109655   301603
7 Northern Territory(b)             48686   31690    80376
8 Australian Capital Territory(c)  175459   61520   236979
9 Australia                       7817247 4873987 12691234

Data Review and Pivot

Now that we have a clean dataset we can take a closer look. Using the dim() and head() functions we can see that we have 9 observations and 4 variables. The 4 variables include the state or territory variable which indicates the region within Australia a vote is from, including Australia at the end. Then there is a Yes column with the number of “yes” votes, a No column with the number of “no” votes, and finally a Total column with the total number of votes.

Code

dim(aussieMarriage2)

[1] 9 4

Code

head(aussieMarriage2, 10)

# A tibble: 9 × 4
  State_territory                     Yes      No    Total
  <chr>                             <dbl>   <dbl>    <dbl>
1 New South Wales                 2374362 1736838  4111200
2 Victoria                        2145629 1161098  3306727
3 Queensland                      1487060  961015  2448075
4 South Australia                  592528  356247   948775
5 Western Australia                801575  455924  1257499
6 Tasmania                         191948  109655   301603
7 Northern Territory(b)             48686   31690    80376
8 Australian Capital Territory(c)  175459   61520   236979
9 Australia                       7817247 4873987 12691234

Let’s use the pivot_longer() function to move all of the yes and no votes to two columns called “Response” and “Count”. We can also calculate the percentage using these counts and the total column.

Code

aussieMarriagePL <- aussieMarriage2 %>% 
  pivot_longer(col = c("Yes", "No"),
               names_to = "Response",
               values_to = "Count") %>% 
  mutate("Percent" = Count/Total)

aussieMarriageFinal <- select(aussieMarriagePL, State_territory, Response, Count, Percent, Total) # reordering the columns so that Total shows up on the far right.

head(aussieMarriageFinal, 20)

# A tibble: 18 × 5
   State_territory                 Response   Count Percent    Total
   <chr>                           <chr>      <dbl>   <dbl>    <dbl>
 1 New South Wales                 Yes      2374362   0.578  4111200
 2 New South Wales                 No       1736838   0.422  4111200
 3 Victoria                        Yes      2145629   0.649  3306727
 4 Victoria                        No       1161098   0.351  3306727
 5 Queensland                      Yes      1487060   0.607  2448075
 6 Queensland                      No        961015   0.393  2448075
 7 South Australia                 Yes       592528   0.625   948775
 8 South Australia                 No        356247   0.375   948775
 9 Western Australia               Yes       801575   0.637  1257499
10 Western Australia               No        455924   0.363  1257499
11 Tasmania                        Yes       191948   0.636   301603
12 Tasmania                        No        109655   0.364   301603
13 Northern Territory(b)           Yes        48686   0.606    80376
14 Northern Territory(b)           No         31690   0.394    80376
15 Australian Capital Territory(c) Yes       175459   0.740   236979
16 Australian Capital Territory(c) No         61520   0.260   236979
17 Australia                       Yes      7817247   0.616 12691234
18 Australia                       No       4873987   0.384 12691234

This new dataset shows the same information but organized differently. Now each observation shows the number of Yes or No votes in a state/territory, along with the Percent those votes represent out of the total in that region.