This is a brief exploratory analysis of the results of a survey of Australians’ opinions on changing Australian marriage law to allow same-sex marriage.
This page describes my brief analysis of a survey conducted by the Australian Bureau of Statistics between September and November 2017. The survey was conducted by post, was completely voluntary, and was sent to all registered voters in Australia to gauge public opinion on changing the law to allow same-sex couples to marry. The intent of this page is to outline my thought process and the coding steps I took to practice reading, wrangling, and operating on a not-so-tidy data set.
First, we will read in the data from the results of the survey.
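library(tidyverse) # dplyr verbs, the %>% pipe, and ggplot2 are used below
library(readxl) # provides read_xls()
am_survey <- read_xls("australian_marriage_law_postal_survey_2017_-_response_final.xls")
am_survey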
# A tibble: 23 x 3
`Australian Bureau of Statistics` ...2 ...3
<chr> <chr> <chr>
1 1800.0 Australian Marriage Law Postal Survey, 2017 <NA> <NA>
2 Released on 15 November 2017 <NA> <NA>
3 <NA> <NA> <NA>
4 <NA> Contents <NA>
5 <NA> Tables <NA>
6 <NA> Table 1 Resp~
7 <NA> Table 2 Resp~
8 <NA> <NA> <NA>
9 <NA> Explanato~ <NA>
10 <NA> <NA> <NA>
# ... with 13 more rows
After reading and viewing the imported data, we notice that the sheet that was read appears to be a contents page for the other sheets in the workbook. It lists three linked items, including “Table 1” and “Table 2”, which are likely of interest. Next we will read in each of these sheets and review them as well.
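As an aside, the sheet names can also be listed directly rather than inferred from the contents page. The following is a minimal sketch, assuming the workbook sits in the working directory; `list_sheets_if_present()` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```r
library(readxl)

# Hypothetical helper (an assumption, not from the original post): return the
# sheet names of a workbook, or an empty character vector if the file is missing.
list_sheets_if_present <- function(path) {
  if (!file.exists(path)) {
    return(character(0))
  }
  excel_sheets(path)
}

survey_path <- "australian_marriage_law_postal_survey_2017_-_response_final.xls"
sheet_names <- list_sheets_if_present(survey_path)
sheet_names
```

Any name returned here can be passed to the `sheet` argument of `read_xls()`.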
am_survey_tbl1 <- read_xls("australian_marriage_law_postal_survey_2017_-_response_final.xls", sheet = "Table 1")
am_survey_tbl2 <- read_xls("australian_marriage_law_postal_survey_2017_-_response_final.xls", sheet = "Table 2")
am_survey_tbl1
# A tibble: 21 x 16
`Australian Burea~` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr>
1 1800.0 Australian ~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
2 Released on 15 Nov~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
3 Table 1 Response b~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
4 <NA> Resp~ <NA> <NA> <NA> <NA> <NA> NA Elig~
5 <NA> Yes <NA> No <NA> Total <NA> NA Resp~
6 <NA> no. % no. % no. % NA no.
7 New South Wales 2374~ 57.7~ 1736~ 42.2~ 4111~ 100 NA 4111~
8 Victoria 2145~ 64.9~ 1161~ 35.1~ 3306~ 100 NA 3306~
9 Queensland 1487~ 60.7~ 9610~ 39.2~ 2448~ 100 NA 2448~
10 South Australia 5925~ 62.5 3562~ 37.5 9487~ 100 NA 9487~
# ... with 11 more rows, and 7 more variables: ...10 <chr>,
# ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>, ...15 <chr>,
# ...16 <chr>
am_survey_tbl2
# A tibble: 190 x 16
`Australian Burea~` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <chr>
1 1800.0 Australian ~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
2 Released on 15 Nov~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
3 Table 2 Response b~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
4 <NA> Resp~ <NA> <NA> <NA> <NA> <NA> NA Elig~
5 <NA> Yes <NA> No <NA> Total <NA> NA Resp~
6 <NA> no. % no. % no. % NA no.
7 New South Wales Di~ <NA> <NA> <NA> <NA> <NA> <NA> NA <NA>
8 Banks 37736 44.8~ 46343 55.1~ 84079 100 NA 84079
9 Barton 37153 43.6~ 47984 56.3~ 85137 100 NA 85137
10 Bennelong 42943 49.7~ 43215 50.2~ 86158 100 NA 86158
# ... with 180 more rows, and 7 more variables: ...10 <chr>,
# ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>, ...15 <chr>,
# ...16 <chr>
Reviewing the table sheets and their associated data we now see that Table 1 is an aggregate of data from Table 2 by State/Territory. From here on we can focus on Table 2 as it has all of the underlying data that we will be interested in.
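To illustrate what "aggregate by State/Territory" means here, the sketch below sums division-level counts up to the state level. It uses only the three New South Wales divisions visible in the Table 2 preview, so the totals are for those three rows and not the real New South Wales figures; the names `divisions`, `state_totals`, `Total`, and `Yes_Share` are assumptions made for this example.

```r
library(dplyr)

# Three New South Wales divisions copied from the Table 2 preview above.
# This is a partial sample, so the sums are NOT the true state totals.
divisions <- tibble(
  State_Territory = "New South Wales",
  Division = c("Banks", "Barton", "Bennelong"),
  Yes = c(37736, 37153, 42943),
  No = c(46343, 47984, 43215)
)

# Collapse the division rows into one row per State/Territory
state_totals <- divisions %>%
  group_by(State_Territory) %>%
  summarise(Yes = sum(Yes), No = sum(No), .groups = "drop") %>%
  mutate(Total = Yes + No, Yes_Share = Yes / Total)

state_totals
```

Running the same aggregation over every division in Table 2 should, in principle, reproduce the rows of Table 1.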
Focusing on Table 2, we see that several of the first rows are used for description, several more rows are used to describe groups of variables, several columns are duplicates or aggregates, and it includes title and sub-total lines for each State/Territory.
To deal with this we will 1) re-read the data and skip the descriptive rows, 2) select only the columns that we will need, 3) add a State/Territory variable to identify the Divisions, and 4) remove the title and sub-total lines. We will also take this opportunity to assign appropriate names to our variables.
# Read in and skip first 7 rows with miscellaneous information
am_survey_final <- read_xls("australian_marriage_law_postal_survey_2017_-_response_final.xls",
sheet = "Table 2", skip=7)
# Subset to only the columns we're interested in. Will be keeping: Yes answers, No answers,
# Eligible participants with clear responses, without clear responses, and non-responders.
am_survey_final <- am_survey_final[,c(1,2,4,6,11,13)]
# Add a State_Territory variable
am_survey_final$State_Territory <- NA_character_
# Assign appropriate values to the State_Territory variable
am_survey_final[1:47,]$State_Territory <- "New South Wales"
am_survey_final[51:87,]$State_Territory <- "Victoria"
am_survey_final[91:120,]$State_Territory <- "Queensland"
am_survey_final[124:134,]$State_Territory <- "South Australia"
am_survey_final[138:153,]$State_Territory <- "Western Australia"
am_survey_final[157:161,]$State_Territory <- "Tasmania"
am_survey_final[165:166,]$State_Territory <- "Northern Territory"
am_survey_final[170:171,]$State_Territory <- "Australian Capital Territory"
# Remove all total and title lines
am_survey_final <- am_survey_final %>% filter(!is.na(State_Territory))
# Add clear column names
colnames(am_survey_final) <- c("Division","Yes","No","Clear Responses","Not Clear Responses","Non-Responders","State_Territory")
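As an aside, the hard-coded row ranges above depend on the exact layout of the sheet. A possible alternative is to derive the State/Territory from the title rows themselves. The sketch below runs on a toy data frame, not the real sheet: it assumes title rows end in "Divisions" and sub-total rows contain "(Total)" (neither label is fully visible in the previews), and "Division A" and "Division B" with their counts are made-up placeholders.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the raw layout. Banks and Barton come from the preview;
# the Victoria rows and all "(Total)" rows are placeholders for illustration.
raw <- tibble(
  Division = c("New South Wales Divisions", "Banks", "Barton",
               "New South Wales (Total)", NA,
               "Victoria Divisions", "Division A", "Division B",
               "Victoria (Total)"),
  Yes = c(NA, 37736, 37153, 74889, NA, NA, 10, 20, 30),
  No = c(NA, 46343, 47984, 94327, NA, NA, 5, 15, 20)
)

tagged <- raw %>%
  # Title rows carry the state name; every other row gets NA for now
  mutate(State_Territory = if_else(str_detect(Division, " Divisions$"),
                                   str_remove(Division, " Divisions$"),
                                   NA_character_)) %>%
  # Carry each state name down to the rows beneath its title row
  fill(State_Territory, .direction = "down") %>%
  # Drop title and blank rows (no counts) and the sub-total rows
  filter(!is.na(Yes), !str_detect(Division, "\\(Total\\)"))

tagged
```

The trade-off is that this approach relies on the title and sub-total labels being consistent, while the index-based approach relies on the row positions being stable.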
Alright, let’s check it out.
str(am_survey_final)
tibble [150 x 7] (S3: tbl_df/tbl/data.frame)
$ Division : chr [1:150] "Banks" "Barton" "Bennelong" "Berowra" ...
$ Yes : num [1:150] 37736 37153 42943 48471 20406 ...
$ No : num [1:150] 46343 47984 43215 40369 57926 ...
$ Clear Responses : num [1:150] 84079 85137 86158 88840 78332 ...
$ Not Clear Responses: num [1:150] 247 226 244 212 220 202 285 263 229 315 ...
$ Non-Responders : num [1:150] 20928 24008 19973 16038 25883 ...
$ State_Territory : chr [1:150] "New South Wales" "New South Wales" "New South Wales" "New South Wales" ...
rmarkdown::paged_table(am_survey_final)
Yay! We have a decent data set to work with!
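Below is a list of variable definitions to clarify which variables we decided to keep.
Division: the federal electoral division.
Yes: number of clear YES responses.
No: number of clear NO responses.
Clear Responses: number of eligible participants with a clear response (Yes plus No).
Not Clear Responses: number of eligible participants whose response was not clear.
Non-Responders: number of eligible participants who did not respond.
State_Territory: the State/Territory that the Division belongs to.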
Now we will explore the data, starting with the New South Wales Divisions sorted from most to fewest YES responses.
am_survey_NSW <- am_survey_final %>% filter(State_Territory == "New South Wales") %>%
  arrange(desc(Yes))
rmarkdown::paged_table(am_survey_NSW)
And, for fun, let’s plot some of the results. I am interested to see the proportion of YES and NO responses in each Division and how that distribution varies by State/Territory.
So we will calculate the proportion of YES responses for each Division, plot the distribution, and color the results by State/Territory to see what, if any, patterns emerge.
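# Each group is a single Division, so the sums reduce to per-division values
am_survey_ST_grouped <- am_survey_final %>% group_by(State_Territory,Division) %>%
  summarise(Total_Responses = sum(`Clear Responses`,`Not Clear Responses`),
            # Proportion (0 to 1) of clear responses that were YES
            Married_Perc = sum(Yes)/sum(Yes+No),
            .groups = "drop") %>%
  arrange(desc(Total_Responses))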
rmarkdown::paged_table(am_survey_ST_grouped)
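As a quick sanity check on the proportion calculation, the sketch below recomputes it by hand for the three divisions shown in the Table 2 preview. Because each group contains a single Division, the grouped sums reduce to `Yes / (Yes + No)` per row. The names `checks`, `Adds_Up`, and `Yes_Share` are assumptions made for this example.

```r
library(dplyr)

# Counts copied from the Table 2 preview earlier on this page
checks <- tibble(
  Division = c("Banks", "Barton", "Bennelong"),
  Yes = c(37736, 37153, 42943),
  No = c(46343, 47984, 43215),
  `Clear Responses` = c(84079, 85137, 86158)
) %>%
  mutate(
    # Clear responses should equal the sum of YES and NO responses
    Adds_Up = (Yes + No) == `Clear Responses`,
    # Proportion of clear responses that were YES
    Yes_Share = Yes / (Yes + No)
  )

checks
```

The resulting shares can be compared with the percentage column in the preview, which appears to store the same figures rounded to one decimal place.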
am_survey_ST_grouped %>% ggplot(aes(x=Married_Perc)) +
geom_histogram(color="black",aes(fill=State_Territory),binwidth = 0.025) +
labs(title = "Survey of Australians About Marriage Law Change",
subtitle = "Should the law be changed to allow same sex couples to marry?",
x="Proportion of Clear Responses That Were YES",
y="Count of Divisions")
Interesting…No conclusions today and a lot of unanswered questions, but a good start to understanding the opinions of eligible Australian voters on the topic of the legality of same-sex marriage.
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Bartelloni (2022, Feb. 9). Data Analytics and Computational Social Science: Australian Survey About Marriage Law Change. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomtbartelloni862736/
BibTeX citation
@misc{bartelloni2022australian,
  author = {Bartelloni, Tory},
  title = {Data Analytics and Computational Social Science: Australian Survey About Marriage Law Change},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomtbartelloni862736/},
  year = {2022}
}