Wrangling Clean Data
First, set up the libraries and read in the data, then preview it.
colnames(marriage)
[1] "territory" "resp" "count" "percent"
head(marriage)
# A tibble: 6 × 4
  territory       resp    count percent
  <chr>           <chr>   <dbl>   <dbl>
1 New South Wales yes   2374362    57.8
2 New South Wales no    1736838    42.2
3 Victoria        yes   2145629    64.9
4 Victoria        no    1161098    35.1
5 Queensland      yes   1487060    60.7
6 Queensland      no     961015    39.3
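The setup code itself is not shown in this post. The sketch below is one way the step could look. The real file name and location are not given, so the file here is a hypothetical stand-in written to a temporary path, holding only the six rows previewed above.

```r
library(readr)

# Hypothetical stand-in for the source file: only the six previewed rows.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "territory,resp,count,percent",
  "New South Wales,yes,2374362,57.8",
  "New South Wales,no,1736838,42.2",
  "Victoria,yes,2145629,64.9",
  "Victoria,no,1161098,35.1",
  "Queensland,yes,1487060,60.7",
  "Queensland,no,961015,39.3"
), csv_path)

# Read the file into a tibble, then preview it.
marriage <- read_csv(csv_path, show_col_types = FALSE)
colnames(marriage)
head(marriage)
```

With the real file, only `csv_path` would change.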
The data give the count and percentage of “yes” and “no” responses for each Australian state and territory, read here as married and unmarried. We start by filtering out the “no” responses and sorting the remaining rows by percentage in descending order, so the territories with the highest proportion of marriages come first.
# A tibble: 8 × 4
  territory                       resp    count percent
  <chr>                           <chr>   <dbl>   <dbl>
1 Australian Capital Territory(c) yes    175459    74
2 Victoria                        yes   2145629    64.9
3 Western Australia               yes    801575    63.7
4 Tasmania                        yes    191948    63.6
5 South Australia                 yes    592528    62.5
6 Queensland                      yes   1487060    60.7
7 Northern Territory(b)           yes     48686    60.6
8 New South Wales                 yes   2374362    57.8
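The code that produced this table is not shown. Below is a minimal sketch of the filter-and-sort step, assuming dplyr's `filter()` and `arrange(desc())`. The `marriage` tibble is a small stand-in rebuilt from the rows printed in this post (all eight “yes” rows, plus the three “no” rows from the preview), and `yes_sorted` is a hypothetical name.

```r
library(dplyr)
library(tibble)

# Stand-in for `marriage`, rebuilt from the rows printed in this post.
marriage <- tribble(
  ~territory,                        ~resp, ~count,  ~percent,
  "New South Wales",                 "yes", 2374362, 57.8,
  "New South Wales",                 "no",  1736838, 42.2,
  "Victoria",                        "yes", 2145629, 64.9,
  "Victoria",                        "no",  1161098, 35.1,
  "Queensland",                      "yes", 1487060, 60.7,
  "Queensland",                      "no",   961015, 39.3,
  "South Australia",                 "yes",  592528, 62.5,
  "Western Australia",               "yes",  801575, 63.7,
  "Tasmania",                        "yes",  191948, 63.6,
  "Northern Territory(b)",           "yes",   48686, 60.6,
  "Australian Capital Territory(c)", "yes",  175459, 74
)

# Keep only the "yes" rows, then sort by percent, highest first.
yes_sorted <- marriage %>%
  filter(resp == "yes") %>%
  arrange(desc(percent))

print(yes_sorted)
```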
Then we assign the “yes” responses to a new object, Percentage_Married, keeping only the territory and percent columns, and print it.
Percentage_Married <- marriage %>%
  filter(resp == "yes") %>%
  select(territory, percent)
print(Percentage_Married)
# A tibble: 8 × 2
  territory                       percent
  <chr>                             <dbl>
1 New South Wales                    57.8
2 Victoria                           64.9
3 Queensland                         60.7
4 South Australia                    62.5
5 Western Australia                  63.7
6 Tasmania                           63.6
7 Northern Territory(b)              60.6
8 Australian Capital Territory(c)    74
I plot the values, but I’m sure there is a simpler (and more attractive) way …
ggplot(data = Percentage_Married) +
  geom_bar(mapping = aes(x = territory, y = percent), stat = "identity")
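One possibly simpler and tidier version of the chart, offered as a sketch rather than the definitive approach: `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`, `reorder()` sorts the bars by value, and `coord_flip()` keeps the long territory names readable. The data frame is rebuilt here from the values printed above so the block runs on its own. The axis label and the name `p` are assumptions.

```r
library(ggplot2)
library(tibble)

# Rebuilt from the Percentage_Married values printed above.
Percentage_Married <- tribble(
  ~territory,                        ~percent,
  "New South Wales",                 57.8,
  "Victoria",                        64.9,
  "Queensland",                      60.7,
  "South Australia",                 62.5,
  "Western Australia",               63.7,
  "Tasmania",                        63.6,
  "Northern Territory(b)",           60.6,
  "Australian Capital Territory(c)", 74
)

# Horizontal bars, ordered by percent, with the redundant axis title dropped.
p <- ggplot(Percentage_Married,
            aes(x = reorder(territory, percent), y = percent)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Percent")

print(p)
```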
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Garibian (2022, Feb. 16). Data Analytics and Computational Social Science: HW_2_AustralianMarriages. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlenna717866657/
BibTeX citation
@misc{garibian2022hw_2_australianmarriages,
  author = {Garibian, Lenna},
  title = {Data Analytics and Computational Social Science: HW_2_AustralianMarriages},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlenna717866657/},
  year = {2022}
}