HW_2_AustralianMarriages

Wrangling Clean Data

Lenna Garibian
February 15, 2022

First, set up the libraries and read in the data, then preview it.
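The setup code itself is not shown in this post, so here is a minimal sketch of what it might look like. The source file and its path are not given, so as an assumption the six rows visible in the preview are typed in by hand with `tribble()` instead of being read from disk. The name `marriage` matches the object used in the rest of the post.

```r
# Load the packages used in this post.
library(dplyr)    # filter(), arrange(), select(), %>%
library(ggplot2)  # plotting
library(tibble)   # tribble()

# Assumption: the original data file is not shown, so only the six rows
# visible in the preview are entered here by hand. The full data set has
# one "yes" and one "no" row per territory.
marriage <- tribble(
  ~territory,        ~resp, ~count,  ~percent,
  "New South Wales", "yes", 2374362, 57.8,
  "New South Wales", "no",  1736838, 42.2,
  "Victoria",        "yes", 2145629, 64.9,
  "Victoria",        "no",  1161098, 35.1,
  "Queensland",      "yes", 1487060, 60.7,
  "Queensland",      "no",   961015, 39.3
)
```

With the real file on hand, the `tribble()` call would be replaced by a call to a reader such as `readr::read_csv()` pointed at that file.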

colnames(marriage)
[1] "territory" "resp"      "count"     "percent"  
head(marriage)
# A tibble: 6 × 4
  territory       resp    count percent
  <chr>           <chr>   <dbl>   <dbl>
1 New South Wales yes   2374362    57.8
2 New South Wales no    1736838    42.2
3 Victoria        yes   2145629    64.9
4 Victoria        no    1161098    35.1
5 Queensland      yes   1487060    60.7
6 Queensland      no     961015    39.3
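A quick way to see how the `percent` column is defined is to add it up within each territory. The sketch below uses only the six rows from the preview above, typed in by hand, and the names `preview` and `totals` are made up for this illustration. If `percent` is the share of responses within a territory, each total should come out at 100, up to rounding.

```r
library(dplyr)
library(tibble)

# The six rows shown in the preview, entered by hand (illustration only).
preview <- tribble(
  ~territory,        ~resp, ~percent,
  "New South Wales", "yes", 57.8,
  "New South Wales", "no",  42.2,
  "Victoria",        "yes", 64.9,
  "Victoria",        "no",  35.1,
  "Queensland",      "yes", 60.7,
  "Queensland",      "no",  39.3
)

# Sum the "yes" and "no" percentages within each territory.
totals <- preview %>%
  group_by(territory) %>%
  summarise(total_percent = sum(percent))

totals
```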

The data show the count and percentage of “yes” and “no” responses in each Australian state and territory. We start by filtering out the “no” responses and showing the percentages in descending order, so the territories with the highest share of “yes” responses come first.

filter(marriage, resp == "yes") %>%
  arrange(desc(percent)) 
# A tibble: 8 × 4
  territory                       resp    count percent
  <chr>                           <chr>   <dbl>   <dbl>
1 Australian Capital Territory(c) yes    175459    74  
2 Victoria                        yes   2145629    64.9
3 Western Australia               yes    801575    63.7
4 Tasmania                        yes    191948    63.6
5 South Australia                 yes    592528    62.5
6 Queensland                      yes   1487060    60.7
7 Northern Territory(b)           yes     48686    60.6
8 New South Wales                 yes   2374362    57.8

Then, we assign the “yes” responses to a new object, keeping only the territory and percent columns, and print it.

Percentage_Married <- filter(marriage, resp == "yes") %>%
  select(territory, percent)
print(Percentage_Married)
# A tibble: 8 × 2
  territory                       percent
  <chr>                             <dbl>
1 New South Wales                    57.8
2 Victoria                           64.9
3 Queensland                         60.7
4 South Australia                    62.5
5 Western Australia                  63.7
6 Tasmania                           63.6
7 Northern Territory(b)              60.6
8 Australian Capital Territory(c)    74  

I plot the values, but I’m sure there is a simpler (and more attractive) way …

ggplot(data = Percentage_Married) + 
  geom_bar(mapping = aes(x = territory, y = percent), stat = "identity")
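One possible simplification, offered as a sketch and not the only way to do it: `geom_col()` does the same job as `geom_bar(stat = "identity")`, `reorder()` sorts the bars by value, and `coord_flip()` turns the chart sideways so the long territory names stay readable. The values below are copied by hand from the `Percentage_Married` output above so the example runs on its own. The axis label is my own wording.

```r
library(ggplot2)
library(tibble)

# Values copied from the Percentage_Married output in this post.
Percentage_Married <- tribble(
  ~territory,                        ~percent,
  "New South Wales",                 57.8,
  "Victoria",                        64.9,
  "Queensland",                      60.7,
  "South Australia",                 62.5,
  "Western Australia",               63.7,
  "Tasmania",                        63.6,
  "Northern Territory(b)",           60.6,
  "Australian Capital Territory(c)", 74
)

# geom_col() plots the y values as given; reorder() orders bars by percent.
p <- ggplot(Percentage_Married,
            aes(x = reorder(territory, percent), y = percent)) +
  geom_col() +
  coord_flip() +  # horizontal bars keep the territory names legible
  labs(x = NULL, y = "Percent \"yes\"")

p
```

Sorting the bars makes the ranking readable at a glance, which the default alphabetical order hides.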

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Garibian (2022, Feb. 16). Data Analytics and Computational Social Science: HW_2_AustralianMarriages. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlenna717866657/

BibTeX citation

@misc{garibian2022hw_2_australianmarriages,
  author = {Garibian, Lenna},
  title = {Data Analytics and Computational Social Science: HW_2_AustralianMarriages},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlenna717866657/},
  year = {2022}
}