Data Analytics and Computational Social Science: HW02

Angela Smith

Introduction

For HW02, I read in the tidy dataset australian_marriage_tidy.xlsx and explore it with filter() and arrange().

Read-In Dataset australian_marriage_tidy.xlsx

The Australian Marriage dataset has four columns: territory (character), resp (character), count (numeric), and percent (numeric). “territory” names a state or territory of Australia. “resp” records a yes or no response related to marriage. “count” is the number of yes or no responses in that territory. “percent” expresses count as a percentage of all responses (yes plus no) in that territory, so the two percent values for a territory sum to 100, up to rounding.

knitr::opts_chunk$set(echo = TRUE)

library(readxl)

australian_marriages <- read_excel(path="/Users/angelasmith/Desktop/DACSS601/HW02/australian_marriage_tidy.xlsx", range="A1:D17")

australian_marriages

# A tibble: 16 × 4
   territory                       resp    count percent
   <chr>                           <chr>   <dbl>   <dbl>
 1 New South Wales                 yes   2374362    57.8
 2 New South Wales                 no    1736838    42.2
 3 Victoria                        yes   2145629    64.9
 4 Victoria                        no    1161098    35.1
 5 Queensland                      yes   1487060    60.7
 6 Queensland                      no     961015    39.3
 7 South Australia                 yes    592528    62.5
 8 South Australia                 no     356247    37.5
 9 Western Australia               yes    801575    63.7
10 Western Australia               no     455924    36.3
11 Tasmania                        yes    191948    63.6
12 Tasmania                        no     109655    36.4
13 Northern Territory(b)           yes     48686    60.6
14 Northern Territory(b)           no      31690    39.4
15 Australian Capital Territory(c) yes    175459    74  
16 Australian Capital Territory(c) no      61520    26
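
As a quick check on the column descriptions, the percent column can be recomputed from count. The sketch below is illustrative only: it rebuilds four rows copied from the printed tibble instead of reading the Excel file, and the names marriages_sample, percent_check, and recomputed are hypothetical.

```r
library(dplyr)

# Four rows copied from the printed tibble above (two territories only).
# marriages_sample is a hypothetical stand-in for australian_marriages.
marriages_sample <- tibble::tribble(
  ~territory,        ~resp, ~count,  ~percent,
  "New South Wales", "yes", 2374362, 57.8,
  "New South Wales", "no",  1736838, 42.2,
  "Victoria",        "yes", 2145629, 64.9,
  "Victoria",        "no",  1161098, 35.1
)

# Recompute each row's share of all responses within its territory
percent_check <- marriages_sample %>%
  group_by(territory) %>%
  mutate(recomputed = round(100 * count / sum(count), 1)) %>%
  ungroup()

percent_check
```

If the recomputed column matches percent, the description of percent as a within-territory share holds for these rows. The same pipeline should work on the full australian_marriages tibble.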

Wrangling data using filter() and arrange()

Percent of Australians married by territory

To explore the data further, I keep only the rows where resp is “yes” and sort them by percent in descending order.

# Load dplyr for filter(), arrange(), and the %>% pipe
library(dplyr)
australian_marriages %>%
  filter(resp == 'yes') %>%
  arrange(desc(percent))

# A tibble: 8 × 4
  territory                       resp    count percent
  <chr>                           <chr>   <dbl>   <dbl>
1 Australian Capital Territory(c) yes    175459    74  
2 Victoria                        yes   2145629    64.9
3 Western Australia               yes    801575    63.7
4 Tasmania                        yes    191948    63.6
5 South Australia                 yes    592528    62.5
6 Queensland                      yes   1487060    60.7
7 Northern Territory(b)           yes     48686    60.6
8 New South Wales                 yes   2374362    57.8

Marriage status of the majority of Australians by geographic region

We can find the majority response (yes or no) in each territory by keeping only the rows where percent is at least 50 and sorting them by percent in descending order.

# dplyr provides filter(), arrange(), and the %>% pipe
library(dplyr)
australian_marriages %>%
  filter(percent >= 50) %>%
  arrange(desc(percent))

# A tibble: 8 × 4
  territory                       resp    count percent
  <chr>                           <chr>   <dbl>   <dbl>
1 Australian Capital Territory(c) yes    175459    74  
2 Victoria                        yes   2145629    64.9
3 Western Australia               yes    801575    63.7
4 Tasmania                        yes    191948    63.6
5 South Australia                 yes    592528    62.5
6 Queensland                      yes   1487060    60.7
7 Northern Territory(b)           yes     48686    60.6
8 New South Wales                 yes   2374362    57.8
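
The threshold filter above relies on there being exactly two responses per territory. A possible alternative, sketched here under the assumption that dplyr 1.0.0 or later is installed, is to pick the row with the largest percent within each territory using slice_max(). The tibble marriages_sample and the name majority_response are hypothetical; the rows are copied from the printed output.

```r
library(dplyr)

# Four rows copied from the printed tibble (hypothetical stand-in data frame)
marriages_sample <- tibble::tribble(
  ~territory,   ~resp, ~count,  ~percent,
  "Queensland", "yes", 1487060, 60.7,
  "Queensland", "no",   961015, 39.3,
  "Tasmania",   "yes",  191948, 63.6,
  "Tasmania",   "no",   109655, 36.4
)

# Keep the single largest-percent row per territory, then sort
majority_response <- marriages_sample %>%
  group_by(territory) %>%
  slice_max(percent, n = 1) %>%
  ungroup() %>%
  arrange(desc(percent))

majority_response
```

With the default with_ties = TRUE, a territory split exactly 50/50 would return both rows, which matches what filter(percent >= 50) would do in that case.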

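The tables from both exercises are identical, and this is not a coincidence: “yes” is above 50 percent in every territory, so filtering on percent >= 50 keeps exactly the “yes” rows. In every territory, the majority response is “yes”.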
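
The match between the two tables can also be confirmed in code instead of by eye. This is a minimal sketch on four rows copied from the printed tibble; marriages_sample, by_response, by_majority, and same_result are hypothetical names.

```r
library(dplyr)

# Four rows copied from the printed tibble (hypothetical stand-in data frame)
marriages_sample <- tibble::tribble(
  ~territory,          ~resp, ~count, ~percent,
  "South Australia",   "yes", 592528, 62.5,
  "South Australia",   "no",  356247, 37.5,
  "Western Australia", "yes", 801575, 63.7,
  "Western Australia", "no",  455924, 36.3
)

# Exercise 1: keep the "yes" rows
by_response <- marriages_sample %>%
  filter(resp == "yes") %>%
  arrange(desc(percent))

# Exercise 2: keep the majority rows
by_majority <- marriages_sample %>%
  filter(percent >= 50) %>%
  arrange(desc(percent))

# TRUE when both pipelines return the same rows in the same order
same_result <- identical(by_response, by_majority)
same_result
```

The two results would differ for any dataset in which some territory had “no” at 50 percent or more, so the comparison is specific to this data.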


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Smith (2021, Dec. 30). Data Analytics and Computational Social Science: HW02. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomangelanicolesmith851830/

BibTeX citation

@misc{smith2021hw02,
  author = {Smith, Angela},
  title = {Data Analytics and Computational Social Science: HW02},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomangelanicolesmith851830/},
  year = {2021}
}