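We import the data from a specific cell range to isolate the variables we want, give the columns meaningful names so they are easier to select, and then drop the interstitial columns we don't need. Every unwanted column is given the placeholder name "d", so a single `select(!starts_with("d"))` removes them all.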
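```r
library(readxl)     # read_excel()
library(tidyverse)  # dplyr, tidyr, stringr and the %>% pipe

base_marriage_data <- read_excel(
  "../../_data/australian_marriage_law_postal_survey_2017_-_response_final.xls",
  sheet = "Table 2",
  range = "A8:P179",
  # columns to discard all get the placeholder name "d"
  col_names = c("Town", "Yes", "d", "No", rep("d", 6), "Not Clear", "d",
                "No Response", rep("d", 3))
) %>%
  select(!starts_with("d"))
```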
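Why the repeated "d" names work can be checked on a tiny example. This is a sketch on made-up values, assuming the tibble and dplyr packages are installed. Duplicate names are repaired by `.name_repair = "unique"` (readxl's default) into `d...3`, `d...5` and so on, which still start with "d". Note that `starts_with()` ignores case by default, so any column beginning with a capital "D" would also be dropped; none of the kept names in this data set do.

```r
library(tibble)
library(dplyr)

# Toy stand-in for one spreadsheet row (values are made up):
# every column we want to discard gets the placeholder name "d"
raw <- setNames(
  list("Town A", 10L, 99L, 5L, 99L),
  c("Town", "Yes", "d", "No", "d")
)

# "unique" repair renames the duplicates to "d...3" and "d...5"
# (a message listing the new names is printed)
toy <- as_tibble(raw, .name_repair = "unique")

# Drop every placeholder column in one step
kept <- select(toy, !starts_with("d"))
names(kept)  # "Town" "Yes" "No"
```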
The next variable we need to isolate is County, which in this data set has a parent-child relationship with Town: each county appears as a heading row (its Town value ends in "Divisions") followed by its towns. To capture this, we create a new County column that holds the heading value and place it before Town, so the columns run in descending order of complexity (county, then town).
```r
base_marriage_data <- base_marriage_data %>%
  # keep the Town value on county heading rows, NA everywhere else
  mutate(County = case_when(
    str_ends(Town, "Divisions") ~ Town,
    TRUE ~ NA_character_)) %>%
  # move County in front of Town so the columns keep a descending order of
  # complexity; in dplyr >= 1.0.0, mutate(..., .before = Town) does the same
  relocate(County, .before = Town)
```
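For reference, the column can also be created and positioned in a single `mutate()` call. This sketch uses hypothetical town names and counts, assumes dplyr >= 1.0.0 and stringr, and swaps in `if_else()` as an equivalent of the two-branch `case_when()` above.

```r
library(dplyr)
library(stringr)

# Toy rows (hypothetical names and counts): heading rows end in "Divisions"
toy <- tibble(
  Town = c("New South Wales Divisions", "Town A", "Town B",
           "Victoria Divisions", "Town C"),
  Yes  = c(NA, 100, 200, NA, 300)
)

with_county <- toy %>%
  mutate(
    # keep the Town value only on heading rows, NA elsewhere
    County = if_else(str_ends(Town, "Divisions"), Town, NA_character_),
    # place the new column in front of Town
    .before = Town
  )
```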
To complete the isolation of County data, we need to populate the new column with the parent county of each child town. A loop does this by carrying each County value down the column and switching to the new value whenever the next county heading is reached.
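The code for this step is not shown here, although the next chunk reads from `tidy_marriage_data`, which this step presumably creates. As a minimal sketch on a toy tibble (assuming dplyr and tidyr are installed), the same downward fill can be done with `tidyr::fill()` or with an explicit loop; both give the same result. The last line shows, as a hypothetical, how this might be applied to the real data.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  County = c("New South Wales Divisions", NA, NA, "Victoria Divisions", NA),
  Town   = c("New South Wales Divisions", "Town A", "Town B",
             "Victoria Divisions", "Town C")
)

# Option 1: fill() copies the last non-missing County value downward
filled <- fill(toy, County, .direction = "down")

# Option 2: an explicit loop with the same behaviour
county_col <- toy$County
current <- NA_character_
for (i in seq_along(county_col)) {
  if (!is.na(county_col[i])) {
    current <- county_col[i]   # a new county heading: switch to its value
  } else {
    county_col[i] <- current   # a child town: inherit the current county
  }
}
looped <- mutate(toy, County = county_col)

# Hypothetical use on the real data:
# tidy_marriage_data <- fill(base_marriage_data, County)
```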
This next chunk removes undesirable rows so we can isolate our observations. Because we imported the data with a range that already excludes the unnecessary rows above and below the table, we only need to deal with the interstitial rows that contain no data and the rows that contain totals.
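```r
tidier_marriage_data <- tidy_marriage_data %>%
  # drop blank spacer rows; county heading rows also go, since their Yes is NA
  drop_na(Town, Yes) %>%
  # drop total rows ("(Total)" is a regex, so it matches any label containing "Total")
  filter(!str_detect(Town, "(Total)"))
```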
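The effect of the two filters can be checked on made-up rows. This sketch assumes dplyr, tidyr and stringr are installed, and that total rows carry a literal "(Total)" label, which is an assumption about the spreadsheet. It also shows that `str_detect()` treats its pattern as a regular expression, so the parentheses form a group rather than matching literal brackets; wrapping the pattern in `fixed()` matches it character for character.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy rows: a heading row, a town, a blank spacer row and a total row
toy <- tibble(
  County = "New South Wales Divisions",
  Town   = c("New South Wales Divisions", "Town A", NA,
             "New South Wales Divisions (Total)"),
  Yes    = c(NA, 100, NA, 100)
)

cleaned <- toy %>%
  drop_na(Town, Yes) %>%                         # heading and spacer rows go
  filter(!str_detect(Town, fixed("(Total)")))    # total row goes

regex_hit <- str_detect("Total", "(Total)")          # TRUE: parentheses are a group
fixed_hit <- str_detect("Total", fixed("(Total)"))   # FALSE: brackets matched literally
```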
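As a final tidying step, we remove “Divisions” from the County column, since the column now names the county itself rather than serving as a heading over its child towns. We also strip the bracketed letters (c), (d) and (e) that appear after some town names.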
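```r
tidiest_marriage_data <- tidier_marriage_data %>%
  mutate(County = str_remove(County, " Divisions"),   # "X Divisions" -> "X"
         Town = str_remove(Town, "\\([cde]\\)"))      # drop "(c)", "(d)", "(e)"

# opens the data viewer in an interactive session
view(tidiest_marriage_data)
```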
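Here are the two string operations on their own, using made-up labels and assuming stringr is installed. The `str_trim()` call is an addition not in the chunk above; it only matters if a space sits before the bracketed letter, which is an assumption about the source labels.

```r
library(stringr)

counties <- c("New South Wales Divisions", "Victoria Divisions")
county_names <- str_remove(counties, " Divisions")
# "New South Wales" "Victoria"

towns <- c("Town A(c)", "Town B (d)", "Town C")
town_names <- str_trim(str_remove(towns, "\\([cde]\\)"))
# "Town A" "Town B" "Town C"
```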
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nestor (2021, Aug. 17). DACSS 601 August 2021: DN Australian Data. Retrieved from https://mrolfe.github.io/DACSS601August2021/posts/2021-08-17-dn-australian-data/
BibTeX citation
@misc{nestor2021dn, author = {Nestor, Dana}, title = {DACSS 601 August 2021: DN Australian Data}, url = {https://mrolfe.github.io/DACSS601August2021/posts/2021-08-17-dn-australian-data/}, year = {2021} }