Challenge 3 - Australian Marriage

challenge_3
australian_marriage
srujan_kagitala
Tidy Data: Pivoting
Author

Srujan Kagitala

Published

June 26, 2023

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

  1. read in a data set, and describe the data set using both words and any supporting information (e.g., tables, etc)
  2. identify what needs to be done to tidy the current data
  3. anticipate the shape of pivoted data
  4. pivot the data into tidy format using pivot_longer

Read in data

Read in one (or more) of the following datasets, using the correct R package and command.

  • australian_marriage*.xls ⭐⭐⭐
Code
#read data
marriage_data <- read.csv("_data/australian_marriage_tidy.csv")

#print head of read data.
head(marriage_data)
        territory resp   count percent
1 New South Wales  yes 2374362    57.8
2 New South Wales   no 1736838    42.2
3        Victoria  yes 2145629    64.9
4        Victoria   no 1161098    35.1
5      Queensland  yes 1487060    60.7
6      Queensland   no  961015    39.3

Briefly describe the data and Anticipate the End Result

Describe the data, and be sure to comment on why you are planning to pivot it to make it “tidy”

The data set I choose is Australian marriage. It looks like it captures the responses of participants to a survey across regions in Australia. The data set has 16 rows and 4 columns. The variables recorded for each observation are territory, resp, count, percent. The data currently records two observations for each region in Australia. So, I want to tidy the data to represent all the responses and their statistics specific to an area into a single observation. For this, we need to convert the values “resp” variable can take to independent variables and assign values from count and percent variables. The tidied data set will contain territory, yes/no count, yes/no percent for each observation.

Challenge: Describe the final dimensions

Code
# Unique values of resp variable.
no_unqiue_resp_values = length(unique(marriage_data$resp))

#expected rows
rows = nrow(marriage_data)/no_unqiue_resp_values

#expected columns
columns = (ncol(marriage_data) - 2)* no_unqiue_resp_values + 1

The number of unique values the resp variable has are 2. After pivoting wider, the data set will have 8 rows and 5 columns. So, the pivoted data set will have 8x5 dimension.

Challenge: Pivot the Chosen Data

Code
#pivot wider on resp variable with values from count and percent
marriage_pivot <- marriage_data %>%
  pivot_wider(names_from=resp, values_from=c(count, percent))
marriage_pivot
# A tibble: 8 × 5
  territory                       count_yes count_no percent_yes percent_no
  <chr>                               <int>    <int>       <dbl>      <dbl>
1 New South Wales                   2374362  1736838        57.8       42.2
2 Victoria                          2145629  1161098        64.9       35.1
3 Queensland                        1487060   961015        60.7       39.3
4 South Australia                    592528   356247        62.5       37.5
5 Western Australia                  801575   455924        63.7       36.3
6 Tasmania                           191948   109655        63.6       36.4
7 Northern Territory(b)               48686    31690        60.6       39.4
8 Australian Capital Territory(c)    175459    61520        74         26  

The dimensions of dataset after pivoting longer match with the calculated dimension.