Reading in & tidying data
Setting up R so that xlsx files/data can be read into the program and tidyverse packages are active.
Imported data from the xlsx file by navigating the “Files” tab under the plots pane. While you can include all data as is, I excluded the top 8 rows to tidy the data a bit. This can also be done using code as shown in the chunk below. Also, note that in the same chunk, I’ve renamed the original file name from “ActiveDuty_MaritalStatus” to “Marital_Status”. This data set only takes into account the 1st sheet within the excel file that was used for reference.
ActiveDuty_MaritalStatus <- read_excel("HW2/ActiveDuty_MaritalStatus.xlsx", skip = 8)
Marital_Status <- ActiveDuty_MaritalStatus
#Confirming that the data set was renamed correctly
Marital_Status
# A tibble: 30 x 17
...1 `Pay Grade` Male...3 Female...4 Total...5 Male...6 Female...7
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 <NA> E-1 31229 5717 36946 563 122
2 <NA> E-2 53094 8388 61482 1457 275
3 <NA> E-3 131091 21019 152110 4264 1920
4 <NA> E-4 112710 16381 129091 9491 4662
5 <NA> E-5 57989 11021 69010 10937 6576
6 <NA> E-6 19125 4654 23779 10369 4962
7 <NA> E-7 5446 1913 7359 6530 2585
8 <NA> E-8 1009 438 1447 1786 513
9 <NA> E-9 381 202 583 579 144
10 <NA> TOTAL ENLI~ 412074 69733 481807 45976 21759
# ... with 20 more rows, and 10 more variables: Total...8 <dbl>,
# Male...9 <dbl>, Female...10 <dbl>, Total...11 <dbl>,
# Male...12 <dbl>, Female...13 <dbl>, Total...14 <dbl>,
# Male...15 <dbl>, Female...16 <dbl>, Total...17 <dbl>
The data itself categorizes all active duty personnel according to their pay grade and current marital status. Marital status is defined as either single without children, single with children, joint service marriage, or civilian marriage with an additional category representing the summed total between all of them in that specific pay grade.
There are also 3 sub-categories included in each of the previously mentioned categories (except the grand totals column) further breaking down the data according to gender (Male & Female) and the 3rd being the total between both.
I’ve included code below that renames all the respective columns to something a bit more legible with those ending in; WO representing active duty personnel that are single without children, W representing those that are single with children, J representing those in a joint service marriage, C representing those a part of a civilian marriage and the final column representing the grand total.
Marital_Status <- rename(Marital_Status, Male_WO = Male...3, Fem_WO = Female...4, Total_WO = Total...5, Male_W = Male...6, Fem_W = Female...7, Total_W = Total...8, Male_J = Male...9, Fem_J = Female...10, Total_J = Total...11, Male_C = Male...12, Fem_C = Female...13, Total_C = Total...14, Male_Total = Male...15, Fem_Total = Female...16, Grand_Total = Total...17)
#Checking that all categories were renamed correctly
head(Marital_Status) %>%
select("Pay Grade": "Grand_Total")
# A tibble: 6 x 16
`Pay Grade` Male_WO Fem_WO Total_WO Male_W Fem_W Total_W Male_J
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 E-1 31229 5717 36946 563 122 685 139
2 E-2 53094 8388 61482 1457 275 1732 438
3 E-3 131091 21019 152110 4264 1920 6184 3579
4 E-4 112710 16381 129091 9491 4662 14153 8661
5 E-5 57989 11021 69010 10937 6576 17513 12459
6 E-6 19125 4654 23779 10369 4962 15331 8474
# ... with 8 more variables: Fem_J <dbl>, Total_J <dbl>,
# Male_C <dbl>, Fem_C <dbl>, Total_C <dbl>, Male_Total <dbl>,
# Fem_Total <dbl>, Grand_Total <dbl>
All data included in this specific data set would be considered numeric.
The data present within the imported set can be interpreted in many ways, but for the sake of this assignment, I’ve chosen to sort them in 2 different ways according to the male gender.
To start I used the code present in the chunk below to sort all columns except for the male totals and the pay grade they belong to.
# A tibble: 30 x 2
`Pay Grade` Male_Total
<chr> <dbl>
1 E-1 36991
2 E-2 67472
3 E-3 193729
4 E-4 236418
5 E-5 212329
6 E-6 148290
7 E-7 87042
8 E-8 25297
9 E-9 9633
10 TOTAL ENLISTED 1017201
# ... with 20 more rows
This clearly shows us the total active duty males categorized by their pay grade. Nothing fancy, the data is sorted according to the order in which the pay grades are originally listed in the respective excel document.
To take it a step further, I’ve included the code below that further sorts the previous data but in descending order. This code omits the “GRAND TOTAL”, “TOTAL ENLISTED”, and “TOTAL OFFICER” rows in order to tidy the data so that only each specific pay grade is shown.
Marital_Status %>%
select("Pay Grade", "Male_Total") %>%
filter(`Pay Grade` != "GRAND TOTAL" & `Pay Grade` != "TOTAL ENLISTED" & `Pay Grade` != "TOTAL OFFICER") %>%
ungroup() %>%
arrange(desc(Male_Total))
# A tibble: 25 x 2
`Pay Grade` Male_Total
<chr> <dbl>
1 E-4 236418
2 E-5 212329
3 E-3 193729
4 E-6 148290
5 E-7 87042
6 E-2 67472
7 O-3 57973
8 O-4 38492
9 E-1 36991
10 O-5 25341
# ... with 15 more rows
The filter() command removes the previously mentioned rows, the ungroup() function breaks up the format of the columns so that they may be sorted in descending order and lastly, the arrange function actually sorts the data in the aforementioned descending order. Here we can clearly see that a majority of enlisted males, regardless of marital status, belong to the E-4 pay grade.
I hope that this submission meets all the required criteria, thank you for reading!
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gamez (2022, Feb. 9). Data Analytics and Computational Social Science: HW 2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomgamez654863847/
BibTeX citation
@misc{gamez2022hw, author = {Gamez, Alexis}, title = {Data Analytics and Computational Social Science: HW 2}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomgamez654863847/}, year = {2022} }