“DACSS 601 - HW3: Data Wrangling for DOD Active Duty Marital Status”
For this assignment I used the Department of Defense Active Duty Marital Status data set, which contains marital and child status demographics by gender and pay grade for several military branches as of April 2010.
The data set has 912 observations across 5 variables: three (3) character variables, “enlisted”, “branch”, and “status”, and two (2) numeric variables, “pay_grade” and “count”.
A. Enlisted - there are three (3) subcategories for “enlisted” status: “E” for Enlisted, “O” for Officer, and “W” for Warrant.
B. Pay_Grade - each “enlisted” category has its own set of pay grade levels. Enlisted members have pay grades 1 - 9, Officers have pay grades 1 - 10, and Warrant members have pay grades 1 - 5.
C. Branch - this variable has five (5) subcategories: the Department of Defense total, “TotalDoD,” and four service branches, “AirForce,” “MarineCorps,” “Navy,” and “Army.”
D. Status - there are eight (8) distinct “status” values. Each describes a service member’s marital, child, and gender status, and each appears in every branch. The 8 subcategories are:
* married civilian female,
* married civilian male,
* married joint service female,
* married joint service male,
* single with children female,
* single with children male,
* single without children female, and
* single without children male.
Note that it is not yet clear whether the child status of married service members is captured in the data.
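The category sizes described above determine how many rows each branch can have. Below is a quick arithmetic check, written as a minimal R sketch. The object names are my own and do not come from the original analysis.

- 9 + 10 + 5 = 24 pay grade levels.
- 24 levels times 8 status categories gives 192 combinations per branch. This matches the 192 rows returned when the data are filtered to TotalDoD.
- Five complete branches would give 960 rows rather than the 912 observed. Presumably some pay grade/status combinations are absent for at least one branch, but I have not yet checked which.

```r
# Pay grade levels per "enlisted" category, as described in item B
grade_levels <- c(E = 9, O = 10, W = 5)
n_status <- 8   # status categories listed in item D
n_branch <- 5   # branch values listed in item C (including TotalDoD)

levels_total     <- sum(grade_levels)          # 9 + 10 + 5 = 24
rows_per_branch  <- levels_total * n_status    # 24 * 8 = 192
rows_if_complete <- rows_per_branch * n_branch # 192 * 5 = 960

print(levels_total)
print(rows_per_branch)
print(rows_if_complete - 912)  # rows absent relative to the 912 observed
```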
E. Count - a numeric value giving the number of service members (occurrences) in each distinct observation.
An observation is a unique combination of enlisted status, pay grade, branch, and marital/child status.
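One way to confirm that these four variables uniquely identify each row is to check for duplicated key combinations. The minimal sketch below applies base R's `anyDuplicated()` to the six Army Warrant rows printed by `tail(Marital_status)` in this post. The names `obs`, `key_cols`, and `dup` are hypothetical. The same call on the full `Marital_status` tibble should return 0 if the definition holds, but I have not run it on the full data here.

```r
# Six rows copied from the tail(Marital_status) printout (Army, Warrant pay grade 5)
obs <- data.frame(
  pay_grade = 5,
  enlisted  = "W",
  branch    = "Army",
  status    = c("single withchildren male",  "single withchildren female",
                "married jointservice male", "married jointservice female",
                "married civilian female",   "married civilian male"),
  count     = c(20, 1, 11, 4, 504, 10)
)

# Columns that together should identify exactly one observation
key_cols <- c("enlisted", "pay_grade", "branch", "status")

# anyDuplicated() returns 0 when no key combination repeats,
# otherwise the row index of the first repeated combination
print(anyDuplicated(obs[key_cols]))

# Appending a copy of row 1 creates a duplicate key at row 7
dup <- rbind(obs, obs[1, ])
print(anyDuplicated(dup[key_cols]))
```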
```r
# View composition of Marital Status data
print(Marital_status)
```

```
# A tibble: 912 x 5
   pay_grade enlisted branch   status                        count
       <dbl> <chr>    <chr>    <chr>                         <dbl>
 1         1 E        TotalDoD single withoutchildren male   31229
 2         1 E        TotalDoD single withoutchildren female  5717
 3         1 E        TotalDoD single withchildren male        563
 4         1 E        TotalDoD single withchildren female      122
 5         1 E        TotalDoD married jointservice male       139
 6         1 E        TotalDoD married jointservice female     141
 7         1 E        TotalDoD married civilian female        5060
 8         1 E        TotalDoD married civilian male           719
 9         2 E        TotalDoD single withoutchildren male   53094
10         2 E        TotalDoD single withoutchildren female  8388
# ... with 902 more rows
```
```r
# View the last six rows
tail(Marital_status)
```

```
# A tibble: 6 x 5
  pay_grade enlisted branch status                      count
      <dbl> <chr>    <chr>  <chr>                       <dbl>
1         5 W        Army   single withchildren male       20
2         5 W        Army   single withchildren female      1
3         5 W        Army   married jointservice male      11
4         5 W        Army   married jointservice female     4
5         5 W        Army   married civilian female       504
6         5 W        Army   married civilian male          10
```
To understand the most common occurrences in the data, I began by viewing the highest and lowest status representation within Total DoD across pay grades. The tables below show the top 10 and bottom 10 observations by count.
```r
# Top 10 most frequent status observations within Total DoD
library(dplyr)  # provides filter(), arrange(), desc(), and %>%

# Printing the sorted tibble shows its first 10 rows
filter(Marital_status, branch == "TotalDoD") %>%
  arrange(desc(count))
```

```
# A tibble: 192 x 5
   pay_grade enlisted branch   status                       count
       <dbl> <chr>    <chr>    <chr>                        <dbl>
 1         3 E        TotalDoD single withoutchildren male 131091
 2         5 E        TotalDoD married civilian female     130944
 3         4 E        TotalDoD single withoutchildren male 112710
 4         6 E        TotalDoD married civilian female     110322
 5         4 E        TotalDoD married civilian female     105556
 6         7 E        TotalDoD married civilian female      70001
 7         5 E        TotalDoD single withoutchildren male  57989
 8         3 E        TotalDoD married civilian female      54795
 9         2 E        TotalDoD single withoutchildren male  53094
10         3 O        TotalDoD married civilian female      38963
# ... with 182 more rows
```
```r
# Bottom 10 least frequent status observations within Total DoD
filter(Marital_status, branch == "TotalDoD") %>%
  arrange(count)
```

```
# A tibble: 192 x 5
   pay_grade enlisted branch   status                        count
       <dbl> <chr>    <chr>    <chr>                         <dbl>
 1         8 O        TotalDoD single withchildren male          0
 2         8 O        TotalDoD single withchildren female        0
 3         9 O        TotalDoD single withchildren female        0
 4        10 O        TotalDoD single withoutchildren female     0
 5        10 O        TotalDoD single withchildren male          0
 6        10 O        TotalDoD single withchildren female        0
 7        10 O        TotalDoD married civilian male             0
 8         7 O        TotalDoD single withchildren female        1
 9         9 O        TotalDoD single withoutchildren male       1
10         9 O        TotalDoD single withoutchildren female     1
# ... with 182 more rows
```
Most common occurrences - The most represented group among Total DoD members is enlisted single males without children in pay grade 3, with 131,091 occurrences. A close second, by only 147 occurrences, is enlisted married civilian females in pay grade 5, with 130,944. All ten of the most frequent observations belong to one of these two groups (single males without children or married civilian females), nine of them enlisted, with a median pay grade of 4. Based on this observation, these two groups may be the most represented within the data set irrespective of pay grade, although confirming this would require summing their counts across all pay grades.
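The comparison above can be checked directly against the printed top 10 rows. In this minimal sketch I copied those ten rows into a small data frame; `top10` and `gap` are hypothetical names, not objects from the original analysis. The code computes the gap between the first two counts and tallies the rows by status and by enlisted category.

```r
# Top 10 TotalDoD rows, copied from the arrange(desc(count)) printout
top10 <- data.frame(
  pay_grade = c(3, 5, 4, 6, 4, 7, 5, 3, 2, 3),
  enlisted  = c(rep("E", 9), "O"),
  status    = c("single withoutchildren male", "married civilian female",
                "single withoutchildren male", "married civilian female",
                "married civilian female",     "married civilian female",
                "single withoutchildren male", "married civilian female",
                "single withoutchildren male", "married civilian female"),
  count     = c(131091, 130944, 112710, 110322, 105556,
                70001, 57989, 54795, 53094, 38963)
)

# Difference between the most and second most frequent observations
gap <- top10$count[1] - top10$count[2]
print(gap)

# Number of top 10 rows in each status group and each enlisted category
print(table(top10$status))
print(table(top10$enlisted))
```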
```r
# Create a vector of the pay grades of the top 10 Total DoD observations
# and calculate their median pay grade
TOP_10_DoD <- c(3, 5, 4, 6, 4, 7, 5, 3, 2, 3)
median(TOP_10_DoD)
```

```
[1] 4
```
```r
# Create a vector of the pay grades of the bottom 10 Total DoD observations
# and calculate their median pay grade
Bottom_10_DoD <- c(8, 8, 9, 10, 10, 10, 10, 7, 9, 9)
median(Bottom_10_DoD)
```

```
[1] 9
```
Least common occurrences - Conversely, the least represented observations are mostly single Officers, primarily female but also male, with a median pay grade of 9. The one exception among the bottom 10 is married civilian males at Officer pay grade 10. From this we can begin to form some initial research questions.
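The hand-typed pay grade vectors above could instead be derived from the rows themselves, which avoids transcription errors. The minimal sketch below copies the ten bottom rows from the `arrange(count)` printout into a data frame. It then computes the median pay grade from the ranked rows and cross-tabulates marital status by gender. The names `bottom10`, `median_grade_of()`, `marital`, and `gender` are hypothetical.

Two caveats apply:
- Rows 8 - 10 have a count of 1, so any further TotalDoD rows with a count of 1 would tie with them and could change which rows form the "bottom 10."
- Enlisted, Officer, and Warrant pay grades are on separate scales. A median pay grade is easiest to interpret when all rows share one category, as these ten Officer rows do. The top 10 vector, by contrast, mixes nine Enlisted grades with one Officer grade.

```r
# Bottom 10 TotalDoD rows, copied from the arrange(count) printout
bottom10 <- data.frame(
  pay_grade = c(8, 8, 9, 10, 10, 10, 10, 7, 9, 9),
  enlisted  = "O",
  status    = c("single withchildren male",    "single withchildren female",
                "single withchildren female",  "single withoutchildren female",
                "single withchildren male",    "single withchildren female",
                "married civilian male",       "single withchildren female",
                "single withoutchildren male", "single withoutchildren female"),
  count     = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Hypothetical helper: median pay grade of the n rows with the highest
# (decreasing = TRUE) or lowest (decreasing = FALSE) counts.
# order() is stable, so tied rows keep their original order.
median_grade_of <- function(df, n = 10, decreasing = TRUE) {
  ranked <- df[order(df$count, decreasing = decreasing), ]
  median(head(ranked$pay_grade, n))
}

print(median_grade_of(bottom10, decreasing = FALSE))

# First word of status = marital status, last word = gender
marital <- sub(" .*$", "", bottom10$status)
gender  <- sub("^.* ", "", bottom10$status)
print(table(marital, gender))
```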
Before I finalize my research questions, I want to conduct more exploratory analysis of the data. Specifically, I would first like to answer:
1. What is the relative composition of the Department of Defense by gender and marital status?
2. Are there any significant differences in representation within pay grades?
3. Are there any significant differences in representation within branches?
4. Can we surmise any relationships between these variables to form a hypothesis?
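A possible starting point for question 1 is to split the status strings into marital status and gender, then sum the counts in a cross-tabulation. The minimal sketch below applies this to the eight TotalDoD Enlisted pay grade 1 rows printed earlier, so it describes E-1 members only. Applied to the full TotalDoD subset (not run here), the same `xtabs()` call would cover all pay grades. The names `e1` and `tab` are hypothetical.

```r
# TotalDoD, Enlisted pay grade 1 rows, copied from the print(Marital_status) output
e1 <- data.frame(
  status = c("single withoutchildren male", "single withoutchildren female",
             "single withchildren male",    "single withchildren female",
             "married jointservice male",   "married jointservice female",
             "married civilian female",     "married civilian male"),
  count  = c(31229, 5717, 563, 122, 139, 141, 5060, 719)
)

# First word of status = marital status, last word = gender
e1$marital <- sub(" .*$", "", e1$status)
e1$gender  <- sub("^.* ", "", e1$status)

# Sum counts for each marital status x gender combination
tab <- xtabs(count ~ marital + gender, data = e1)
print(tab)

# Share of all E-1 members in each cell, rounded to three decimals
print(round(prop.table(tab), 3))
```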
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Muhammad (2022, March 27). Data Analytics and Computational Social Science: KMuhammad_HW3. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkmuhamma881594/
BibTeX citation
@misc{muhammad2022kmuhammad_hw3, author = {Muhammad, Kalimah}, title = {Data Analytics and Computational Social Science: KMuhammad_HW3}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkmuhamma881594/}, year = {2022} }