Data Analytics and Computational Social Science: KMuhammad_HW3

Kalimah Muhammad

Data Set

For this assignment I used the Department of Defense Active Duty Marital Status data set. It reports marital and child status demographics by gender and pay grade for several military branches as of April 2010.

Data Set Variables

The data set contains 912 observations of 5 variables: three (3) character variables titled “enlisted”, “branch”, and “status”, and two (2) numeric variables titled “pay_grade” and “count”.

A. Enlisted - there are three (3) subcategories for “enlisted” status including “E” for Enlisted, “O” for Officer, and “W” for Warrant.

B. Pay_Grade - each “enlisted” category has its own pay grade scale. Enlisted members have a pay grade of 1 - 9, Officers a pay grade of 1 - 10, and Warrant members a pay grade of 1 - 5.

C. Branch - this variable consists of five (5) subcategories of Department of Defense members, “TotalDoD,” “AirForce,” “MarineCorps,” “Navy,” and “Army.”

D. Status - there are eight (8) distinct “status” values describing each service member’s marital, gender, and child status within each branch. These 8 subcategories are:
* married civilian female,
* married civilian male,
* married joint service female,
* married joint service male,
* single with children female,
* single with children male,
* single without children female, and
* single without children male.
Note: it is not yet clear whether the child status of married service members is captured in the data.

E. Count - a numeric value giving the number of occurrences recorded for each distinct observation.

An observation is the unique combination of enlisted status, pay grade, marital/child status, and branch.
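As a rough consistency check on this definition, the sketch below builds every possible combination of the categories described in A through D and compares that count with the 912 rows reported for the data set. It uses only base R. The names `grades`, `branches`, `statuses`, and `full_grid` are illustrative and not part of the original analysis, and the status strings are copied from the printed output rather than read from the file.

```r
# Pay grade scales per "enlisted" category, as described in section B
grades <- rbind(
  data.frame(enlisted = "E", pay_grade = 1:9),
  data.frame(enlisted = "O", pay_grade = 1:10),
  data.frame(enlisted = "W", pay_grade = 1:5)
)

# Branch and status values, as described in sections C and D
branches <- data.frame(
  branch = c("TotalDoD", "AirForce", "MarineCorps", "Navy", "Army")
)
statuses <- data.frame(
  status = c(
    "married civilian female", "married civilian male",
    "married jointservice female", "married jointservice male",
    "single withchildren female", "single withchildren male",
    "single withoutchildren female", "single withoutchildren male"
  )
)

# merge() with by = NULL returns the Cartesian product of its inputs
full_grid <- merge(merge(grades, branches, by = NULL), statuses, by = NULL)

expected_rows <- nrow(full_grid)  # 24 grade levels x 5 branches x 8 statuses
observed_rows <- 912              # row count reported for Marital_status
missing_rows <- expected_rows - observed_rows

print(c(expected = expected_rows, observed = observed_rows, missing = missing_rows))
```

The full grid has 960 rows, so 48 combinations do not appear in the data. That would be consistent with, for example, six branch and pay grade pairs being absent for all eight statuses, though the printed output does not show which combinations are missing.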

Read in Data Set

#View composition of Marital Status data
print(Marital_status)

# A tibble: 912 x 5
   pay_grade enlisted branch   status                        count
       <dbl> <chr>    <chr>    <chr>                         <dbl>
 1         1 E        TotalDoD single withoutchildren male   31229
 2         1 E        TotalDoD single withoutchildren female  5717
 3         1 E        TotalDoD single withchildren male        563
 4         1 E        TotalDoD single withchildren female      122
 5         1 E        TotalDoD married jointservice male       139
 6         1 E        TotalDoD married jointservice female     141
 7         1 E        TotalDoD married civilian female        5060
 8         1 E        TotalDoD married civilian male           719
 9         2 E        TotalDoD single withoutchildren male   53094
10         2 E        TotalDoD single withoutchildren female  8388
# ... with 902 more rows

tail(Marital_status)

# A tibble: 6 x 5
  pay_grade enlisted branch status                      count
      <dbl> <chr>    <chr>  <chr>                       <dbl>
1         5 W        Army   single withchildren male       20
2         5 W        Army   single withchildren female      1
3         5 W        Army   married jointservice male      11
4         5 W        Army   married jointservice female     4
5         5 W        Army   married civilian female       504
6         5 W        Army   married civilian male          10

Range of Data

To start, I wanted to understand the most and least common occurrences in the data, so I began by viewing the highest and lowest status representation within the Total DoD by pay grade. The following output shows the top 10 and bottom 10 observations by count.

#Top 10 most frequent status observations within Total DoD
library(dplyr) #provides filter(), arrange(), and %>%
filter(Marital_status, branch == "TotalDoD") %>%
  arrange(desc(count))

# A tibble: 192 x 5
   pay_grade enlisted branch   status                       count
       <dbl> <chr>    <chr>    <chr>                        <dbl>
 1         3 E        TotalDoD single withoutchildren male 131091
 2         5 E        TotalDoD married civilian female     130944
 3         4 E        TotalDoD single withoutchildren male 112710
 4         6 E        TotalDoD married civilian female     110322
 5         4 E        TotalDoD married civilian female     105556
 6         7 E        TotalDoD married civilian female      70001
 7         5 E        TotalDoD single withoutchildren male  57989
 8         3 E        TotalDoD married civilian female      54795
 9         2 E        TotalDoD single withoutchildren male  53094
10         3 O        TotalDoD married civilian female      38963
# ... with 182 more rows

#Bottom 10 least frequent status observations within Total DoD
filter(Marital_status, branch == "TotalDoD") %>%
  arrange(count)

# A tibble: 192 x 5
   pay_grade enlisted branch   status                        count
       <dbl> <chr>    <chr>    <chr>                         <dbl>
 1         8 O        TotalDoD single withchildren male          0
 2         8 O        TotalDoD single withchildren female        0
 3         9 O        TotalDoD single withchildren female        0
 4        10 O        TotalDoD single withoutchildren female     0
 5        10 O        TotalDoD single withchildren male          0
 6        10 O        TotalDoD single withchildren female        0
 7        10 O        TotalDoD married civilian male             0
 8         7 O        TotalDoD single withchildren female        1
 9         9 O        TotalDoD single withoutchildren male       1
10         9 O        TotalDoD single withoutchildren female     1
# ... with 182 more rows

Most common occurrences - The most represented group within the Total DoD is enlisted, single males without children in pay grade 3, with 131,091 occurrences. A close second, trailing by only 147 occurrences, is enlisted, married civilian females in pay grade 5, totaling 130,944. A trend begins to emerge: the top 10 is dominated by enlisted single males without children and enlisted married civilian females, with a median pay grade of 4. Based on this output, these two groups appear to be the most represented in the data set irrespective of pay grade.

#create Top 10 Total DoD vector by occurrence
#calculate median pay grade
TOP_10_DoD = c(3, 5, 4, 6, 4, 7, 5, 3, 2, 3)
median(TOP_10_DoD)

[1] 4

#create Bottom 10 Total DoD vector by occurrence
#calculate median pay grade
Bottom_10_DoD = c(8, 8, 9, 10, 10, 10, 10, 7, 9, 9)
median(Bottom_10_DoD)

[1] 9

Least common occurrences - At the other end, the least represented groups are mostly single Officers, primarily females but also males, with a median pay grade of 9. From this we can begin to form some initial research questions.
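The two medians were calculated from pay grades typed in by hand. As a minimal sketch, the same values can be derived directly from a table, which avoids transcription errors. The tibble `sample_rows` is a stand-in that holds only the 20 Total DoD rows printed in the two outputs, and `median_pay_grade()` is a hypothetical helper, not a function from the original analysis.

```r
library(dplyr)

# Stand-in for Marital_status: only the 20 TotalDoD rows printed above
# (top 10 and bottom 10 by count), not the full 912-row data set
sample_rows <- tibble(
  pay_grade = c(3, 5, 4, 6, 4, 7, 5, 3, 2, 3,
                8, 8, 9, 10, 10, 10, 10, 7, 9, 9),
  branch = "TotalDoD",
  count = c(131091, 130944, 112710, 110322, 105556,
            70001, 57989, 54795, 53094, 38963,
            0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
)

# Hypothetical helper: median pay grade of the n rows with the highest
# (or lowest) count within one branch
median_pay_grade <- function(data, branch_name, n = 10, highest = TRUE) {
  rows <- filter(data, branch == branch_name)
  rows <- if (highest) arrange(rows, desc(count)) else arrange(rows, count)
  rows %>%
    head(n) %>%
    pull(pay_grade) %>%
    median()
}

top_median <- median_pay_grade(sample_rows, "TotalDoD", highest = TRUE)
bottom_median <- median_pay_grade(sample_rows, "TotalDoD", highest = FALSE)

print(c(top = top_median, bottom = bottom_median))
```

On the stand-in rows this returns 4 and 9, matching the hand calculation. On the full data set, several rows may tie at the cutoff count, in which case which rows head() keeps depends on their order after sorting.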

Potential Research Questions

Before I finalize my research questions, I want to conduct more exploratory analysis of the data. So I would first like to answer:
1. What is the relational composition of the Department of Defense by gender and marital status?
2. Are there any significant variances in representation within pay grades?
3. Are there any significant variances in representation within branches?
4. Can we surmise any relationships between these variables to form a hypothesis?
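One possible way to approach the first question is to split “status” into its parts and calculate each group's share of the total. The sketch below does this for the eight pay grade 1 enlisted Total DoD rows shown in the first printed output. The column names `marital`, `detail`, and `gender` and the objects `e1_total` and `composition` are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# The eight pay grade 1, enlisted, TotalDoD rows from the first printed output
e1_total <- tibble(
  pay_grade = 1,
  enlisted = "E",
  branch = "TotalDoD",
  status = c(
    "single withoutchildren male", "single withoutchildren female",
    "single withchildren male", "single withchildren female",
    "married jointservice male", "married jointservice female",
    "married civilian female", "married civilian male"
  ),
  count = c(31229, 5717, 563, 122, 139, 141, 5060, 719)
)

# Split status into three parts, then total the counts by marital status
# and gender and express each total as a share of all rows
composition <- e1_total %>%
  separate(status, into = c("marital", "detail", "gender"), sep = " ") %>%
  group_by(marital, gender) %>%
  summarise(count = sum(count), .groups = "drop") %>%
  mutate(share = count / sum(count))

print(composition)
```

Running the same pipeline on the full data set, grouped additionally by branch or pay grade, would be a starting point for questions 2 and 3.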

Source

https://catalog.data.gov/dataset/active-duty-marital-status/resource/638cad03-b16c-48ac-8346-f858ff89d202
