HW 2

Reading in & tidying data

Alexis Gamez
2/8/2022

Setup

Setting up R so that xlsx files/data can be read into the program and tidyverse packages are active.

Reading in the Data

Imported data from the xlsx file by navigating the “Files” tab under the plots pane. While you can include all data as is, I excluded the top 8 rows to tidy the data a bit. This can also be done using code as shown in the chunk below. Also, note that in the same chunk, I’ve renamed the original file name from “ActiveDuty_MaritalStatus” to “Marital_Status”. This data set only takes into account the 1st sheet within the excel file that was used for reference.

ActiveDuty_MaritalStatus <- read_excel("HW2/ActiveDuty_MaritalStatus.xlsx", skip = 8)
Marital_Status <- ActiveDuty_MaritalStatus

#Confirming that the data set was renamed correctly
Marital_Status
# A tibble: 30 x 17
   ...1  `Pay Grade` Male...3 Female...4 Total...5 Male...6 Female...7
   <chr> <chr>          <dbl>      <dbl>     <dbl>    <dbl>      <dbl>
 1 <NA>  E-1            31229       5717     36946      563        122
 2 <NA>  E-2            53094       8388     61482     1457        275
 3 <NA>  E-3           131091      21019    152110     4264       1920
 4 <NA>  E-4           112710      16381    129091     9491       4662
 5 <NA>  E-5            57989      11021     69010    10937       6576
 6 <NA>  E-6            19125       4654     23779    10369       4962
 7 <NA>  E-7             5446       1913      7359     6530       2585
 8 <NA>  E-8             1009        438      1447     1786        513
 9 <NA>  E-9              381        202       583      579        144
10 <NA>  TOTAL ENLI~   412074      69733    481807    45976      21759
# ... with 20 more rows, and 10 more variables: Total...8 <dbl>,
#   Male...9 <dbl>, Female...10 <dbl>, Total...11 <dbl>,
#   Male...12 <dbl>, Female...13 <dbl>, Total...14 <dbl>,
#   Male...15 <dbl>, Female...16 <dbl>, Total...17 <dbl>

Interpreting the Variables

The data itself categorizes all active duty personnel according to their pay grade and current marital status. Marital status is defined as either single without children, single with children, joint service marriage, or civilian marriage with an additional category representing the summed total between all of them in that specific pay grade.

There are also 3 sub-categories included in each of the previously mentioned categories (except the grand totals column) further breaking down the data according to gender (Male & Female) and the 3rd being the total between both.

I’ve included code below that renames all the respective columns to something a bit more legible with those ending in; WO representing active duty personnel that are single without children, W representing those that are single with children, J representing those in a joint service marriage, C representing those a part of a civilian marriage and the final column representing the grand total.

Marital_Status <- rename(Marital_Status, Male_WO = Male...3, Fem_WO = Female...4, Total_WO = Total...5, Male_W = Male...6, Fem_W = Female...7, Total_W = Total...8, Male_J = Male...9, Fem_J = Female...10, Total_J = Total...11, Male_C = Male...12, Fem_C = Female...13, Total_C = Total...14, Male_Total = Male...15, Fem_Total = Female...16, Grand_Total = Total...17)

#Checking that all categories were renamed correctly
head(Marital_Status) %>%
  select("Pay Grade": "Grand_Total")
# A tibble: 6 x 16
  `Pay Grade` Male_WO Fem_WO Total_WO Male_W Fem_W Total_W Male_J
  <chr>         <dbl>  <dbl>    <dbl>  <dbl> <dbl>   <dbl>  <dbl>
1 E-1           31229   5717    36946    563   122     685    139
2 E-2           53094   8388    61482   1457   275    1732    438
3 E-3          131091  21019   152110   4264  1920    6184   3579
4 E-4          112710  16381   129091   9491  4662   14153   8661
5 E-5           57989  11021    69010  10937  6576   17513  12459
6 E-6           19125   4654    23779  10369  4962   15331   8474
# ... with 8 more variables: Fem_J <dbl>, Total_J <dbl>,
#   Male_C <dbl>, Fem_C <dbl>, Total_C <dbl>, Male_Total <dbl>,
#   Fem_Total <dbl>, Grand_Total <dbl>

All data included in this specific data set would be considered numeric.

Data Wrangling

The data present within the imported set can be interpreted in many ways, but for the sake of this assignment, I’ve chosen to sort them in 2 different ways according to the male gender.

Total Males According to Pay Grade

To start I used the code present in the chunk below to sort all columns except for the male totals and the pay grade they belong to.

Marital_Status %>%
  select("Pay Grade", "Male_Total")
# A tibble: 30 x 2
   `Pay Grade`    Male_Total
   <chr>               <dbl>
 1 E-1                 36991
 2 E-2                 67472
 3 E-3                193729
 4 E-4                236418
 5 E-5                212329
 6 E-6                148290
 7 E-7                 87042
 8 E-8                 25297
 9 E-9                  9633
10 TOTAL ENLISTED    1017201
# ... with 20 more rows

This clearly shows us the total active duty males categorized by their pay grade. Nothing fancy, the data is sorted according to the order in which the pay grades are originally listed in the respective excel document.

Sorting Total Males According to their Pay Grade in Descending Order

To take it a step further, I’ve included the code below that further sorts the previous data but in descending order. This code omits the “GRAND TOTAL”, “TOTAL ENLISTED”, and “TOTAL OFFICER” rows in order to tidy the data so that only each specific pay grade is shown.

Marital_Status %>%
  select("Pay Grade", "Male_Total") %>%
  filter(`Pay Grade` != "GRAND TOTAL" & `Pay Grade` != "TOTAL ENLISTED" & `Pay Grade` != "TOTAL OFFICER") %>%
  ungroup() %>%
  arrange(desc(Male_Total))
# A tibble: 25 x 2
   `Pay Grade` Male_Total
   <chr>            <dbl>
 1 E-4             236418
 2 E-5             212329
 3 E-3             193729
 4 E-6             148290
 5 E-7              87042
 6 E-2              67472
 7 O-3              57973
 8 O-4              38492
 9 E-1              36991
10 O-5              25341
# ... with 15 more rows

The filter() command removes the previously mentioned rows, the ungroup() function breaks up the format of the columns so that they may be sorted in descending order and lastly, the arrange function actually sorts the data in the aforementioned descending order. Here we can clearly see that a majority of enlisted males, regardless of marital status, belong to the E-4 pay grade.

I hope that this submission meets all the required criteria, thank you for reading!

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Gamez (2022, Feb. 9). Data Analytics and Computational Social Science: HW 2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomgamez654863847/

BibTeX citation

@misc{gamez2022hw,
  author = {Gamez, Alexis},
  title = {Data Analytics and Computational Social Science: HW 2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomgamez654863847/},
  year = {2022}
}