KMuhammad HW2

DACSS 601 - Reading in Data

Kalimah Muhammad
2/16/2022

Dataset: 2012 Total US Railroad Employment

This dataset breaks down United States railroad employment by state and county in 2012. The data has 2,930 rows and 3 columns.

library(tidyverse)
Railroad_Data <- read_csv("railroad_2012_clean_county_tidy.csv")

Dataset Variables

Railroad_Data
# A tibble: 2,930 x 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# ... with 2,920 more rows

The dataset contains two types of variables. The first type is categorical: state and county, both stored as character vectors. The second type is numeric: total_employees, stored as a double (<dbl>). In the rows shown, its values are whole-number employee counts, so it behaves as a discrete count rather than a continuous measure.
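
The column types can also be checked in code. The following is a minimal sketch, not part of the original assignment: it rebuilds the first ten printed rows as a small tibble (the name sample_rows is hypothetical, used because the CSV file is not reproduced here) and inspects its dimensions and column classes.

```r
library(dplyr)

# Hypothetical stand-in for Railroad_Data: the first ten rows printed above.
sample_rows <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# Rows and columns of the sample (the full dataset is 2,930 x 3).
sample_dims <- dim(sample_rows)

# Class of each column: character for state and county, numeric for the counts.
column_classes <- sapply(sample_rows, class)

# TRUE if every count is a whole number, even though the column is a double.
counts_are_whole <- all(
  sample_rows$total_employees == round(sample_rows$total_employees)
)
```

Running the same three checks on Railroad_Data itself would confirm the description for the full dataset.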

Data Wrangling Options

I chose two data wrangling options from the dplyr package. The first uses the arrange function with desc to sort counties by total railroad employees, from most to fewest.

Total Railroad Employees by County (descending)

arrange(Railroad_Data, desc(total_employees))
# A tibble: 2,930 x 3
   state county           total_employees
   <chr> <chr>                      <dbl>
 1 IL    COOK                        8207
 2 TX    TARRANT                     4235
 3 NE    DOUGLAS                     3797
 4 NY    SUFFOLK                     3685
 5 VA    INDEPENDENT CITY            3249
 6 FL    DUVAL                       3073
 7 CA    SAN BERNARDINO              2888
 8 CA    LOS ANGELES                 2545
 9 TX    HARRIS                      2535
10 NE    LINCOLN                     2289
# ... with 2,920 more rows
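
When only the largest few counties are of interest, the sorted result can be trimmed to the top rows. The following is a minimal sketch under two assumptions: dplyr 1.0.0 or later is installed (for slice_max), and a hypothetical sample_rows tibble, built from the first ten rows of the dataset printed earlier, stands in for Railroad_Data.

```r
library(dplyr)

# Hypothetical stand-in for Railroad_Data: the first ten rows of the dataset.
sample_rows <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# Same approach as the arrange call above: largest employee count first.
sorted_rows <- arrange(sample_rows, desc(total_employees))

# Alternative: keep only the three counties with the most employees.
top_three <- slice_max(sample_rows, total_employees, n = 3)
```

By default slice_max keeps ties, so it can return more than n rows when several counties share the cutoff value.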

The second option uses the filter function to keep only the rows for Massachusetts, then arrange to sort those counties by total employees in descending order.

Total Massachusetts Railroad Employees by County (descending)

Railroad_Data %>% filter(state == "MA") %>% arrange(desc(total_employees))
# A tibble: 12 x 3
   state county     total_employees
   <chr> <chr>                <dbl>
 1 MA    MIDDLESEX              673
 2 MA    SUFFOLK                558
 3 MA    PLYMOUTH               429
 4 MA    NORFOLK                386
 5 MA    ESSEX                  314
 6 MA    WORCESTER              310
 7 MA    BRISTOL                232
 8 MA    HAMPDEN                202
 9 MA    FRANKLIN               113
10 MA    HAMPSHIRE               68
11 MA    BERKSHIRE               50
12 MA    BARNSTABLE              44
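
A natural follow-up to the filtered table is a single state total. The following is a minimal sketch, not part of the original assignment: the hypothetical sample_rows tibble holds the twelve Massachusetts rows printed above plus two non-Massachusetts rows from the earlier output, so that the filter step has something to drop.

```r
library(dplyr)

# Hypothetical stand-in for Railroad_Data: two non-MA rows plus all twelve MA rows.
sample_rows <- tibble(
  state = c("IL", "NY", rep("MA", 12)),
  county = c("COOK", "SUFFOLK", "MIDDLESEX", "SUFFOLK", "PLYMOUTH",
             "NORFOLK", "ESSEX", "WORCESTER", "BRISTOL", "HAMPDEN",
             "FRANKLIN", "HAMPSHIRE", "BERKSHIRE", "BARNSTABLE"),
  total_employees = c(8207, 3685, 673, 558, 429, 386, 314, 310,
                      232, 202, 113, 68, 50, 44)
)

# Keep only Massachusetts rows and sort them, largest county first.
ma_rows <- sample_rows %>%
  filter(state == "MA") %>%
  arrange(desc(total_employees))

# Collapse the filtered rows into one number: total MA railroad employees.
ma_total <- ma_rows %>%
  summarise(state_total = sum(total_employees)) %>%
  pull(state_total)
```

Because filter matches on state, Suffolk County in New York is dropped while Suffolk County in Massachusetts is kept.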

Source: https://catalog.data.gov/dataset/total-railroad-employment-by-state-and-county-2012/resource/5a0b2831-23b9-4ce9-82e9-87a7d8f2c5d8

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Muhammad (2022, March 27). Data Analytics and Computational Social Science: KMuhammad HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkmuhamma867132/

BibTeX citation

@misc{muhammad2022kmuhammad,
  author = {Muhammad, Kalimah},
  title = {Data Analytics and Computational Social Science: KMuhammad HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomkmuhamma867132/},
  year = {2022}
}