Homework 2 Read In, Explain, Dplyr Wrangling

This homework reads in a data set, explains its variables, and wrangles the data with dplyr.

K Martins
2021-12-29

Introduction

This is K Martins’s submission for Homework Two in DACSS 601. In this homework I read in a clean Excel data set, railroad_2012_clean_county. Then I explain the variables and wrangle the data four times under the subheadings below.

Note that I attempted to add another column showing each county's employee count as a percentage of its state total (e.g. what % of railroad employees in MA worked in Suffolk County). Unfortunately I was unsuccessful and did not include it here. Maybe we can go over this in class?

Kevin’s Set-Ups

Read In Railroad Employees by County Data and explain the Variables

Here I read in the railroad total employees by county data from my working directory. The file was a clean Excel data set. I previewed the first six rows using the head() function. The variables are the columns: state (<chr>, character), county (<chr>, character), and total_employees (<dbl>, numeric).

# A tibble: 6 x 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
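
The read-in step described above could look roughly like the sketch below. It is not the original chunk: the file name with its .xlsx extension and the use of readxl::read_excel() are assumptions, and the fallback tibble simply retypes the six rows previewed above so the sketch runs without the file.

```r
library(tibble)

# Assumed file name and extension; adjust to the file in your working directory.
path <- "railroad_2012_clean_county.xlsx"

if (file.exists(path) && requireNamespace("readxl", quietly = TRUE)) {
  # Read the first sheet of the Excel workbook into a tibble.
  railroad <- readxl::read_excel(path)
} else {
  # Fallback for illustration only: the six rows shown in the preview above.
  railroad <- tribble(
    ~state, ~county,                ~total_employees,
    "AE",   "APO",                  2,
    "AK",   "ANCHORAGE",            7,
    "AK",   "FAIRBANKS NORTH STAR", 2,
    "AK",   "JUNEAU",               3,
    "AK",   "MATANUSKA-SUSITNA",    2,
    "AK",   "SITKA",                1
  )
}

head(railroad)           # head() returns six rows by default
sapply(railroad, class)  # column types: character, character, numeric
```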

Filter for Continental US

In this data-wrangling step I filtered for only the states within the Continental US, including DC. I did this by filtering out the other values of state using the not-equal operator, !=. I would have liked to embed an OR condition, but I couldn’t get the nesting to work. Instead I listed a != condition for each state I was filtering out.

# A tibble: 6 x 3
  state county  total_employees
  <chr> <chr>             <dbl>
1 AL    AUTAUGA             102
2 AL    BALDWIN             143
3 AL    BARBOUR               1
4 AL    BIBB                 25
5 AL    BLOUNT              154
6 AL    BULLOCK              13
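
One way to avoid writing a separate != condition per state is to negate %in% against a vector of codes. The sketch below is an illustration, not the original chunk: railroad_toy holds a few rows copied from the previews above, and the non_continental vector is an assumed list of codes to drop (the real data may contain others).

```r
library(dplyr)
library(tibble)

# Toy rows copied from the previews above; not the full data set.
railroad_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AE",   "APO",       2,
  "AK",   "ANCHORAGE", 7,
  "AL",   "AUTAUGA",   102,
  "AL",   "BALDWIN",   143
)

# Assumed list of codes outside the Continental US; extend as needed.
non_continental <- c("AE", "AK", "HI")

# Keep rows whose state is NOT in the vector.
continental <- railroad_toy %>%
  filter(!state %in% non_continental)

# Equivalent chained version: comma-separated conditions are combined with AND.
chained <- railroad_toy %>%
  filter(state != "AE", state != "AK", state != "HI")

continental
```

Both versions keep the same rows. The %in% form is shorter and easier to extend when more codes need to be excluded.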

Arrange States in Alphabetical Order, then by Total Employees

Using dplyr's arrange() function, I sorted the rows that had been filtered to the Continental US by state in alphabetical order, then by total employees in descending order.

# A tibble: 6 x 3
  state county    total_employees
  <chr> <chr>               <dbl>
1 AL    JEFFERSON             990
2 AL    MOBILE                331
3 AL    COLBERT               199
4 AL    WALKER                192
5 AL    ST CLAIR              162
6 AL    SHELBY                158
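
A minimal sketch of this kind of two-key sort is shown below. It assumes arrange() with desc() was the approach; the toy rows are copied from the previews above and are not filtered, so that two states appear and the first sort key is visible.

```r
library(dplyr)
library(tibble)

# Toy rows copied from the previews above, deliberately out of order.
railroad_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AL",   "MOBILE",    331,
  "AK",   "JUNEAU",    3,
  "AL",   "COLBERT",   199,
  "AK",   "ANCHORAGE", 7,
  "AL",   "JEFFERSON", 990
)

# Sort by state (A to Z), then by total_employees from largest to smallest.
arranged <- railroad_toy %>%
  arrange(state, desc(total_employees))

arranged
```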

Count Counties by State

Using the group_by() and summarise() functions, I counted the counties in each state within the data set. Essentially, I was mimicking a count in an Excel pivot table. Note that the summary column below is still named total_employees, but it holds the number of counties per state, not an employee total. As I mentioned in the introduction, I would have liked to add another column showing each county's employee count as a percentage of its state total (e.g. what % of railroad employees in MA worked in Suffolk County), but I was unsuccessful. Maybe this is something that we can go over in class.

# A tibble: 6 x 2
  state total_employees
  <chr>           <int>
1 AL                 67
2 AR                 72
3 AZ                 15
4 CA                 55
5 CO                 57
6 CT                  8
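
The county count can be sketched as follows, using toy rows copied from the previews above. The column name n_counties is a hypothetical choice, used here so the result is not confused with an employee total. count() is shown as a shorter equivalent of group_by() plus summarise().

```r
library(dplyr)
library(tibble)

# Toy rows copied from the previews above; not the full data set.
railroad_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AK",   "ANCHORAGE", 7,
  "AK",   "JUNEAU",    3,
  "AL",   "AUTAUGA",   102,
  "AL",   "BALDWIN",   143,
  "AL",   "BARBOUR",   1
)

# One row per state, with the number of county rows in that state.
county_counts <- railroad_toy %>%
  group_by(state) %>%
  summarise(n_counties = n())

# Shorter equivalent using count().
county_counts_short <- railroad_toy %>%
  count(state, name = "n_counties")

# n = Inf asks the tibble printer to show every row instead of the first ten.
print(county_counts, n = Inf)
```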

This prints the list of county counts per state in the data set. The tibble display shows only the first 10 of the 49 rows.

# A tibble: 49 x 2
   state total_employees
   <chr>           <int>
 1 AL                 67
 2 AR                 72
 3 AZ                 15
 4 CA                 55
 5 CO                 57
 6 CT                  8
 7 DC                  1
 8 DE                  3
 9 FL                 67
10 GA                152
# ... with 39 more rows
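
For the percentage column mentioned in the introduction, one possible approach is a grouped mutate(): group by state, then divide each county's count by the state sum. This is a sketch on toy rows copied from the previews above, and pct_of_state is a hypothetical column name. The percentages are shares of the toy rows only, not of the real state totals.

```r
library(dplyr)
library(tibble)

# Toy rows copied from the previews above; not the full data set.
railroad_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AK",   "ANCHORAGE", 7,
  "AK",   "JUNEAU",    3,
  "AL",   "AUTAUGA",   102,
  "AL",   "BALDWIN",   143
)

with_share <- railroad_toy %>%
  group_by(state) %>%
  # Inside a grouped mutate(), sum() is computed separately for each state.
  mutate(pct_of_state = 100 * total_employees / sum(total_employees)) %>%
  ungroup()

with_share
```

Unlike summarise(), mutate() keeps one row per county, so every county retains its own share and the shares within each state add up to 100.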

This is the end of the document.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Martins (2021, Dec. 30). Data Analytics and Computational Social Science: Homework 2 Read In, Explain, Dplyr Wrangling. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommartike8851772/

BibTeX citation

@misc{martins2021homework,
  author = {Martins, K},
  title = {Data Analytics and Computational Social Science: Homework 2 Read In, Explain, Dplyr Wrangling},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommartike8851772/},
  year = {2021}
}