This homework reads in the data, explains the variables, and wrangles the data with dplyr.
This is K Martins’s submission for Homework Two in DACSS 601. In it, I read in the clean Excel data set railroad_2012_clean_county, explain its variables, and then wrangle the data in four ways, each under its own subheading below.
Note that I also tried to add a column showing each county’s share of its state’s total railroad employees (e.g., what percentage of railroad employees in MA worked in Suffolk County). I was unsuccessful, so it is not included here. Maybe we can go over this in class?
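Here I read in the Railroad Total Employees by county data from my working directory. The file was a clean Excel data set. I previewed the first six rows using the head() function, which returns six rows by default. The variables are the three columns: state (a two-letter postal abbreviation), county (the county name), and total_employees (the total number of railroad employees in that county).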
# A tibble: 6 x 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
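A minimal sketch of the read-in and preview step, assuming dplyr is installed. The read_railroad() helper, its readxl::read_excel() call, and the .xlsx file name are hypothetical, since the actual read-in code and file extension are not shown here. Because the file itself isn't available, the block builds a small stand-in tibble from the six preview rows above.

```r
library(dplyr)

# Hypothetical helper: the real read-in would likely call readxl::read_excel()
# on the course file. The file name and .xlsx extension are assumptions.
read_railroad <- function(path = "railroad_2012_clean_county.xlsx") {
  readxl::read_excel(path)
}

# Stand-in data: the six preview rows printed above, typed in by hand.
railroad <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# head() returns the first 6 rows by default; pass n = 5 for five rows.
preview <- head(railroad)
preview_five <- head(railroad, n = 5)

# glimpse() lists each variable with its type:
# state <chr>, county <chr>, total_employees <dbl>.
glimpse(railroad)
```

Calling read_railroad() would need readxl and the file in the working directory; the stand-in lets the head() behavior be checked without either.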
In this wrangling step I filtered the data to keep only the states in the continental US (the 48 contiguous states), plus DC. I did this by filtering out the other codes with the != (not equal) operator. I would have liked to combine the conditions with an OR (|), but I couldn’t get the nesting to work, so instead I wrote a separate != condition for each state I was filtering out.
# A tibble: 6 x 3
state county total_employees
<chr> <chr> <dbl>
1 AL AUTAUGA 102
2 AL BALDWIN 143
3 AL BARBOUR 1
4 AL BIBB 25
5 AL BLOUNT 154
6 AL BULLOCK 13
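Two sketches of how the exclusions could be written without one != per state, using rows from the outputs above as stand-in data. The first negates a single OR-combined condition; the second keeps an allow-list built from base R's state.abb (the 50 state abbreviations). The excluded codes in the first version (AE, AK, HI) are an assumption, since the full set of non-contiguous codes in the file isn't shown.

```r
library(dplyr)

# Stand-in rows copied from the outputs above (not the full dataset).
railroad <- tibble(
  state = c("AE", "AK", "AK", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "JUNEAU", "AUTAUGA", "BALDWIN"),
  total_employees = c(2, 7, 3, 102, 143)
)

# Option 1: wrap the OR-combined matches in !( ... ) to drop them all at once.
# The listed codes are an assumption about what needs excluding.
no_or <- railroad %>%
  filter(!(state == "AE" | state == "AK" | state == "HI"))

# Option 2: keep only codes on an allow-list. Dropping AK and HI from
# state.abb and adding DC leaves 49 codes (48 contiguous states plus DC).
contiguous <- c(setdiff(state.abb, c("AK", "HI")), "DC")
railroad_contig <- railroad %>%
  filter(state %in% contiguous)
```

The allow-list version also drops any military or territory code in the file without having to name each one.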
In this dplyr step I arranged the filtered continental-US data by state in alphabetical order, then by total_employees in descending order within each state.
# A tibble: 6 x 3
state county total_employees
<chr> <chr> <dbl>
1 AL JEFFERSON 990
2 AL MOBILE 331
3 AL COLBERT 199
4 AL WALKER 192
5 AL ST CLAIR 162
6 AL SHELBY 158
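A sketch of the sort, assuming the filtered data keeps the same column names. The stand-in rows are Alabama counties copied from the outputs above; desc() reverses the order for total_employees only, so state still sorts A to Z.

```r
library(dplyr)

# Stand-in: a few Alabama rows from the outputs above, deliberately unsorted.
railroad_contig <- tibble(
  state = "AL",
  county = c("AUTAUGA", "BALDWIN", "BARBOUR", "BIBB", "BLOUNT",
             "JEFFERSON", "MOBILE"),
  total_employees = c(102, 143, 1, 25, 154, 990, 331)
)

# Sort by state alphabetically, then by employee count from largest to
# smallest within each state.
railroad_sorted <- railroad_contig %>%
  arrange(state, desc(total_employees))
```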
Using the group_by() and summarise() functions, I counted the number of counties in each state in the dataset. Essentially, I was mimicking a Count pivot table in Excel. As I mentioned in the introduction, I would have liked to add another column showing each county’s share of its state’s total railroad employees (e.g., what percentage of railroad employees in MA worked in Suffolk County), but I was unsuccessful. Maybe this is something we can go over in class.
# A tibble: 6 x 2
state total_employees
<chr> <int>
1 AL 67
2 AR 72
3 AZ 15
4 CA 55
5 CO 57
6 CT 8
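A sketch of both the county count and the percentage column asked about above, on a few stand-in rows copied from the outputs (Alaska is included only so there are two states to group; in the real workflow it would already be filtered out). The column names n_counties, state_total, and pct_of_state are hypothetical. The count column is renamed because the summary above labels its county counts total_employees, which reads like an employee total. The key difference: summarise() collapses each group to one row, while mutate() after group_by() keeps every county row, so sum(total_employees) becomes the state total repeated on each of that state's rows.

```r
library(dplyr)

# Stand-in rows from the outputs above (not the full dataset).
railroad <- tibble(
  state = c("AK", "AK", "AK", "AK", "AK", "AL", "AL"),
  county = c("ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "JEFFERSON", "MOBILE"),
  total_employees = c(7, 2, 3, 2, 1, 990, 331)
)

# summarise() collapses each state to one row: the number of counties.
counties_per_state <- railroad %>%
  group_by(state) %>%
  summarise(n_counties = n())

# mutate() after group_by() keeps every county row; sum() is computed within
# each state, giving each county's share of its state's railroad employees.
county_share <- railroad %>%
  group_by(state) %>%
  mutate(
    state_total = sum(total_employees),
    pct_of_state = 100 * total_employees / state_total
  ) %>%
  ungroup()
```

For the first step, dplyr's count(state, name = "n_counties") is a shorter equivalent.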
This is meant to print the full list of county counts per state in the filtered dataset (49 rows); note that the tibble display below shows only the first 10.
# A tibble: 49 x 2
state total_employees
<chr> <int>
1 AL 67
2 AR 72
3 AZ 15
4 CA 55
5 CO 57
6 CT 8
7 DC 1
8 DE 3
9 FL 67
10 GA 152
# ... with 39 more rows
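A tibble with more than 20 rows prints only its first 10, which is why the output above ends with a note about 39 more rows. A minimal sketch of printing every row with print(n = Inf), using a stand-in tibble of the 49 codes (48 contiguous states plus DC) in place of the real summary; the stand-in holds only the state codes, not the county counts.

```r
library(dplyr)

# Stand-in: one row per code (48 contiguous states plus DC), sorted A to Z.
codes <- sort(c(setdiff(state.abb, c("AK", "HI")), "DC"))
summary_tbl <- tibble(state = codes)

# Default printing: more than 20 rows, so only the first 10 are shown.
default_lines <- capture.output(print(summary_tbl))

# n = Inf prints every row; print(summary_tbl, n = 49) would also work.
full_lines <- capture.output(print(summary_tbl, n = Inf))
```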
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Martins (2021, Dec. 30). Data Analytics and Computational Social Science: Homework 2 Read In, Explain, Dplyr Wrangling. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommartike8851772/
BibTeX citation
@misc{martins2021homework, author = {Martins, K}, title = {Data Analytics and Computational Social Science: Homework 2 Read In, Explain, Dplyr Wrangling}, url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommartike8851772/}, year = {2021} }