HW2

HW2 for Haoyan

Haoyan Xiang
2021-12-29

In this homework, I work with the cleaned version of the railroad dataset.

knitr::opts_chunk$set(echo = TRUE)
# Load dplyr, read in the CSV file, and print the first few rows.
library(dplyr)
setwd("~/Downloads")
railroad <- read.csv("railroad_2012_clean_state.csv")
head(railroad)
  state total_employees
1    AE               2
2    AK             103
3    AL            4257
4    AP               1
5    AR            3871
6    AZ            3153

Variable types

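state: character (string), the two-letter postal abbreviation. The 53 rows cover the 50 states plus DC and the military postal codes AE and AP.

total_employees: numeric (integer), the total number of employees in that state or area.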
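
The variable types listed above can be checked directly in R. The following is a minimal sketch, not part of the original homework: it rebuilds the six rows printed by head() as a small data frame so that it runs without the CSV file. The name railroad_head is hypothetical, and the values are typed as integers on the assumption that read.csv() reads whole-number columns that way.

```r
# Rebuild the six rows printed by head(railroad). railroad_head is a
# hypothetical name used only for this illustration.
railroad_head <- data.frame(
  state = c("AE", "AK", "AL", "AP", "AR", "AZ"),
  total_employees = c(2L, 103L, 4257L, 1L, 3871L, 3153L),
  stringsAsFactors = FALSE
)

# Report the class of each column as a named character vector.
col_classes <- sapply(railroad_head, class)
print(col_classes)

# str() prints a compact overview of the whole data frame.
str(railroad_head)
```

Running the same two calls on the full railroad data frame would confirm the types for all 53 rows.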

# Select the state column.
select(railroad, state)
   state
1     AE
2     AK
3     AL
4     AP
5     AR
6     AZ
7     CA
8     CO
9     CT
10    DC
11    DE
12    FL
13    GA
14    HI
15    IA
16    ID
17    IL
18    IN
19    KS
20    KY
21    LA
22    MA
23    MD
24    ME
25    MI
26    MN
27    MO
28    MS
29    MT
30    NC
31    ND
32    NE
33    NH
34    NJ
35    NM
36    NV
37    NY
38    OH
39    OK
40    OR
41    PA
42    RI
43    SC
44    SD
45    TN
46    TX
47    UT
48    VA
49    VT
50    WA
51    WI
52    WV
53    WY
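
As the output above shows, select() returns a data frame with one column rather than a plain vector. A short sketch of the difference, using dplyr's pull() as an alternative when a vector is wanted; railroad_sample is a hypothetical three-row sample copied from the output, not the full dataset.

```r
library(dplyr)

# Hypothetical three-row sample copied from the output above.
railroad_sample <- data.frame(
  state = c("AE", "AK", "AL"),
  total_employees = c(2L, 103L, 4257L),
  stringsAsFactors = FALSE
)

# select() keeps the data frame structure, here with a single column.
state_df <- select(railroad_sample, state)

# pull() extracts the column as a plain character vector instead.
state_vec <- pull(railroad_sample, state)

print(class(state_df))
print(class(state_vec))
```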
# Sort by number of employees, descending.
arrange(railroad, desc(total_employees))
   state total_employees
1     TX           19839
2     IL           19131
3     NY           17050
4     NE           13176
5     CA           13137
6     PA           12769
7     OH            9056
8     GA            8605
9     IN            8537
10    MO            8419
11    NJ            8329
12    VA            7551
13    FL            7419
14    KS            6092
15    MN            5467
16    WA            5222
17    TN            4952
18    KY            4811
19    MD            4709
20    AL            4257
21    IA            4019
22    MI            3932
23    LA            3915
24    AR            3871
25    WI            3773
26    CO            3650
27    MA            3379
28    MT            3327
29    WV            3213
30    AZ            3153
31    NC            3143
32    WY            2876
33    CT            2592
34    OR            2322
35    OK            2318
36    SC            2296
37    ND            2204
38    MS            2111
39    NM            1958
40    UT            1917
41    ID            1563
42    DE            1495
43    SD             949
44    NV             746
45    ME             654
46    RI             487
47    NH             393
48    DC             279
49    VT             259
50    AK             103
51    HI               4
52    AE               2
53    AP               1
# Keep rows with total_employees >= 5000, then sort by number of employees, descending.
arrange(filter(railroad, total_employees >= 5000), desc(total_employees))
   state total_employees
1     TX           19839
2     IL           19131
3     NY           17050
4     NE           13176
5     CA           13137
6     PA           12769
7     OH            9056
8     GA            8605
9     IN            8537
10    MO            8419
11    NJ            8329
12    VA            7551
13    FL            7419
14    KS            6092
15    MN            5467
16    WA            5222
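
The nested filter-then-arrange call above can also be written with the pipe operator, which some readers find easier to follow because the steps read in order. This is a sketch under stated assumptions: railroad_sample is a hypothetical five-row sample copied from the output, and the piped form is offered as an equivalent style, not as the method the homework used.

```r
library(dplyr)

# A few rows copied from the output above. railroad_sample is a hypothetical
# name used so the example runs without the CSV file.
railroad_sample <- data.frame(
  state = c("AK", "NE", "TN", "TX", "WA"),
  total_employees = c(103L, 13176L, 4952L, 19839L, 5222L),
  stringsAsFactors = FALSE
)

# Nested form, as in the homework: filter() runs first, then arrange().
nested <- arrange(filter(railroad_sample, total_employees >= 5000),
                  desc(total_employees))

# Piped form: the same two steps, written in the order they are applied.
piped <- railroad_sample %>%
  filter(total_employees >= 5000) %>%
  arrange(desc(total_employees))

print(piped)
```

Both forms drop the rows below 5000 employees before sorting, so they return the same data frame.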

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Xiang (2021, Dec. 30). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomjamesxiang11851716/

BibTeX citation

@misc{xiang2021hw2,
  author = {Xiang, Haoyan},
  title = {Data Analytics and Computational Social Science: HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomjamesxiang11851716/},
  year = {2021}
}