Challenge 1

challenge_1

railroad

Reading in data and creating a post

Author

Ananya Pujary

Published

August 15, 2022

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Read in the Data

railroad <- read_csv("_data/railroad_2012_clean_county.csv")

Describe the data

I’ll be working with the ‘railroad_2012_clean_county.csv’ dataset.

dim(railroad) # dimensions of the railroad data: rows, then columns

[1] 2930    3

dim() returns the number of rows followed by the number of columns, so the ‘railroad_2012_clean_county.csv’ dataset has 2,930 rows and 3 columns.

colnames(railroad)

[1] "state"           "county"          "total_employees"

head(railroad)

# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1

The columns in ‘railroad’ are ‘state’ (character), ‘county’ (character), and ‘total_employees’ (double, R’s default numeric type, which can hold decimal values even though the counts shown here are whole numbers). These data were probably collected as part of a large-scale survey of the number of railroad employees by county and state in the United States.
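The column types described above can also be checked programmatically. The following is a minimal sketch, not part of the original analysis: `railroad_sample` is a hypothetical stand-in built from the six rows printed by head(railroad), so that the snippet runs without the CSV file. With the real data, the same calls would be applied to `railroad`.

```r
library(tibble)

# Hypothetical stand-in for `railroad`: the six rows printed by head(railroad)
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# Class of each column, returned as a named character vector
col_classes <- sapply(railroad_sample, class)
print(col_classes)

# TRUE when every count is a whole number, even though it is stored as a double
all_whole <- all(railroad_sample$total_employees %% 1 == 0)
print(all_whole)
```

If the whole-number check also holds for the full dataset, the column could arguably be read as an integer, but that is a judgment call rather than something the source states.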

railroad_arranged <- railroad %>%
  arrange(desc(total_employees)) # sort counties from most to fewest employees
head(railroad_arranged)

# A tibble: 6 × 3
  state county           total_employees
  <chr> <chr>                      <dbl>
1 IL    COOK                        8207
2 TX    TARRANT                     4235
3 NE    DOUGLAS                     3797
4 NY    SUFFOLK                     3685
5 VA    INDEPENDENT CITY            3249
6 FL    DUVAL                       3073

Cook County, Illinois, has the most railroad employees (8,207).
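As an alternative to sorting the whole table and reading off the first row, dplyr’s slice_max() returns the top row directly. This is a sketch under assumptions: `railroad_sample` is a hypothetical stand-in containing only the six counties printed above, so the snippet runs on its own.

```r
library(dplyr)

# Hypothetical stand-in: the six counties printed by head(railroad_arranged)
railroad_sample <- tibble::tibble(
  state = c("IL", "TX", "NE", "NY", "VA", "FL"),
  county = c("COOK", "TARRANT", "DOUGLAS", "SUFFOLK", "INDEPENDENT CITY", "DUVAL"),
  total_employees = c(8207, 4235, 3797, 3685, 3249, 3073)
)

# Keep only the row with the largest total_employees value
top_county <- railroad_sample %>%
  slice_max(total_employees, n = 1)

print(top_county)
```

By default slice_max() keeps ties, so it can return more than one row if several counties share the maximum.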

railroads <- railroad %>%
  group_by(state) %>% # group the counties by state
  summarize(total_employees = sum(total_employees, na.rm = TRUE)) %>% # total employees per state
  arrange(desc(total_employees)) # order states from most to fewest employees

head(railroads)

# A tibble: 6 × 2
  state total_employees
  <chr>           <dbl>
1 TX              19839
2 IL              19131
3 NY              17050
4 NE              13176
5 CA              13137
6 PA              12769

Texas has the most railroad employees (19,839), and Armed Forces Pacific has the fewest (1); the latter sits at the bottom of the sorted table rather than among the six rows shown.
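Both extremes can be pulled out of the state totals without scrolling through the sorted table. The sketch below is illustrative only: `state_totals` is a hypothetical stand-in built from the six totals printed above plus the Armed Forces Pacific figure quoted in the text, and the "AP" code for that entry is an assumption about how the dataset labels it.

```r
library(dplyr)

# Hypothetical stand-in for `railroads`: the six printed totals plus the
# Armed Forces Pacific total from the text ("AP" is an assumed state code)
state_totals <- tibble::tibble(
  state = c("TX", "IL", "NY", "NE", "CA", "PA", "AP"),
  total_employees = c(19839, 19131, 17050, 13176, 13137, 12769, 1)
)

# Row with the largest total
most <- state_totals %>% slice_max(total_employees, n = 1)

# Row with the smallest total
fewest <- state_totals %>% slice_min(total_employees, n = 1)

print(most)
print(fewest)
```

On the full data, the same two calls applied to `railroads` would show whether more than one entry ties for the minimum.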