challenge_1
railroads
faostat
wildbirds
Railroads
Author

Kris Smole

Published

February 22, 2023

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

Overview of the Dataset in railroad_2012_clean_county.csv

Railroad, as I will call this dataset, contains one observation per county within each US state or territory, giving that county's total employee count. The data were likely gathered by the railroad company or perhaps by one of the regulatory bodies governing the railroad.

railroad <- read_csv("_data/railroad_2012_clean_county.csv")

Each observation identifies a US state or territory, a county within it, and the total number of employees in that county. There are 2,930 observations, sorted alphabetically by state/territory and then by county within each state/territory.

Specific views and statistics of the observations

select(railroad, state, county, total_employees)
# A tibble: 2,930 × 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# … with 2,920 more rows

The output above shows, and the call below confirms, that this dataset has 2,930 rows and 3 columns.

dim(railroad)
[1] 2930    3

The dataset covers US states and counties. Specifically, the railroad data includes 53 distinct state codes. Since that count is over 50, I conclude that the District of Columbia and US territories are likely included in the observations. The codes AE and AP that appear in the outputs on this page, each paired with the county APO, appear to be military postal codes rather than states.

railroad %>%
  select(state) %>%
  n_distinct()
[1] 53
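To see which codes push the count past 50, one option is to compare the state column against `state.abb`, the vector of 50 state abbreviations that ships with base R. The sketch below is only an illustration: `toy` is a hypothetical stand-in built from a few rows shown in the outputs on this page, not the full file.

```r
library(dplyr)

# Hypothetical stand-in: a few rows copied from the outputs on this page
toy <- tibble(
  state = c("AE", "AK", "AL", "AP"),
  county = c("APO", "ANCHORAGE", "AUTAUGA", "APO"),
  total_employees = c(2, 7, 102, 1)
)

# state.abb is a built-in base R vector holding the 50 state abbreviations.
# setdiff() keeps the codes in the data that are not among those 50.
extra_codes <- setdiff(unique(toy$state), state.abb)
extra_codes
```

Running the same `setdiff()` call on `railroad$state` should list the three codes beyond the 50 states.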

How many distinct county names does this dataset contain? The answer, 1,709, is smaller than the 2,930 observations, which indicates that some county names repeat across states. Future communications should therefore describe observations by both county and state for clarity.

railroad %>%
  select(county) %>%
  n_distinct()
[1] 1709
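One way to find out which county names repeat is to count the rows per county name and keep the names that occur more than once. This is a minimal sketch on a hypothetical `toy` table built from rows shown elsewhere on this page; the column name `n_states` is my own choice.

```r
library(dplyr)

# Hypothetical stand-in: rows copied from the outputs on this page
toy <- tibble(
  state = c("AL", "AR", "AL", "AZ"),
  county = c("JEFFERSON", "JEFFERSON", "MOBILE", "APACHE"),
  total_employees = c(990, 361, 331, 270)
)

# Count rows per county name, then keep names that appear in more than one state
repeated <- toy %>%
  count(county, name = "n_states") %>%
  filter(n_states > 1)

repeated
```

Because each state lists a county only once in this data, the row count per county name equals the number of states that share that name.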

Which counties have the most employees within this set of observations? It appears that Illinois’ Cook County has the highest count of employees, indicating this railroad has significant operations in Cook County, the home of the city of Chicago.

arrange(railroad, desc(total_employees))
# A tibble: 2,930 × 3
   state county           total_employees
   <chr> <chr>                      <dbl>
 1 IL    COOK                        8207
 2 TX    TARRANT                     4235
 3 NE    DOUGLAS                     3797
 4 NY    SUFFOLK                     3685
 5 VA    INDEPENDENT CITY            3249
 6 FL    DUVAL                       3073
 7 CA    SAN BERNARDINO              2888
 8 CA    LOS ANGELES                 2545
 9 TX    HARRIS                      2535
10 NE    LINCOLN                     2289
# … with 2,920 more rows

Which counties have the fewest employees of this railroad? Numerous counties have only 1 employee. Looking at the full dataset, 145 counties have only 1 employee. Future reports will provide breakdowns by designated ranges of employee count (e.g., 1-50, 51-100, etc.) using coding tools I have yet to learn.

arrange(railroad, total_employees)
# A tibble: 2,930 × 3
   state county   total_employees
   <chr> <chr>              <dbl>
 1 AK    SITKA                  1
 2 AL    BARBOUR                1
 3 AL    HENRY                  1
 4 AP    APO                    1
 5 AR    NEWTON                 1
 6 CA    MONO                   1
 7 CO    BENT                   1
 8 CO    CHEYENNE               1
 9 CO    COSTILLA               1
10 CO    DOLORES                1
# … with 2,920 more rows
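Both the single-employee count and the range breakdown mentioned above can be sketched with `filter()`, base R's `cut()`, and `count()`. The `toy` table, the break points, and the band labels are illustrative assumptions, not choices made in the original analysis.

```r
library(dplyr)

# Hypothetical stand-in: rows copied from the outputs on this page
toy <- tibble(
  state = c("AK", "AK", "AL", "AL", "IL"),
  county = c("SITKA", "ANCHORAGE", "BARBOUR", "AUTAUGA", "COOK"),
  total_employees = c(1, 7, 1, 102, 8207)
)

# Number of counties with exactly one employee
n_single <- toy %>%
  filter(total_employees == 1) %>%
  nrow()

# Assign each county to an employee-count band.
# cut() uses right-closed intervals by default: (0,50], (50,100], ...
binned <- toy %>%
  mutate(size_band = cut(
    total_employees,
    breaks = c(0, 50, 100, 1000, Inf),
    labels = c("1-50", "51-100", "101-1000", "1001+")
  )) %>%
  count(size_band)

n_single
binned
```

Non-overlapping bands such as 1-50 and 51-100 keep every county in exactly one range.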

Which counties have 200 or more employees? The following list includes only counties with employee counts of 200 or greater, in alphabetical order by state and then county. Future reports will also rank such lists from highest to lowest.

filter(railroad, total_employees >= 200)
# A tibble: 280 × 3
   state county    total_employees
   <chr> <chr>               <dbl>
 1 AL    JEFFERSON             990
 2 AL    MOBILE                331
 3 AR    FAULKNER              289
 4 AR    JEFFERSON             361
 5 AR    LONOKE                330
 6 AR    PULASKI               972
 7 AR    SALINE                262
 8 AZ    APACHE                270
 9 AZ    COCONINO              268
10 AZ    MARICOPA              462
# … with 270 more rows
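The ranked version of this list can be sketched by piping the filtered rows into `arrange()` with `desc()`, both of which are already used earlier in this post. The `toy` table below is a hypothetical stand-in built from rows shown on this page.

```r
library(dplyr)

# Hypothetical stand-in: rows copied from the outputs on this page
toy <- tibble(
  state = c("AL", "AL", "AR", "AZ"),
  county = c("JEFFERSON", "BARBOUR", "PULASKI", "MARICOPA"),
  total_employees = c(990, 1, 972, 462)
)

# Keep counties with 200 or more employees, then rank highest to lowest
large_ranked <- toy %>%
  filter(total_employees >= 200) %>%
  arrange(desc(total_employees))

large_ranked
```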

The overall mean employee count across all counties in all states is 87.2.

summarize(railroad, mean(total_employees))
# A tibble: 1 × 1
  `mean(total_employees)`
                    <dbl>
1                    87.2

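The overall median employee count across all counties in all states is 21. That is well below the mean of 87.2, which suggests that a small number of very large counties pull the mean upward.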

summarize(railroad, median(total_employees))
# A tibble: 1 × 1
  `median(total_employees)`
                      <dbl>
1                        21
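The mean and median can also be computed in a single `summarise()` call with named columns, which gives cleaner headers than `mean(total_employees)`. This is a sketch on a hypothetical `toy` table; the column names `mean_employees`, `median_employees`, and `n_counties` are my own.

```r
library(dplyr)

# Hypothetical stand-in: rows copied from the outputs on this page
toy <- tibble(
  state = c("AK", "AK", "AK", "AL", "IL"),
  county = c("SITKA", "FAIRBANKS NORTH STAR", "ANCHORAGE", "AUTAUGA", "COOK"),
  total_employees = c(1, 2, 7, 102, 8207)
)

# One row with both statistics plus the number of counties summarised
summary_stats <- toy %>%
  summarise(
    mean_employees = mean(total_employees),
    median_employees = median(total_employees),
    n_counties = n()
  )

summary_stats
```

Even in this tiny sample the single large county lifts the mean far above the median, the same pattern seen in the full data.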

Conclusion

Future reports will contain employee totals by state, which will require functions I have not yet learned. Grouping the observations by the state field should allow the existing county-level counts to be totaled by state. Cleaning up the table output is also planned. Additionally, graphical visualizations and maps will be included in future reports.
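As a possible starting point for the state totals, dplyr's `group_by()` and `summarise()` can add up the county counts within each state. This is a minimal sketch on a hypothetical `toy` table built from rows shown on this page; `state_totals` and `n_counties` are names I chose for illustration.

```r
library(dplyr)

# Hypothetical stand-in: rows copied from the outputs on this page
toy <- tibble(
  state = c("AK", "AK", "AL", "AL", "IL"),
  county = c("ANCHORAGE", "JUNEAU", "AUTAUGA", "BALDWIN", "COOK"),
  total_employees = c(7, 3, 102, 143, 8207)
)

# Sum employees and count counties within each state, largest total first
state_totals <- toy %>%
  group_by(state) %>%
  summarise(
    total_employees = sum(total_employees),
    n_counties = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(total_employees))

state_totals
```

Replacing `toy` with `railroad` should give one row per state code, 53 rows in all.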