library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Kris Smole
February 22, 2023
Railroad, as I will call this dataset, contains one observation per county within each US state or territory, giving the total number of employees in that county. The dataset was likely gathered by the railroad company or perhaps by one of the regulatory bodies governing the railroad.
Each observation gives a state or territory, a county within it, and the total number of employees in that county. There are 2,930 observations, sorted alphabetically by state/territory and then by county within each state/territory.
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
# … with 2,920 more rows
The results above show, and the dimensions below confirm, that this dataset has 2,930 rows and 3 columns.
The dataset represents US states and counties. Specifically, the railroad data includes 53 distinct state codes. Since that count exceeds 50, I conclude that the District of Columbia and US territories or other non-state codes are likely included; the AE and AP rows that appear in the tables of this report, for example, are Armed Forces postal codes rather than states.
How many distinct county names does this dataset contain? The count is not equal to 2,930, the total number of observations, which indicates that some county names repeat across states. Future communications should therefore identify each observation by both county and state for clarity.
Which counties have the most employees in this set of observations? Cook County, Illinois, has the highest count, with 8,207 employees, which indicates that this railroad has significant operations in Cook County, home of the city of Chicago.
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
7 CA SAN BERNARDINO 2888
8 CA LOS ANGELES 2545
9 TX HARRIS 2535
10 NE LINCOLN 2289
# … with 2,920 more rows
Which counties have the fewest employees of this railroad? Numerous counties have only 1 employee; across the full dataset, 145 counties do. Future reports should provide breakdowns by designated ranges of employee count (e.g., 1-50, 51-100), using coding tools I have yet to learn.
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AK SITKA 1
2 AL BARBOUR 1
3 AL HENRY 1
4 AP APO 1
5 AR NEWTON 1
6 CA MONO 1
7 CO BENT 1
8 CO CHEYENNE 1
9 CO COSTILLA 1
10 CO DOLORES 1
# … with 2,920 more rows
Which counties have 200 or more employees? The following list includes only counties with employee counts of 200 or greater, in alphabetical order by state and then county. Future reports should also include lists ranked from highest to lowest.
# A tibble: 280 × 3
state county total_employees
<chr> <chr> <dbl>
1 AL JEFFERSON 990
2 AL MOBILE 331
3 AR FAULKNER 289
4 AR JEFFERSON 361
5 AR LONOKE 330
6 AR PULASKI 972
7 AR SALINE 262
8 AZ APACHE 270
9 AZ COCONINO 268
10 AZ MARICOPA 462
# … with 270 more rows
The overall mean employee count across all counties in all states is 87.2.
# A tibble: 1 × 1
`mean(total_employees)`
<dbl>
1 87.2
The overall median employee count across all counties in all states is 21.
Future reports should contain employee totals by state, which will require functions I have not yet learned. Grouping the county observations by state will allow the existing data to be totalled by state. The table output also needs clean-up. Additionally, future reports should include graphic visualizations and geographic maps.
---
title: "Challenge_1"
author: "Kris Smole"
description: "Railroads"
date: "02/22/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
# Overview of the dataset in `railroad_2012_clean_county.csv`
Railroad, as I will call this dataset, contains one observation per county within each US state or territory, giving the total number of employees in that county. The dataset was likely gathered by the railroad company or perhaps by one of the regulatory bodies governing the railroad.
```{r}
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
```
Each observation gives a state or territory, a county within it, and the total number of employees in that county. There are 2,930 observations, sorted alphabetically by state/territory and then by county within each state/territory.
# Specific views and statistics of the observations
```{r}
select(railroad, state, county, total_employees)
```
The results above show, and the dimensions below confirm, that this dataset has 2,930 rows and 3 columns.
```{r}
dim(railroad)
```
The dataset represents US states and counties. Specifically, the railroad data includes 53 distinct state codes. Since that count exceeds 50, I conclude that the District of Columbia and US territories or other non-state codes are likely included; the AE and AP rows that appear in the tables of this report, for example, are Armed Forces postal codes rather than states.
```{r}
railroad %>%
  select(state) %>%
  n_distinct()
```
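To see which codes push the count past 50, one option is to compare the state column against `state.abb`, the vector of the 50 state abbreviations that ships with base R. The sketch below is illustrative only: `railroad_sample` is a hypothetical stand-in built from a few rows shown in this report, and `extra_codes` is a name introduced here.

```{r}
library(dplyr)

# Hypothetical sample built from rows shown in this report;
# substitute `railroad` to check the full dataset.
railroad_sample <- tibble::tribble(
  ~state, ~county,     ~total_employees,
  "AE",   "APO",       2,
  "AK",   "ANCHORAGE", 7,
  "AL",   "AUTAUGA",   102,
  "AP",   "APO",       1,
  "IL",   "COOK",      8207
)

# Keep only the codes that are not among the 50 state abbreviations
extra_codes <- railroad_sample %>%
  distinct(state) %>%
  filter(!state %in% state.abb) %>%
  pull(state)

extra_codes
```

On the full data this should list the three codes beyond the 50 states, which would confirm or refine the conclusion above.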
How many distinct county names does this dataset contain? The count is not equal to 2,930, the total number of observations, which indicates that some county names repeat across states. Future communications should therefore identify each observation by both county and state for clarity.
```{r}
railroad %>%
  select(county) %>%
  n_distinct()
```
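The repeated county names can also be listed directly. The following is a minimal sketch, assuming each row is a unique state and county pair so that the row count per county name equals the number of states using it. `railroad_sample`, `repeated_names`, and `n_states` are illustrative names, and the sample rows are copied from tables in this report.

```{r}
library(dplyr)

# Hypothetical sample built from rows shown in this report;
# substitute `railroad` to check the full dataset.
railroad_sample <- tibble::tribble(
  ~state, ~county,     ~total_employees,
  "AL",   "JEFFERSON", 990,
  "AL",   "MOBILE",    331,
  "AR",   "JEFFERSON", 361,
  "AR",   "PULASKI",   972,
  "AE",   "APO",       2,
  "AP",   "APO",       1
)

# Count rows per county name and keep names that occur more than once
repeated_names <- railroad_sample %>%
  count(county, name = "n_states") %>%
  filter(n_states > 1) %>%
  arrange(desc(n_states), county)

repeated_names
```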
Which counties have the most employees in this set of observations? Cook County, Illinois, has the highest count, with 8,207 employees, which indicates that this railroad has significant operations in Cook County, home of the city of Chicago.
```{r}
arrange(railroad, desc(`total_employees`))
```
Which counties have the fewest employees of this railroad? Numerous counties have only 1 employee; across the full dataset, 145 counties do. Future reports should provide breakdowns by designated ranges of employee count (e.g., 1-50, 51-100), using coding tools I have yet to learn.
```{r}
arrange(railroad, `total_employees`)
```
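Both the single-employee count and the range breakdown can be sketched with `filter()` and base R's `cut()`. The bin edges below are an assumption chosen for illustration, not ranges defined by the dataset, and `railroad_sample`, `n_single`, and `by_range` are hypothetical names; the sample rows come from tables in this report.

```{r}
library(dplyr)

# Hypothetical sample built from rows shown in this report;
# substitute `railroad` to run this on the full dataset.
railroad_sample <- tibble::tribble(
  ~state, ~county,                ~total_employees,
  "AK",   "SITKA",                1,
  "AL",   "BARBOUR",              1,
  "AK",   "ANCHORAGE",            7,
  "AK",   "SKAGWAY MUNICIPALITY", 88,
  "AL",   "AUTAUGA",              102,
  "IL",   "COOK",                 8207
)

# Number of counties with exactly one employee
n_single <- railroad_sample %>%
  filter(total_employees == 1) %>%
  nrow()

# cut() builds right-closed intervals by default: (0,50], (50,100], (100,Inf]
by_range <- railroad_sample %>%
  mutate(employee_range = cut(
    total_employees,
    breaks = c(0, 50, 100, Inf),
    labels = c("1-50", "51-100", "101+")
  )) %>%
  count(employee_range)

by_range
```

Because the intervals are right-closed, a county with exactly 50 employees falls in the first bin and one with 51 in the second, so the ranges do not overlap.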
Which counties have 200 or more employees? The following list includes only counties with employee counts of 200 or greater, in alphabetical order by state and then county. Future reports should also include lists ranked from highest to lowest.
```{r}
filter(railroad, total_employees >= 200)
```
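A ranked version of this list only needs `arrange()` chained after the filter. This is a small sketch using a hypothetical `railroad_sample` built from rows shown in this report; `large_ranked` is an illustrative name.

```{r}
library(dplyr)

# Hypothetical sample built from rows shown in this report;
# substitute `railroad` to rank the full dataset.
railroad_sample <- tibble::tribble(
  ~state, ~county,     ~total_employees,
  "AK",   "ANCHORAGE", 7,
  "AL",   "AUTAUGA",   102,
  "AL",   "JEFFERSON", 990,
  "AL",   "MOBILE",    331,
  "AR",   "FAULKNER",  289,
  "AR",   "PULASKI",   972
)

# Keep counties with 200 or more employees, largest first
large_ranked <- railroad_sample %>%
  filter(total_employees >= 200) %>%
  arrange(desc(total_employees))

large_ranked
```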
The overall mean employee count across all counties in all states is 87.2.
```{r}
summarize(railroad,mean(`total_employees`))
```
The overall median employee count across all counties in all states is 21.
```{r}
summarize(railroad, median(`total_employees`))
```
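The mean and median can also be computed in a single `summarize()` call with named columns, which makes the two easier to compare side by side. The sketch below uses a hypothetical `railroad_sample` built from rows shown in this report, so its values differ from the full-data figures of 87.2 and 21; the column names are illustrative.

```{r}
library(dplyr)

# Hypothetical sample built from rows shown in this report;
# substitute `railroad` to reproduce the full-data statistics.
railroad_sample <- tibble::tribble(
  ~state, ~county,                ~total_employees,
  "AE",   "APO",                  2,
  "AK",   "ANCHORAGE",            7,
  "AK",   "JUNEAU",               3,
  "AK",   "SKAGWAY MUNICIPALITY", 88,
  "AL",   "AUTAUGA",              102,
  "AL",   "BALDWIN",              143
)

# One row with both statistics and the number of counties summarized
summary_stats <- railroad_sample %>%
  summarize(
    mean_employees   = mean(total_employees),
    median_employees = median(total_employees),
    n_counties       = n()
  )

summary_stats
```

In the full data the mean sits well above the median, which suggests a right-skewed distribution in which a few large counties such as Cook pull the mean upward.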
# Conclusion
Future reports should contain employee totals by state, which will require functions I have not yet learned. Grouping the county observations by state will allow the existing data to be totalled by state. The table output also needs clean-up. Additionally, future reports should include graphic visualizations and geographic maps.
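As a starting point for the by-state totals, dplyr's `group_by()` and `summarize()` are one likely fit. This is only a sketch under that assumption: `railroad_sample` is a hypothetical stand-in built from rows shown in this report, and `state_totals`, `state_employees`, and `n_counties` are names introduced here.

```{r}
library(dplyr)

# Hypothetical sample built from rows shown in this report;
# substitute `railroad` to total the full dataset.
railroad_sample <- tibble::tribble(
  ~state, ~county,     ~total_employees,
  "AK",   "ANCHORAGE", 7,
  "AK",   "JUNEAU",    3,
  "AK",   "SITKA",     1,
  "AL",   "AUTAUGA",   102,
  "AL",   "BALDWIN",   143,
  "AL",   "BARBOUR",   1
)

# Sum employees and count counties within each state, largest total first
state_totals <- railroad_sample %>%
  group_by(state) %>%
  summarize(
    state_employees = sum(total_employees),
    n_counties      = n(),
    .groups         = "drop"
  ) %>%
  arrange(desc(state_employees))

state_totals
```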