library(tidyverse)
library(summarytools)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE)
Joseph Vincent
February 15, 2023
Rows: 2930 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, county
dbl (1): total_employees
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
[1] 2930 3
[1] "state" "county" "total_employees"
Data Frame Summary
railroad
Dimensions: 2930 x 3
Duplicates: 0
----------------------------------------------------------------------------------------------
No   Variable          Stats / Values             Freqs (% of Valid)    Valid      Missing
---- ----------------- -------------------------- --------------------- ---------- ---------
1    state             1. TX                       221 ( 7.5%)          2930       0
     [character]       2. GA                       152 ( 5.2%)          (100.0%)   (0.0%)
                       3. KY                       119 ( 4.1%)
                       4. MO                       115 ( 3.9%)
                       5. IL                       103 ( 3.5%)
                       6. IA                        99 ( 3.4%)
                       7. KS                        95 ( 3.2%)
                       8. NC                        94 ( 3.2%)
                       9. IN                        92 ( 3.1%)
                       10. VA                       92 ( 3.1%)
                       [ 43 others ]              1748 (59.7%)
2    county            1. WASHINGTON                31 ( 1.1%)          2930       0
     [character]       2. JEFFERSON                 26 ( 0.9%)          (100.0%)   (0.0%)
                       3. FRANKLIN                  24 ( 0.8%)
                       4. LINCOLN                   24 ( 0.8%)
                       5. JACKSON                   22 ( 0.8%)
                       6. MADISON                   19 ( 0.6%)
                       7. MONTGOMERY                18 ( 0.6%)
                       8. CLAY                      17 ( 0.6%)
                       9. MARION                    17 ( 0.6%)
                       10. MONROE                   17 ( 0.6%)
                       [ 1699 others ]            2715 (92.7%)
3    total_employees   Mean (sd) : 87.2 (283.6)   404 distinct values   2930       0
     [numeric]         min < med < max:                                 (100.0%)   (0.0%)
                       1 < 21 < 8207
                       IQR (CV) : 58 (3.3)
----------------------------------------------------------------------------------------------
# A tibble: 6 × 1
  total_employees
            <dbl>
1               2
2               7
3               2
4               3
5               2
6               1
[1] 1
[1] 8207
---
title: "Challenge 1 - Railroad Employees"
author: "Joseph Vincent"
description: "Reading in data, describing, and creating first post"
date: "02/15/23"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - Joseph Vincent
---
```{r}
#| label: setup
#| warning: false
library(tidyverse)
library(summarytools)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE)
```
## Reading in the data
- railroad_2012_clean_county.csv ⭐
```{r}
# loading in dataset and assigning to variable 'railroad'
# using head to preview the dataset
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
head(railroad)
```
## Describing the dataset
This dataset has 2,930 rows and 3 columns: `state`, `county`, and `total_employees`. It appears to show the number of railroad employees by county in the United States.
The smallest number of railroad employees in a given county is 1, and the largest is 8,207.
The mean number of employees per county is about 87, but the standard deviation is large (about 284) and the median is only 21, so the distribution is strongly right-skewed.
I learned that 31 counties are named "Washington" and 26 are named "Jefferson".
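As a minimal sketch, the county-name counts and the mean and standard deviation quoted above could be computed with dplyr roughly as follows. The `toy` tibble, its placeholder state codes, and its employee values are hypothetical stand-ins shaped like `railroad`, not rows from the real file.

```{r}
library(dplyr)
library(tibble)

# Hypothetical toy rows shaped like `railroad` (NOT the real data)
toy <- tribble(
  ~state, ~county,      ~total_employees,
  "XX",   "WASHINGTON", 10,
  "YY",   "WASHINGTON", 30,
  "ZZ",   "JEFFERSON",  20,
  "WW",   "WASHINGTON", 400
)

# How often each county name appears across states, most frequent first
name_counts <- toy %>% count(county, sort = TRUE)

# Centre and spread of the employee counts; one large county pulls the
# mean well above the median
emp_stats <- toy %>%
  summarise(
    mean_emp   = mean(total_employees),
    median_emp = median(total_employees),
    sd_emp     = sd(total_employees)
  )

name_counts
emp_stats
```

Running the same two pipelines on `railroad` instead of `toy` should give the figures that `dfSummary()` reports, assuming the column names match those shown in the preview.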
## Data summary
```{r}
#| label: summary
# finding the dimensions of 'railroad'
dim(railroad)
# finding the column names of 'railroad'
colnames(railroad)
# using summarytools to summarise every column
dfSummary(railroad)
# practice selecting employees column and calc min/max manually
employees <- select(railroad, total_employees)
head(employees)
min(employees)
max(employees)
```
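The chunk above calls `min()` and `max()` on a one-column tibble. One alternative, sketched here under the assumption that only dplyr is needed, is to extract the column as a plain vector with `pull()` or to keep the result in a tibble with `summarise()`. The `preview` tibble simply re-enters the six rows printed by `head(railroad)`, so its minimum and maximum describe those six rows only, not the full dataset.

```{r}
library(dplyr)
library(tibble)

# The six rows printed by head(railroad), re-entered by hand for illustration
preview <- tribble(
  ~state, ~county,                ~total_employees,
  "AE",   "APO",                  2,
  "AK",   "ANCHORAGE",            7,
  "AK",   "FAIRBANKS NORTH STAR", 2,
  "AK",   "JUNEAU",               3,
  "AK",   "MATANUSKA-SUSITNA",    2,
  "AK",   "SITKA",                1
)

# pull() returns a plain numeric vector rather than a one-column tibble
emp_vec <- pull(preview, total_employees)
range(emp_vec)

# summarise() computes both values in one step and returns a tibble
emp_range <- preview %>%
  summarise(
    min_emp = min(total_employees),
    max_emp = max(total_employees)
  )

# slice_max() shows which row holds the maximum, not just its value
largest <- preview %>% slice_max(total_employees, n = 1)

emp_range
largest
```

Keeping the result in a tibble makes it easier to add further statistics later, while `slice_max()` answers the follow-up question of which county the extreme value belongs to.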