Challenge 1 Abby Balint

challenge_1

railroads

abby_balint

Author

Abby Balint

Published

September 15, 2022

Code

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to read in a dataset and describe it using both words and supporting information (e.g., tables).

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

railroad_2012_clean_county.csv ⭐

Code

library(tidyverse)
read_csv("_data/railroad_2012_clean_county.csv")

# A tibble: 2,930 × 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# … with 2,920 more rows

Code

railroad <- read_csv("_data/railroad_2012_clean_county.csv")

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.

Description

This data set contains the number of railroad employees in 2012, broken down by state and county. It is relatively limited: it has only three columns (state, county, and total_employees) and 2,930 rows. The number of employees per county varies widely, so it could be useful to compare ranges and averages across counties or states.

Code

colnames(railroad)

[1] "state"           "county"          "total_employees"

Code

dim(railroad)

[1] 2930    3
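
The description above suggests comparing ranges and averages. Here is a minimal sketch of that idea. It assumes only the ten rows printed earlier, retyped into a hypothetical `railroad_sample` tibble so it runs without the CSV. The numbers therefore describe that small sample, not the full data set.

```r
library(dplyr)

# Hypothetical sample: the first ten rows shown in the printed tibble,
# retyped by hand. It is NOT the full railroad data.
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# One-row summary of the spread of county employee counts
employee_summary <- railroad_sample %>%
  summarise(
    min_employees    = min(total_employees),
    median_employees = median(total_employees),
    mean_employees   = mean(total_employees),
    max_employees    = max(total_employees)
  )

employee_summary
```

Swapping `railroad_sample` for `railroad` should give the same summary for the whole data set. A mean far above the median would suggest that a few large counties pull the average up.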

Filtering

Filtering the data to a single state makes it easier to look at that state's county-level employee counts. Using Kentucky (KY) as an example, I can now see that 119 counties (rows) are reported.

Code

library(dplyr)
filter(railroad, state == "KY")

# A tibble: 119 × 3
   state county   total_employees
   <chr> <chr>              <dbl>
 1 KY    ADAIR                  1
 2 KY    ALLEN                  5
 3 KY    ANDERSON               5
 4 KY    BALLARD                7
 5 KY    BARREN                 5
 6 KY    BATH                   3
 7 KY    BELL                  27
 8 KY    BOONE                236
 9 KY    BOURBON                8
10 KY    BOYD                 232
# … with 109 more rows

Code

railroadKY <- filter(railroad, state == "KY")
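
Filtering shows the county rows for one state at a time. To get the number of reported counties for every state in one step, `count()` is one option. This is a rough sketch, again using a hypothetical `railroad_sample` tibble built from the first ten printed rows, so the counts below reflect the sample only.

```r
library(dplyr)

# Hypothetical sample: the first ten rows of the printed tibble, retyped by hand
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# Count rows (counties) per state, largest first
counties_per_state <- railroad_sample %>%
  count(state, name = "n_counties") %>%
  arrange(desc(n_counties))

counties_per_state
```

Run on the full `railroad` object, the KY row of this table should match the 119 rows returned by the filter above.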

Average

Here I find the average number of railroad employees per county in Kentucky, then list the counties with 200 or more employees.

Code

mean(railroadKY$total_employees)

[1] 40.42857

Code

filter(railroadKY, total_employees >= 200)

# A tibble: 7 × 3
  state county    total_employees
  <chr> <chr>               <dbl>
1 KY    BOONE                 236
2 KY    BOYD                  232
3 KY    GREENUP               483
4 KY    JEFFERSON             413
5 KY    KENTON                244
6 KY    PIKE                  231
7 KY    WHITLEY               322
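
The filtered result comes back in alphabetical order, so the largest county is not obvious at a glance. Here is a small sketch of ranking these counties. It assumes the seven rows printed above, retyped into a hypothetical `ky_large` tibble so the block runs on its own.

```r
library(dplyr)

# Hypothetical tibble: the seven Kentucky counties with >= 200 employees,
# retyped from the printed output
ky_large <- tibble(
  state = "KY",
  county = c("BOONE", "BOYD", "GREENUP", "JEFFERSON",
             "KENTON", "PIKE", "WHITLEY"),
  total_employees = c(236, 232, 483, 413, 244, 231, 322)
)

# Sort from most to fewest employees
ky_ranked <- ky_large %>%
  arrange(desc(total_employees))

# Keep only the single largest county
top_county <- ky_large %>%
  slice_max(total_employees, n = 1)

ky_ranked
top_county
```

With the full data, the same two steps could be applied directly to `filter(railroadKY, total_employees >= 200)`.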