Code
library(tidyverse)
library(readxl)
::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE) knitr
Shoshana Buck
August 16, 2022
The state County data is looking at the total railroad employment by state and county in the United States and Canada using the calendar year of 2012. The source of this data is most likely taken from the United States Railroad Retirement Board. There is a total of 2,931 rows which reflect the counties and there are 3 columns which reflect the “State”, “County”, and the “total employees” within the county.
# | Label: Reading in data
StateCounty<-read_excel("_data/StateCounty2012.xls",
skip = 4,
col_names= c("state", "delete", "county",
"delete", "employees"))%>%
select(!contains("delete"))%>%
filter(!str_detect(state, "Total"))
StateCounty<-head(StateCounty, -2)%>%
mutate(county = ifelse(state=="CANADA", "CANADA", county))
StateCounty
# A tibble: 2,931 × 3
state county employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
# … with 2,921 more rows
# ℹ Use `print(n = ...)` to see more rows
Though there are 2,929 state-counties there are only 53 unique States and 1708 unique counties, I found this by including n_distinct in the summarise() function.
# A tibble: 2,931 × 3
# Groups: state [54]
state county employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
# … with 2,921 more rows
# ℹ Use `print(n = ...)` to see more rows
# A tibble: 1 × 1
total.employees
<dbl>
1 256094
# A tibble: 2,931 × 3
# Groups: state [54]
state county total.employees
<chr> <chr> <dbl>
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
7 CA SAN BERNARDINO 2888
8 CA LOS ANGELES 2545
9 TX HARRIS 2535
10 NE LINCOLN 2289
# … with 2,921 more rows
# ℹ Use `print(n = ...)` to see more rows
During the calendar year of 2012 the United States Railroad Retirement Board had 255,395 employees in the United States and Canada. I thought it was interesting that Cook county of Illinois had 8207 employees, which was the most employees as of 2012.
---
title: "Challenge 2 "
author: "Shoshana Buck"
description: "Data wrangling: using group() and summarise()"
date: "08/16/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_2
- Railroad
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readxl)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in the Data
The state County data is looking at the total railroad employment by state and county in the United States and Canada using the calendar year of 2012. The source of this data is most likely taken from the United States Railroad Retirement Board. There is a total of 2,931 rows which reflect the counties and there are 3 columns which reflect the "State", "County", and the "total employees" within the county.
```{r}
# | Label: Reading in data
StateCounty<-read_excel("_data/StateCounty2012.xls",
skip = 4,
col_names= c("state", "delete", "county",
"delete", "employees"))%>%
select(!contains("delete"))%>%
filter(!str_detect(state, "Total"))
StateCounty<-head(StateCounty, -2)%>%
mutate(county = ifelse(state=="CANADA", "CANADA", county))
StateCounty
```
## Describe the data
Though there are 2,929 state-counties there are only 53 unique States and 1708 unique counties, I found this by including n_distinct in the summarise() function.
```{r}
#| label: summary
StateCounty %>%
summarise(across(c(state), n_distinct))
StateCounty %>%
summarise(across(c(county), n_distinct))
```
## Provide Grouped Summary Statistics
```{r}
StateCounty %>%
group_by(state)
StateCounty %>%
summarise(total.employees=sum(employees))
StateCounty %>%
group_by(state,county) %>%
summarise(total.employees=sum(employees)) %>%
arrange(desc(total.employees))
```
### Explain and Interpret
During the calendar year of 2012 the United States Railroad Retirement Board had 255,395 employees in the United States and Canada. I thought it was interesting that Cook county of Illinois had 8207 employees, which was the most employees as of 2012.