library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Ananya Pujary
August 15, 2022
I’ll be working with the ‘railroad_2012_clean_county.csv’ dataset.
The output of `dim()` shows that the ‘railroad_2012_clean_county.csv’ dataset has 2930 rows and 3 columns.
[1] "state" "county" "total_employees"
# A tibble: 6 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
The columns in ‘railroad’ are ‘state’ (character), ‘county’ (character), and ‘total_employees’ (double, R’s default numeric type, which can hold decimal values even though the counts shown here are whole numbers). These data were probably collected as part of a large-scale survey of the number of railroad employees by county and state in the United States, presumably for 2012 judging by the file name.
# A tibble: 6 × 3
state county total_employees
<chr> <chr> <dbl>
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
Cook County, Illinois, has the highest number of railroad employees of any county (8207).
railroads <- railroad %>%
  group_by(state) %>% # grouping the data by state
  summarize(total_employees = sum(total_employees, na.rm = TRUE)) %>% # adding up the counties' employees within each state
  arrange(desc(total_employees)) # arranging the states from highest to lowest number of employees
head(railroads)
# A tibble: 6 × 2
state total_employees
<chr> <dbl>
1 TX 19839
2 IL 19131
3 NY 17050
4 NE 13176
5 CA 13137
6 PA 12769
Texas has the most railroad employees (19839), and Armed Forces Pacific has the fewest (1).
---
title: "Challenge 1"
author: "Ananya Pujary"
description: "Reading in data and creating a post"
date: "08/15/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroad
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Read in the Data
```{r}
#| label: reading in the data
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
```
## Describe the data
I'll be working with the 'railroad_2012_clean_county.csv' dataset.
```{r}
#| label: summary 1
dim(railroad) #describing the 'railroad' dataset's dimensions
```
The output of `dim()` shows that the 'railroad_2012_clean_county.csv' dataset has 2930 rows and 3 columns.
```{r}
#| label: summary 2
colnames(railroad)
head(railroad)
```
The columns in 'railroad' are 'state' (character), 'county' (character), and 'total_employees' (double, R's default numeric type, which can hold decimal values even though the counts shown here are whole numbers). These data were probably collected as part of a large-scale survey of the number of railroad employees by county and state in the United States, presumably for 2012 judging by the file name.
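One way to double-check the column types is sketched below. To keep the chunk runnable on its own, it assumes a small stand-in tibble, `railroad_sample` (a hypothetical name), built from the six rows that `head(railroad)` prints; with the full data loaded, `railroad` could be used in its place.

```{r}
library(tidyverse)

# Hypothetical stand-in for `railroad`: the six rows printed by head(railroad)
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# Class of each column, returned as a named character vector
col_classes <- sapply(railroad_sample, class)
col_classes

# TRUE when every count is a whole number, even though the column is stored as double
all_whole <- all(railroad_sample$total_employees %% 1 == 0)
all_whole
```

If the same check holds for the full dataset, the column could be converted with `as.integer()`, although leaving it as double does not affect the summaries in this post.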
```{r}
#| label: summary 3
railroad_arranged <- railroad %>%
  arrange(desc(total_employees)) # arranging data to find the county with the most employees
head(railroad_arranged)
```
Cook County, Illinois, has the highest number of railroad employees of any county (8207).
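As an alternative to sorting the whole table and reading off the first row, `slice_max()` from dplyr returns the top row directly. The sketch below assumes a stand-in tibble, `top_rows` (a hypothetical name), holding the six rows printed by `head(railroad_arranged)`, so it runs without the CSV file; on the full data the same call could be applied to `railroad`.

```{r}
library(tidyverse)

# Hypothetical stand-in: the six rows printed by head(railroad_arranged)
top_rows <- tibble(
  state = c("IL", "TX", "NE", "NY", "VA", "FL"),
  county = c("COOK", "TARRANT", "DOUGLAS", "SUFFOLK",
             "INDEPENDENT CITY", "DUVAL"),
  total_employees = c(8207, 4235, 3797, 3685, 3249, 3073)
)

# Keep only the row(s) with the largest total_employees; ties are kept by default
top_county <- top_rows %>%
  slice_max(total_employees, n = 1)
top_county
```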
```{r}
#| label: summary 4
railroads <- railroad %>%
  group_by(state) %>% # grouping the data by state
  summarize(total_employees = sum(total_employees, na.rm = TRUE)) %>% # adding up the counties' employees within each state
  arrange(desc(total_employees)) # arranging the states from highest to lowest number of employees
head(railroads)
```
Texas has the most railroad employees (19839), and Armed Forces Pacific has the fewest (1).
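Because `head()` only prints the top of the sorted table, the state with the fewest employees has to be read from the other end. A minimal sketch of pulling out both extremes is shown below. It again assumes a stand-in tibble, `railroad_sample` (a hypothetical name), made of the six rows printed by `head(railroad)`, so the totals it produces describe that sample only and not the full dataset.

```{r}
library(tidyverse)

# Hypothetical stand-in for `railroad`: the six rows printed by head(railroad)
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# One row per state with the summed employee count
state_totals <- railroad_sample %>%
  group_by(state) %>%
  summarize(total_employees = sum(total_employees, na.rm = TRUE))

# State(s) with the largest and the smallest totals; ties are kept by default
most <- state_totals %>% slice_max(total_employees, n = 1)
fewest <- state_totals %>% slice_min(total_employees, n = 1)

most
fewest
```

Replacing `railroad_sample` with `railroad` would apply the same steps to all 2930 rows.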