library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Sai Venkatesh
April 12, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
We are going to load the railroad data.
[1] "Let's load the data and see the dimensions and columns of the data."
[1] 2930 3
[1] "state" "county" "total_employees"
From the above, we can see that the Railroad data has 2930 rows and 3 columns. The 3 column names are state, county and total_employees.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
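The data appear to record railroad employment by location: each row is one state and county combination, and `total_employees` is the number of railroad employees counted in that county. The file name suggests the counts are for 2012.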
[1] "The total number of employees in the states ordered by the count:-"
# A tibble: 53 × 2
state state_total
<chr> <int>
1 TX 19839
2 IL 19131
3 NY 17050
4 NE 13176
5 CA 13137
6 PA 12769
7 OH 9056
8 GA 8605
9 IN 8537
10 MO 8419
# … with 43 more rows
[1] "The counties with employees greater than 1000 ordered by the count:-"
state county total_employees
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
7 CA SAN BERNARDINO 2888
8 CA LOS ANGELES 2545
9 TX HARRIS 2535
10 NE LINCOLN 2289
11 NY NASSAU 2076
12 MO JACKSON 2055
13 IN LAKE 1999
14 IL WILL 1784
15 PA PHILADELPHIA 1649
16 NE LANCASTER 1619
17 CA RIVERSIDE 1567
18 CT NEW HAVEN 1561
19 NY QUEENS 1470
20 KS JOHNSON 1286
21 DE NEW CASTLE 1275
22 NE BOX BUTTE 1168
23 NY DUTCHESS 1157
24 PA BUCKS 1106
25 NJ ESSEX 1097
26 NY WESTCHESTER 1040
27 WA KING 1039
We can see that Texas (TX) has the most railroad employees of any state (19,839), and Cook County, IL has the most of any county (8,207).
---
title: "Challenge 1"
author: "Sai Venkatesh"
description: "Reading in data and creating a post"
date: "04/12/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
We are going to load the railroad data.
```{r}
railroad <- read.csv('_data/railroad_2012_clean_county.csv')
print("Let's load the data and see the dimensions and columns of the data.")
# The Dimensions
dim(railroad)
# The Column Names
colnames(railroad)
```
From the above, we can see that the Railroad data has 2930 rows and 3 columns.
The 3 column names are state, county and total_employees.
```{r}
print("The top rows of the data are :- ")
head(railroad)
```
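Beyond the dimensions and column names, it can help to confirm each column's type and whether any values are missing. The following is a minimal sketch, not part of the original analysis. It assumes a small hand-built sample, `railroad_sample` (a hypothetical name), that holds a few of the county rows reported in this post, so it runs without the CSV file.

```r
library(dplyr)

# Hypothetical stand-in for the full data: a few county rows
# copied from the counts reported in this post.
railroad_sample <- tibble(
  state = c("IL", "TX", "NE", "NY"),
  county = c("COOK", "TARRANT", "DOUGLAS", "SUFFOLK"),
  total_employees = c(8207L, 4235L, 3797L, 3685L)
)

# Class of each column (character for the labels, integer for the count)
col_types <- sapply(railroad_sample, class)
col_types

# Number of missing values in each column
na_counts <- colSums(is.na(railroad_sample))
na_counts
```

The same two calls should work on the full `railroad` data frame once it has been read in.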
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
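The data appear to record railroad employment by location: each row is one state and county combination, and `total_employees` is the number of railroad employees counted in that county. The file name suggests the counts are for 2012.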
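One way to check that each row really is a single state and county combination is to count rows per pair and look for duplicates. This is a rough sketch under stated assumptions: `railroad_sample` is a hypothetical hand-built sample of a few rows from this dataset, not the full file.

```r
library(dplyr)

# Hypothetical sample: a few Nebraska and Illinois county rows
# copied from the counts reported in this post.
railroad_sample <- tibble(
  state = c("NE", "NE", "NE", "NE", "IL", "IL"),
  county = c("DOUGLAS", "LINCOLN", "LANCASTER", "BOX BUTTE", "COOK", "WILL"),
  total_employees = c(3797L, 2289L, 1619L, 1168L, 8207L, 1784L)
)

# A state/county pair appearing more than once would mean that
# rows are not unique counties.
duplicate_pairs <- railroad_sample %>%
  count(state, county) %>%
  filter(n > 1)
nrow(duplicate_pairs)

# Number of distinct state codes in the sample
n_states <- n_distinct(railroad_sample$state)
n_states
```

On the full data, the state summary in this post has 53 rows, which is more than the 50 states, so the `state` column presumably includes some codes that are not states.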
```{r}
# Number of Employees Per State
print("The total number of employees in the states ordered by the count:-")
railroad %>%
  group_by(state) %>%
  summarize(state_total = sum(total_employees)) %>%
  select(state, state_total) %>%
  arrange(desc(state_total))
```
```{r}
# Number of Employees > 1000
print("The counties with employees greater than 1000 ordered by the count:-")
railroad %>%
  filter(total_employees > 1000) %>%
  select(state, county, total_employees) %>%
  arrange(desc(total_employees))
```
We can see that Texas (TX) has the most railroad employees of any state (19,839), and Cook County, IL has the most of any county (8,207).
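Rather than reading the top entries off the sorted tables, the maxima can be pulled out directly. The sketch below assumes a hypothetical sample, `railroad_sample`, built by hand from five county rows in this post. Because it holds only a few counties, its state totals differ from the full data: the sample's top state is IL, whereas the full data gives TX.

```r
library(dplyr)

# Hypothetical sample: five county rows copied from this post.
railroad_sample <- tibble(
  state = c("IL", "IL", "TX", "TX", "NE"),
  county = c("COOK", "WILL", "TARRANT", "HARRIS", "DOUGLAS"),
  total_employees = c(8207L, 1784L, 4235L, 2535L, 3797L)
)

# County with the largest employee count
top_county <- railroad_sample %>%
  slice_max(total_employees, n = 1)
top_county

# State with the largest total across its counties (within the sample)
top_state <- railroad_sample %>%
  group_by(state) %>%
  summarize(state_total = sum(total_employees)) %>%
  slice_max(state_total, n = 1)
top_state
```

Swapping `railroad_sample` for the full `railroad` data frame should give the state and county named above.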