library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Abby Balint
September 15, 2022
Today’s challenge is to
read in a dataset, and
describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
# A tibble: 2,930 × 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# … with 2,920 more rows
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
This data set reports the number of railroad employees in 2012 by state and county. It is relatively limited, with only three columns (state, county, and total employees) that can be used in conjunction with each other for analysis. The data contains 2,930 rows. The range of employee counts by county is quite wide, so it could be useful to compare ranges and average counts.
Filtering the data to a single state makes it easier to look at the number of employees and the county breakdown for that state. Using Kentucky as an example, I can now see that there are 119 counties (rows) reported.
# A tibble: 119 × 3
   state county   total_employees
   <chr> <chr>              <dbl>
 1 KY    ADAIR                  1
 2 KY    ALLEN                  5
 3 KY    ANDERSON               5
 4 KY    BALLARD                7
 5 KY    BARREN                 5
 6 KY    BATH                   3
 7 KY    BELL                  27
 8 KY    BOONE                236
 9 KY    BOURBON                8
10 KY    BOYD                 232
# … with 109 more rows
Here I find the average number of railroad employees per county in Kentucky, then list the Kentucky counties with 200 or more employees.
---
title: "Challenge 1 Abby Balint"
author: "Abby Balint"
description: "Reading in data and creating a post"
date: "09/15/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
- railroads
- abby_balint
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
```{r}
# tidyverse (which provides read_csv) is already loaded in the setup chunk.
# Read the county-level railroad data once, then print it.
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
railroad
```
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Description
This data set reports the number of railroad employees in 2012 by state and county. It is relatively limited, with only three columns (state, county, and total employees) that can be used in conjunction with each other for analysis. The data contains 2,930 rows. The range of employee counts by county is quite wide, so it could be useful to compare ranges and average counts.
```{r}
#| label: summary
# Column names: state, county, total_employees
colnames(railroad)
# Dimensions as (rows, columns)
dim(railroad)
```
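To put numbers on how wide the range of county counts is, a short summary can help. The sketch below is illustrative only: `railroad_sample` is a hypothetical tibble typed in by hand from the first ten rows of the printed output, not the full data set, so the results describe just those ten rows. With the real data, the same `summarise()` call could be run on `railroad` instead.

```r
library(dplyr)

# Hypothetical sample: the first ten rows of the printed tibble, entered by hand
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# Smallest, typical, and largest county counts in one row
employee_summary <- railroad_sample %>%
  summarise(
    min_employees    = min(total_employees),
    median_employees = median(total_employees),
    mean_employees   = mean(total_employees),
    max_employees    = max(total_employees)
  )

employee_summary
```

Even in these ten rows the mean (35.1) sits well above the median (2.5), which suggests that a few large counties can pull the average up.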
## Filtering
Filtering the data to a single state makes it easier to look at the number of employees and the county breakdown for that state. Using Kentucky as an example, I can now see that there are 119 counties (rows) reported.
```{r}
#| label: summary-2
# dplyr is already attached by library(tidyverse) in the setup chunk.
# Keep only the Kentucky rows, then print them.
railroadKY <- filter(railroad, state == "KY")
railroadKY
```
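A row count like the one for Kentucky can also be produced for every state at once rather than filtering one state at a time. The sketch below is one possible way to do that with `count()`. It again uses a hypothetical hand-typed `railroad_sample` (the first ten printed rows), so the counts are for that sample only. The column name `counties` is my own choice.

```r
library(dplyr)

# Hypothetical sample: the first ten rows of the printed tibble, entered by hand
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# One row per state with the number of county rows it contains,
# sorted so the states with the most counties come first
counties_per_state <- railroad_sample %>%
  count(state, sort = TRUE, name = "counties")

counties_per_state
```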
## Average
Here I find the average number of railroad employees per county in Kentucky, then list the Kentucky counties with 200 or more employees.
```{r}
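#| label: summary-3
# Mean number of railroad employees per Kentucky county
mean(railroadKY$total_employees)
# Kentucky counties with 200 or more railroad employees
filter(railroadKY, total_employees >= 200)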
```
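The same average can be compared across states with `group_by()` and `summarise()`. The sketch below is a minimal example under stated assumptions: `railroad_sample` is a hypothetical tibble typed in from the ten Kentucky rows and three Alabama rows shown in the printed output, so the results apply to those thirteen rows and not to the full states. The median is included alongside the mean because a few large counties can dominate the mean.

```r
library(dplyr)

# Hypothetical sample: printed Kentucky and Alabama rows, entered by hand
railroad_sample <- tibble(
  state = c(rep("KY", 10), rep("AL", 3)),
  county = c("ADAIR", "ALLEN", "ANDERSON", "BALLARD", "BARREN",
             "BATH", "BELL", "BOONE", "BOURBON", "BOYD",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(1, 5, 5, 7, 5, 3, 27, 236, 8, 232,
                      102, 143, 1)
)

# Per-state county count, mean, and median of total_employees
state_averages <- railroad_sample %>%
  group_by(state) %>%
  summarise(
    counties         = n(),
    mean_employees   = mean(total_employees),
    median_employees = median(total_employees)
  )

state_averages
```

In this sample the ten Kentucky rows have a mean of 52.9 but a median of 6, because the two counties with more than 200 employees (BOONE and BOYD) account for most of the total.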