library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Guanhua Tan
September 17, 2022
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
# … with 2,920 more rows
state county total_employees
Length:2930 Length:2930 Min. : 1.00
Class :character Class :character 1st Qu.: 7.00
Mode :character Mode :character Median : 21.00
Mean : 87.18
3rd Qu.: 65.00
Max. :8207.00
The railroad dataset covers the United States and has three columns (state, county, and total employees) across 2,930 rows, each a state-county pair. Every county has at least one employee, and the maximum is 8,207. The mean number of employees per county is 87.18, but the median is only 21.00, so the distribution is strongly right-skewed: most counties have few railroad employees, and a handful of large counties pull the mean up. The first and third quartiles (7 and 65) confirm this, since at least three quarters of the counties fall below the mean.
# A tibble: 10 × 3
state county total_employees
<chr> <chr> <dbl>
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
7 CA SAN BERNARDINO 2888
8 CA LOS ANGELES 2545
9 TX HARRIS 2535
10 NE LINCOLN 2289
Cook County, IL, has the most employees (8,207), followed by Tarrant County, TX, with 4,235.
# A tibble: 10 × 2
state total
<chr> <dbl>
1 TX 19839
2 IL 19131
3 NY 17050
4 NE 13176
5 CA 13137
6 PA 12769
7 OH 9056
8 GA 8605
9 IN 8537
10 MO 8419
Although Cook County has the most employees of any county, Illinois does not rank first in total employees: Texas leads it by 708 (19,839 versus 19,131).
---
title: "Challenge 1"
author: "Guanhua Tan"
description: "Reading in data and creating a post"
date: "09/17/2022"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- challenge_1
- railroads
- faostat
- wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
```
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
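As a minimal sketch of the `readr` route, the block below writes three rows copied from the railroad data to a temporary file and reads them back with `read_csv()`. The temporary file and the names `sample_path` and `sample_rows` are assumptions made only so the example runs on its own; in the post itself the path would point into `_data`.

```r
library(readr)

# Hypothetical stand-in for a file in `_data`: three rows from the railroad data
sample_path <- tempfile(fileext = ".csv")
write_lines(
  c("state,county,total_employees",
    "AK,ANCHORAGE,7",
    "AK,JUNEAU,3",
    "AL,AUTAUGA,102"),
  sample_path
)

# Declaring column types up front avoids relying on readr's type guessing
sample_rows <- read_csv(
  sample_path,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)
sample_rows
```

For the `.xls` and `.xlsx` files in the list, `readxl::read_excel()` plays the same role.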
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
## Railroad
```{r}
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
railroad
```
```{r}
railroad %>%
  summary()
```
The railroad dataset covers the United States and has three columns (state, county, and total employees) across 2,930 rows, each a state-county pair. Every county has at least one employee, and the maximum is 8,207. The mean number of employees per county is 87.18, but the median is only 21.00, so the distribution is strongly right-skewed: most counties have few railroad employees, and a handful of large counties pull the mean up. The first and third quartiles (7 and 65) confirm this, since at least three quarters of the counties fall below the mean.
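The gap between mean and median can be illustrated on a small scale. The sketch below uses only the first ten `total_employees` values printed in this post, not the full dataset, so the numbers differ from the `summary()` output; the variable names are hypothetical.

```r
# First ten rows of the railroad output shown in this post (not the full data)
first_ten <- c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)

sample_mean <- mean(first_ten)
sample_median <- median(first_ten)

# A mean well above the median signals a right-skewed distribution
c(mean = sample_mean, median = sample_median)
```

Even in this small sample, three large counties lift the mean far above the median, the same pattern the full summary shows.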
```{r}
# Ten counties with the most employees
max_railroad <- railroad %>%
  arrange(desc(total_employees)) %>%
  slice(1:10)
max_railroad
```
Cook County, IL, has the most employees (8,207), followed by Tarrant County, TX, with 4,235.
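An alternative way to pick the largest rows is `dplyr::slice_max()`, which sorts and slices in one step. This is a sketch of that swapped-in technique, not the approach used in the chunk above; `top_sample` and `largest_two` are hypothetical names, and the three rows are copied from the top-10 table in this post.

```r
library(dplyr)

# Three rows from the top-10 table in this post, deliberately shuffled
top_sample <- tibble(
  state = c("NE", "IL", "TX"),
  county = c("DOUGLAS", "COOK", "TARRANT"),
  total_employees = c(3797, 8207, 4235)
)

# slice_max() orders by the column and keeps the n largest rows
largest_two <- top_sample %>%
  slice_max(total_employees, n = 2)
largest_two
```

One design difference: `slice_max()` keeps ties by default (`with_ties = TRUE`), so it can return more than `n` rows, whereas `arrange()` followed by `slice()` always returns exactly the requested positions.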
```{r}
# Total employees per state, sorted in descending order (top 10)
railroad %>%
  group_by(state) %>%
  summarise(total = sum(total_employees)) %>%
  arrange(desc(total)) %>%
  slice(1:10)
```
Although Cook County has the most employees of any county, Illinois does not rank first in total employees: Texas leads it by 708 (19,839 versus 19,131).
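The comparison between Texas and Illinois can be checked with a little arithmetic. The sketch below hard-codes the state totals and Cook County's count from the tables in this post, rather than recomputing them from the data; the variable names are hypothetical.

```r
# State totals and Cook County's count, copied from the tables in this post
tx_total <- 19839
il_total <- 19131
cook_total <- 8207

# Gap between the two largest states
tx_il_gap <- tx_total - il_total

# Cook County's share of the Illinois total, as a percentage
cook_share_pct <- round(100 * cook_total / il_total, 1)

c(gap = tx_il_gap, cook_share_pct = cook_share_pct)
```

Cook County alone accounts for a large share of the Illinois total, which is why Illinois stays close to Texas despite Texas having two counties in the top ten.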