library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Ishan Bhardwaj
March 14, 2023
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.
# A tibble: 2,930 × 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# … with 2,920 more rows
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
This dataset has 2930 rows and 3 columns.
[1] "state" "county" "total_employees"
# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
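Each row is one state-county pair: the first column (state) gives the state, the second (county) gives a county in that state, and the third (total_employees) gives the number of employees recorded for that county. The file name, railroad_2012_clean_county.csv, suggests these are railroad employees counted in 2012, although the data itself does not say how the counts were gathered.
The first column uses two-letter postal abbreviations. Not every code is a state: AE and AP are military mail codes (Armed Forces Europe and Armed Forces Pacific), and DC is the District of Columbia.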
state
AE AK AL AP AR AZ
0.0003412969 0.0020477816 0.0228668942 0.0003412969 0.0245733788 0.0051194539
CA CO CT DC DE FL
0.0187713311 0.0194539249 0.0027303754 0.0003412969 0.0010238908 0.0228668942
GA HI IA ID IL IN
0.0518771331 0.0010238908 0.0337883959 0.0122866894 0.0351535836 0.0313993174
KS KY LA MA MD ME
0.0324232082 0.0406143345 0.0215017065 0.0040955631 0.0081911263 0.0054607509
MI MN MO MS MT NC
0.0266211604 0.0293515358 0.0392491468 0.0266211604 0.0180887372 0.0320819113
ND NE NH NJ NM NV
0.0167235495 0.0303754266 0.0034129693 0.0071672355 0.0098976109 0.0040955631
NY OH OK OR PA RI
0.0208191126 0.0300341297 0.0249146758 0.0112627986 0.0221843003 0.0017064846
SC SD TN TX UT VA
0.0156996587 0.0177474403 0.0310580205 0.0754266212 0.0085324232 0.0313993174
VT WA WI WV WY
0.0047781570 0.0133105802 0.0235494881 0.0180887372 0.0075085324
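This table gives, for each state code, the proportion of rows in the dataset that belong to it. Because each row is a county, the proportions show which states have the most counties with recorded railroad employees (Texas has the most, at about 7.5% of rows). They do not show where employment is largest; that would require summing total_employees by state.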
---
title: "Challenge 1"
author: "Ishan Bhardwaj"
description: "Reading in data and creating a post"
date: "03/14/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - Ishan Bhardwaj
  - railroads
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
library(readr)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
```{r}
rail <- read_csv("_data/railroad_2012_clean_county.csv")
rail
```
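`read_csv()` guesses column types by default. A minimal sketch of declaring them explicitly is shown below, assuming readr 2.0 or later, where literal text wrapped in `I()` is read as data. The inline text is a stand-in holding only the first three rows of the file so that the chunk runs on its own; `csv_text` and `rail_typed` are illustrative names.

```{r}
library(readr)

# Stand-in for the CSV file: the header plus the first three rows of the dataset
csv_text <- I("state,county,total_employees\nAE,APO,2\nAK,ANCHORAGE,7\nAK,FAIRBANKS NORTH STAR,2\n")

# Declare each column's type instead of relying on readr's guess
rail_typed <- read_csv(
  csv_text,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)

rail_typed
```

With the real file, the same `col_types` argument could be passed alongside the path used above.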
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
```{r}
dim(rail)
```
This dataset has 2930 rows and 3 columns.
```{r}
colnames(rail)
head(rail)
```
Each row is one state-county pair: `state` gives the state, `county` gives a county in that state, and `total_employees` gives the number of employees recorded for that county. The file name, `railroad_2012_clean_county.csv`, suggests these are railroad employees counted in 2012, although the data itself does not say how the counts were gathered.
```{r}
head(select(rail, 1))
```
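The first column uses two-letter postal abbreviations. Not every code is a state: `AE` and `AP` are military mail codes (Armed Forces Europe and Armed Forces Pacific), and `DC` is the District of Columbia.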
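One way to check which codes are not states is to compare them against `state.abb`, the vector of 50 state abbreviations that ships with base R. The sketch below hard-codes the 53 codes found in this dataset so that it runs on its own; with the data loaded, `unique(rail$state)` would serve the same purpose. The names `codes` and `non_state_codes` are illustrative.

```{r}
# The 53 codes that appear in the state column of this dataset
codes <- c(
  "AE", "AK", "AL", "AP", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL",
  "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME",
  "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", "NV",
  "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VA",
  "VT", "WA", "WI", "WV", "WY"
)

# state.abb is a built-in vector holding the 50 US state abbreviations
non_state_codes <- setdiff(codes, state.abb)

non_state_codes
```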
```{r}
# Count rows per state code, then convert the counts to proportions of all rows
state_table <- table(select(rail, 1))
prop.table(state_table)
```
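This table gives, for each state code, the proportion of rows in the dataset that belong to it. Because each row is a county, the proportions show which states have the most counties with recorded railroad employees (Texas has the most, at about 7.5% of rows). They do not show where employment is largest; that would require summing `total_employees` by state.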
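To see where railroad employment is concentrated, rather than how many county rows each state contributes, the employee counts can be summed by state. The sketch below uses a hand-typed stand-in, `rail_sample`, holding only the first ten rows of the dataset, so its shares say nothing about the full data. `rail_sample` and `state_summary` are illustrative names.

```{r}
library(tidyverse)

# Stand-in for `rail`: the first ten rows of the dataset, typed in by hand
rail_sample <- tribble(
  ~state, ~county,                ~total_employees,
  "AE",   "APO",                  2,
  "AK",   "ANCHORAGE",            7,
  "AK",   "FAIRBANKS NORTH STAR", 2,
  "AK",   "JUNEAU",               3,
  "AK",   "MATANUSKA-SUSITNA",    2,
  "AK",   "SITKA",                1,
  "AK",   "SKAGWAY MUNICIPALITY", 88,
  "AL",   "AUTAUGA",              102,
  "AL",   "BALDWIN",              143,
  "AL",   "BARBOUR",              1
)

# Per state: number of county rows, total employees, and each as a share of the whole
state_summary <- rail_sample %>%
  group_by(state) %>%
  summarise(
    counties = n(),
    employees = sum(total_employees)
  ) %>%
  mutate(
    county_share = counties / sum(counties),
    employee_share = employees / sum(employees)
  ) %>%
  arrange(desc(employees))

state_summary
```

Replacing `rail_sample` with `rail` would apply the same summary to the full dataset and allow the two shares to be compared state by state.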