---
title: "Challenge 1"
author: "Pavan Datta Abbineni"
description: "Reading in data and creating a post"
date: "08/15/2022"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
I've decided to use the `railroad_2012_clean_county.csv` dataset.
```{r}
# Read the county-level railroad employment data into a tibble
railroadCompleteData <- read_csv("_data/railroad_2012_clean_county.csv")
```
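Passing `col_types` to `read_csv()` fixes each column's type up front instead of letting readr guess, and it also silences the column-specification message. The sketch below writes three of the rows that `head()` prints to a temporary file so it runs on its own; in this post the same `col_types` argument would go on the `read_csv("_data/railroad_2012_clean_county.csv")` call. The chosen types (character, character, double) are an assumption based on the `<chr> <chr> <dbl>` header that `head()` reports.

```{r read-with-col-types}
library(readr)

# Write a few rows from the railroad data to a temporary CSV so the example
# is self-contained (the post itself reads "_data/railroad_2012_clean_county.csv")
sampleCsv <- tempfile(fileext = ".csv")
writeLines(c(
  "state,county,total_employees",
  "AE,APO,2",
  "AK,ANCHORAGE,7",
  "AK,SITKA,1"
), sampleCsv)

# Declare each column's type explicitly instead of relying on readr's guessing
railroadSample <- read_csv(
  sampleCsv,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)

railroadSample
```

If a column ever fails to parse as the declared type, readr records the problem rather than silently switching types, which makes data issues easier to spot.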
## Describe the data
```{r head-visualisation}
head(railroadCompleteData)
```
```{r tail-visualisation}
tail(railroadCompleteData)
```
For a dataset to be in tidy format, it needs to satisfy the following conditions:
1) Each variable has its own column.
2) Each value is in its own cell.
3) Each observation is in its own row.
The `head()` and `tail()` output above shows that each row is one county, each of `state`, `county` and `total_employees` has its own column, and every cell holds a single value, so the dataset is already in tidy format.
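The third condition can also be checked in code rather than by eye: if every `state`/`county` pair occurs only once, each row is a single observation. This sketch runs the check on six rows copied from the `head()` output so it is self-contained; applying the same `count()` and `filter()` steps to `railroadCompleteData` would check the full dataset. Treating the `state`/`county` pair as the key that identifies an observation is an assumption about how this dataset is organized.

```{r check-unique-rows}
library(dplyr)
library(tibble)

# Six rows copied from the head() output; swap in railroadCompleteData
# to run the same check on the full dataset
railroadSample <- tribble(
  ~state, ~county,                ~total_employees,
  "AE",   "APO",                  2,
  "AK",   "ANCHORAGE",            7,
  "AK",   "FAIRBANKS NORTH STAR", 2,
  "AK",   "JUNEAU",               3,
  "AK",   "MATANUSKA-SUSITNA",    2,
  "AK",   "SITKA",                1
)

# Keep only state/county pairs that occur more than once
duplicatePairs <- railroadSample %>%
  count(state, county) %>%
  filter(n > 1)

# Zero rows means every state/county pair appears once, i.e. one observation per row
nrow(duplicatePairs)
```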
```{r rows-dimension}
nrow(railroadCompleteData)
```
Our dataset has a total of 2930 rows.
```{r cols-dimension}
ncol(railroadCompleteData)
colnames(railroadCompleteData)
```
The dataset has 3 columns: `state`, `county` and `total_employees`.
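`dim()` returns the row and column counts in one call, and `glimpse()` (re-exported by dplyr) shows each column's name and type alongside its first values. The sketch uses three rows from the `head()` output so it runs on its own; on the full data, `dim(railroadCompleteData)` should report the 2930 rows and 3 columns found above.

```{r dim-and-glimpse}
library(dplyr)
library(tibble)

# Three rows from the head() output, standing in for railroadCompleteData
railroadSample <- tribble(
  ~state, ~county,     ~total_employees,
  "AE",   "APO",       2,
  "AK",   "ANCHORAGE", 7,
  "AK",   "JUNEAU",    3
)

# dim() returns c(rows, columns) in a single call
sampleDims <- dim(railroadSample)
sampleDims

# glimpse() prints each column's name, type and first few values
glimpse(railroadSample)
```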
```{r}
# Pull out the state column and inspect its distinct values
stateNames <- railroadCompleteData$state
unique(stateNames)
length(unique(stateNames))
```
There are 53 unique values in `state`: the 50 states plus DC and the military postal codes AE and AP.
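Because `state` holds postal codes rather than only state names, comparing it against R's built-in `state.abb` vector (the 50 state abbreviations) picks out the codes that are not states. The sketch below runs on three rows from the printed output so it is self-contained; replacing `railroadSample` with `railroadCompleteData` would list the non-state codes in the full data.

```{r non-state-codes}
library(dplyr)
library(tibble)

# Three rows from the head() and tail() output; replace with railroadCompleteData
railroadSample <- tribble(
  ~state, ~county,     ~total_employees,
  "AE",   "APO",       2,
  "AK",   "ANCHORAGE", 7,
  "WY",   "SHERIDAN",  252
)

# n_distinct() counts unique values directly
stateCount <- n_distinct(railroadSample$state)

# state.abb (from base R's datasets package) holds the 50 state abbreviations,
# so anything left over is a non-state code
nonStateCodes <- setdiff(unique(railroadSample$state), state.abb)

stateCount
nonStateCodes
```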
```{r sorteddata}
# Count rows (one per county) for each state, then sort in ascending order
tableOfCompleteData <- table(railroadCompleteData$state)
tableOfCompleteData[order(tableOfCompleteData)]
```
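Each row is a county, so `table()` counts counties per state rather than employees. Texas (221) and Georgia (152) have the most counties in the dataset, while eight codes (AE, AP, DC, DE, HI, RI, AK and CT) have fewer than 10 counties each.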
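Ranking states by employees, rather than by counties, means summing `total_employees` within each state. The sketch below contrasts the two on six rows from the printed output: here AK has more rows than WY but far fewer employees. Running the same pipelines on `railroadCompleteData` would give the full-data rankings, which this post does not report.

```{r employees-by-state}
library(dplyr)
library(tibble)

# Rows from the head() and tail() output; replace with railroadCompleteData
railroadSample <- tribble(
  ~state, ~county,                ~total_employees,
  "AE",   "APO",                  2,
  "AK",   "ANCHORAGE",            7,
  "AK",   "FAIRBANKS NORTH STAR", 2,
  "AK",   "JUNEAU",               3,
  "WY",   "SHERIDAN",             252,
  "WY",   "SWEETWATER",           196
)

# Number of rows (counties) per state: the same quantity table() reports
countiesPerState <- railroadSample %>%
  count(state, sort = TRUE, name = "counties")

# Total railroad employees per state: sum total_employees instead of counting rows
employeesPerState <- railroadSample %>%
  group_by(state) %>%
  summarise(employees = sum(total_employees)) %>%
  arrange(desc(employees))

countiesPerState
employeesPerState
```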
The data was likely compiled from official railroad employment records, since railroad employers know how many employees they have on payroll.