library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Sai Pranav Kurly
March 1, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
We can read the railroad_2012_clean_county.csv dataset using the read_csv function. The output below shows the first few rows and a summary of the dataset.
# A tibble: 6 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
state county total_employees
Length:2930 Length:2930 Min. : 1.00
Class :character Class :character 1st Qu.: 7.00
Mode :character Mode :character Median : 21.00
Mean : 87.18
3rd Qu.: 65.00
Max. :8207.00
The dataset records railroad employee counts by county across the United States. It has 3 columns (state, county, and total_employees) and 2930 rows.
# A tibble: 10 × 2
state total
<chr> <dbl>
1 TX 19839
2 IL 19131
3 NY 17050
4 NE 13176
5 CA 13137
6 PA 12769
7 OH 9056
8 GA 8605
9 IN 8537
10 MO 8419
The table above shows the 10 states with the most railroad employees. Texas (TX) has the highest total, with 19839 employees.
# A tibble: 221 × 3
state county total_employees
<chr> <chr> <dbl>
1 TX ANDERSON 241
2 TX ANDREWS 3
3 TX ANGELINA 53
4 TX ARANSAS 6
5 TX ARCHER 8
6 TX ARMSTRONG 12
7 TX ATASCOSA 64
8 TX AUSTIN 35
9 TX BAILEY 5
10 TX BANDERA 15
# … with 211 more rows
The table above shows a filtered view of the TX rows, 221 in total.
# A tibble: 10 × 2
state total
<chr> <dbl>
1 AP 1
2 AE 2
3 HI 4
4 AK 103
5 VT 259
6 DC 279
7 NH 393
8 RI 487
9 ME 654
10 NV 746
The table above shows the 10 lowest state totals. AP has the fewest employees, with a total of 1.
---
title: "Challenge 1"
author: "Sai Pranav Kurly"
description: "Reading in data and creating a post"
date: "03/01/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
We can read the railroad_2012_clean_county.csv dataset using the read_csv function. The output below shows the first few rows and a summary of the dataset.
```{r}
# Read the dataset
dataframe <- read_csv("_data/railroad_2012_clean_county.csv")
# Display the first few rows of the dataset
head(dataframe)
# Summarise the dataset (the data frame is passed once, not piped and repeated)
summary(dataframe)
```
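As a quick cross-check of the shape and column types reported by the summary, a minimal sketch along these lines could be used. It builds a small in-memory sample from the six rows printed by `head()` instead of reading the file, so `sample_rows` is an illustrative name and not part of the original analysis.

```r
library(readr)

# Six rows copied from the head() output above; this is NOT the full dataset
sample_rows <- read_csv(I(
"state,county,total_employees
AE,APO,2
AK,ANCHORAGE,7
AK,FAIRBANKS NORTH STAR,2
AK,JUNEAU,3
AK,MATANUSKA-SUSITNA,2
AK,SITKA,1"
), show_col_types = FALSE)

dim(sample_rows)             # number of rows and columns
sapply(sample_rows, class)   # type of each column
colSums(is.na(sample_rows))  # missing values per column
```

Running the same three calls on `dataframe` should agree with the summary output, which reports 2930 rows and 3 columns.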
## Describe the data
The dataset records railroad employee counts by county across the United States. It has 3 columns (`state`, `county`, and `total_employees`) and 2930 rows.
```{r}
# Grouping the data by state in descending order of employees
dataframe %>%
  group_by(state) %>%
  summarise(total = sum(total_employees)) %>%
  arrange(desc(total)) %>%
  slice(1:10)
```
The output above shows the 10 states with the most railroad employees. Texas (TX) has the highest total, with 19839 employees.
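The same per-state ranking can also be sketched with `count()` and `slice_max()`, which fold the group, sum, and sort steps into fewer calls. This is an alternative formulation, not the approach used in the post, and it runs on the six sample rows from the `head()` output (the name `sample_rows` is an assumption for illustration).

```r
library(dplyr)

# Six rows copied from the head() output; not the full dataset
sample_rows <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR",
             "JUNEAU", "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# count() with wt sums total_employees within each state;
# sort = TRUE orders the result from largest to smallest total
state_totals <- sample_rows %>%
  count(state, wt = total_employees, name = "total", sort = TRUE)

# Keep at most the 10 largest totals
state_totals %>%
  slice_max(total, n = 10)
```

One design point to keep in mind: `slice_max()` keeps tied rows by default, so it can return more than 10 rows when totals tie at the cutoff, whereas `slice(1:10)` always cuts at exactly 10.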
```{r}
# Filter the rows for state TX from the dataset
filter(dataframe, state == "TX")
```
The output above shows a filtered view of the TX rows.
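A filtered view can be condensed into a one-row summary. The sketch below is a hypothetical follow-up, not part of the original post. It uses only the first ten TX rows shown in the rendered output, so its numbers describe that sample and not all of Texas.

```r
library(dplyr)

# First ten TX rows copied from the filtered output; the full filter returns more rows
tx_sample <- tibble(
  state = "TX",
  county = c("ANDERSON", "ANDREWS", "ANGELINA", "ARANSAS", "ARCHER",
             "ARMSTRONG", "ATASCOSA", "AUSTIN", "BAILEY", "BANDERA"),
  total_employees = c(241, 3, 53, 6, 8, 12, 64, 35, 5, 15)
)

# Number of counties, total employees, and the county with the most employees
tx_summary <- tx_sample %>%
  filter(state == "TX") %>%
  summarise(
    counties = n(),
    total = sum(total_employees),
    largest_county = county[which.max(total_employees)]
  )

tx_summary
```

Replacing `tx_sample` with `dataframe` would give the corresponding figures for every TX county in the file.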
```{r}
# Grouping the data by state in ascending order of employees
dataframe %>%
  group_by(state) %>%
  summarise(total = sum(total_employees)) %>%
  arrange(total) %>%
  slice(1:10)
```
The output above shows the 10 lowest state totals. AP has the fewest employees, with a total of 1.
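The two smallest entries, AP and AE, look like military postal codes instead of states (the AE row in the data has county APO). Assuming the goal is to rank only the 50 states plus DC, a sketch like the following could restrict the totals using R's built-in `state.abb` vector. The tibble `bottom_totals` is an illustrative name that simply re-enters the ten totals printed above.

```r
library(dplyr)

# Ten lowest state totals copied from the output above
bottom_totals <- tibble(
  state = c("AP", "AE", "HI", "AK", "VT", "DC", "NH", "RI", "ME", "NV"),
  total = c(1, 2, 4, 103, 259, 279, 393, 487, 654, 746)
)

# state.abb ships with base R and holds the 50 state abbreviations;
# DC is added by hand because it is not included in state.abb
states_only <- bottom_totals %>%
  filter(state %in% c(state.abb, "DC")) %>%
  arrange(total)

states_only
```

Under that assumption, HI becomes the entry with the fewest employees among the rows shown here.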