library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
Maanusri Balasubramanian
May 3, 2023
Today’s challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
Read in one (or more) of the following data sets, using the correct R package and command.
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
spc_tbl_ [2,930 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ state : chr [1:2930] "AE" "AK" "AK" "AK" ...
$ county : chr [1:2930] "APO" "ANCHORAGE" "FAIRBANKS NORTH STAR" "JUNEAU" ...
$ total_employees: num [1:2930] 2 7 2 3 2 1 88 102 143 1 ...
- attr(*, "spec")=
.. cols(
.. state = col_character(),
.. county = col_character(),
.. total_employees = col_double()
.. )
- attr(*, "problems")=<externalptr>
[1] 2930 3
[1] "state" "county" "total_employees"
From the above commands we can see that “railroad_2012_clean_county.csv” gives the number of railroad employees in each county, by state, for 2012. There are a total of 2,930 rows. Each row gives the number of employees in one county within a state. There are 3 columns: state, county and total_employees.
state county total_employees
Length:2930 Length:2930 Min. : 1.00
Class :character Class :character 1st Qu.: 7.00
Mode :character Mode :character Median : 21.00
Mean : 87.18
3rd Qu.: 65.00
Max. :8207.00
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AK SITKA 1
2 AL BARBOUR 1
3 AL HENRY 1
4 AP APO 1
5 AR NEWTON 1
6 CA MONO 1
7 CO BENT 1
8 CO CHEYENNE 1
9 CO COSTILLA 1
10 CO DOLORES 1
# ℹ 2,920 more rows
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 IL COOK 8207
2 TX TARRANT 4235
3 NE DOUGLAS 3797
4 NY SUFFOLK 3685
5 VA INDEPENDENT CITY 3249
6 FL DUVAL 3073
7 CA SAN BERNARDINO 2888
8 CA LOS ANGELES 2545
9 TX HARRIS 2535
10 NE LINCOLN 2289
# ℹ 2,920 more rows
From the above results we know that the county ‘COOK’ in IL has the highest number of employees (8207), and that 1 is the minimum number of employees in any county (many counties have only 1 employee).
# A tibble: 53 × 2
state state_employees
<chr> <dbl>
1 TX 19839
2 IL 19131
3 NY 17050
4 NE 13176
5 CA 13137
6 PA 12769
7 OH 9056
8 GA 8605
9 IN 8537
10 MO 8419
# ℹ 43 more rows
[1] 53 2
From the above results we know that TX has the highest number of railroad employees (19839) and AP has the fewest (1).
And from the dimensions of grouped_rr_state, we know that there are 53 distinct values of state in which railroad employees work. These are not all U.S. states: the codes include AE and AP, which are military postal designations.
---
title: "Challenge 1"
author: "Maanusri Balasubramanian"
description: "Reading in data and creating a post"
date: "05/03/2023"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
- challenge_1
- maanusri balasubramanian
- railroads
---
```{r}
#| label: setup
#| warning: false
#| message: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```
## Challenge Overview
Today's challenge is to
1) read in a dataset, and
2) describe the dataset using both words and any supporting information (e.g., tables, etc)
## Read in the Data
Read in one (or more) of the following data sets, using the correct R package and command.
- railroad_2012_clean_county.csv ⭐
- birds.csv ⭐⭐
- FAOstat\*.csv ⭐⭐
- wild_bird_data.xlsx ⭐⭐⭐
- StateCounty2012.xls ⭐⭐⭐⭐
Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.
Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
```{r}
# loading the data
rr <- read_csv("_data/railroad_2012_clean_county.csv")
# printing the first 6 rows of the dataset (the default for head)
head(rr)
```
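As a rough sketch of the reading step: `read_csv()` guesses the column types, but they can also be declared explicitly with `col_types`. So that the example runs on its own, it writes three rows taken from this dataset to a temporary file first; the temporary file and the name `rr_sample` are assumptions made for illustration, not part of the challenge data.

```{r}
library(readr)

# Illustrative sample: three rows from the railroad data, written to a temp file
sample_path <- tempfile(fileext = ".csv")
write_lines(
  c("state,county,total_employees",
    "AK,ANCHORAGE,7",
    "IL,COOK,8207",
    "TX,TARRANT,4235"),
  sample_path
)

# Declare the column types instead of relying on readr's guessing
rr_sample <- read_csv(
  sample_path,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)

rr_sample
```

Declaring the types this way also suppresses the column specification message that `read_csv()` otherwise prints.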
## Describe the data
Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).
```{r}
#| label: summary
# description of the dataset
str(rr)
# number of rows and columns in the dataset
dim(rr)
# column names
colnames(rr)
```
From the above commands we can see that "railroad_2012_clean_county.csv" gives the number of railroad employees in each county, by state, for 2012. There are a total of 2,930 rows. Each row gives the number of employees in one county within a state. There are 3 columns: state, county and total_employees.
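One way to check the claim that each row is a single county within a state is to count rows per state/county pair. The sketch below uses a small hand-typed sample of four rows from the dataset (`rr_sample` is an illustrative name); the same pipeline could be applied to `rr`, though we have not shown that result here.

```{r}
library(dplyr)

# Illustrative sample: the first four rows of the railroad data
rr_sample <- tibble(
  state = c("AE", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU"),
  total_employees = c(2, 7, 2, 3)
)

# Compact overview of column names, types, and first values
glimpse(rr_sample)

# Count rows per state/county pair; any n > 1 would mean a duplicated case
duplicated_cases <- rr_sample %>%
  count(state, county) %>%
  filter(n > 1)

nrow(duplicated_cases)
```

A result of zero rows in `duplicated_cases` means every state/county pair appears exactly once.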
```{r}
# Summarizing the data with summary
summary(rr)
```
```{r}
# Arranging entries by total employees (ascending)
arrange(rr, total_employees)
# Arranging entries by total employees in descending order
arrange(rr, desc(total_employees))
```
From the above results we know that the county 'COOK' in IL has the highest number of employees (8207), and that 1 is the minimum number of employees in any county (many counties have only 1 employee).
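Sorting the whole table works, but the extremes can also be pulled out directly. As a minimal sketch, `slice_max()` and `slice_min()` from dplyr return only the top and bottom rows, and `slice_min()` keeps ties by default, which shows how many counties share the minimum. The six rows below are a hand-typed sample from the dataset, and `rr_sample` is an illustrative name.

```{r}
library(dplyr)

# Illustrative sample: three of the largest and three of the smallest counties
rr_sample <- tibble(
  state = c("IL", "TX", "NE", "AK", "AL", "AL"),
  county = c("COOK", "TARRANT", "DOUGLAS", "SITKA", "BARBOUR", "HENRY"),
  total_employees = c(8207, 4235, 3797, 1, 1, 1)
)

# County with the most employees
top_county <- rr_sample %>% slice_max(total_employees, n = 1)

# All counties tied for the fewest employees (with_ties = TRUE is the default)
bottom_counties <- rr_sample %>% slice_min(total_employees, n = 1)

top_county
bottom_counties
```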
```{r}
# Grouping by state to summarise total employees per state
grouped_rr_state <- rr %>%
  group_by(state) %>%
  summarize(state_employees = sum(total_employees)) %>%
  arrange(desc(state_employees))
grouped_rr_state
dim(grouped_rr_state)
```
From the above results we know that TX has the highest number of railroad employees (19839) and AP has the fewest (1).
And from the dimensions of grouped_rr_state, we know that there are 53 distinct values of state in which railroad employees work. These are not all U.S. states: the codes include AE and AP, which are military postal designations.
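The count of 53 can be compared against the 50 state abbreviations that ship with base R as `state.abb`. The sketch below does this on a small hand-typed sample of rows from the dataset (`rr_sample`, `state_totals`, `n_codes` and `non_state_codes` are illustrative names); running the same `setdiff()` on `rr$state` would list every code in the full data that is not one of the 50 states.

```{r}
library(dplyr)

# Illustrative sample: a few rows from the railroad data
rr_sample <- tibble(
  state = c("AE", "AP", "AK", "AK", "TX", "TX"),
  county = c("APO", "APO", "ANCHORAGE", "JUNEAU", "TARRANT", "HARRIS"),
  total_employees = c(2, 1, 7, 3, 4235, 2535)
)

# Total employees per state code, largest first
state_totals <- rr_sample %>%
  group_by(state) %>%
  summarize(state_employees = sum(total_employees)) %>%
  arrange(desc(state_employees))

# Number of distinct state codes in the sample
n_codes <- n_distinct(rr_sample$state)

# Codes that are not one of the 50 abbreviations in base R's state.abb
non_state_codes <- setdiff(unique(rr_sample$state), state.abb)

state_totals
n_codes
non_state_codes
```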