challenge_1
Ishan Bhardwaj
railroads
Author

Ishan Bhardwaj

Published

March 14, 2023

Code
library(tidyverse)
library(readr)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables).

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code
rail <- read_csv("_data/railroad_2012_clean_county.csv")
rail
# A tibble: 2,930 × 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# … with 2,920 more rows

Add any comments or documentation as needed. More challenging data sets may require additional code chunks and documentation.
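As a small addition to the read step, here is a sketch of reading the same three columns with explicit column types, so that any value that fails to parse is reported instead of being guessed at. The inline text reuses a few rows from the printed output above so the chunk runs without the file; the name rail_sample and the choice of col_integer() for total_employees are my own assumptions, not part of the original challenge.

```r
library(readr)

# A few rows copied from the printed tibble above, inlined as literal data
# so this sketch does not depend on the _data folder.
sample_csv <- I("state,county,total_employees
AE,APO,2
AK,ANCHORAGE,7
AK,FAIRBANKS NORTH STAR,2
AL,AUTAUGA,102
")

# Declare the expected type of every column up front (assumed types).
rail_sample <- read_csv(
  sample_csv,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_integer()
  )
)

# problems() lists any values that did not parse as the declared type.
problems(rail_sample)
```

For the real file, the same col_types argument can be passed to the read_csv() call on "_data/railroad_2012_clean_county.csv".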

Describe the data

Using a combination of words and results of R commands, can you provide a high level description of the data? Describe as efficiently as possible where/how the data was (likely) gathered, indicate the cases and variables (both the interpretation and any details you deem useful to the reader to fully understand your chosen data).

Code
dim(rail)
[1] 2930    3

This dataset has 2,930 rows and 3 columns. Each row appears to be one county (or county equivalent) within a state.

Code
colnames(rail)
[1] "state"           "county"          "total_employees"

Code
head(rail)
# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1

The first column (state) lists states, the second column (county) lists counties in those states, and the third column (total_employees) gives the total number of employees in each county. Judging by the file name, these are presumably railroad employees counted in 2012, although the data itself does not say so.
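To back up the description of the variables, a quick sketch of two checks: the spread of total_employees, and whether any state/county pair appears more than once. The tibble below is just the first ten rows printed earlier, rebuilt by hand so the chunk is self-contained; rail_sample and duplicate_pairs are names I made up for illustration.

```r
library(dplyr)

# First ten rows of the dataset, copied from the printed output above.
rail_sample <- tibble::tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# Minimum, quartiles, mean and maximum of the employee counts.
summary(rail_sample$total_employees)

# State/county pairs that occur more than once; zero rows means
# each row is a distinct county within its state.
duplicate_pairs <- rail_sample %>%
  count(state, county) %>%
  filter(n > 1)
nrow(duplicate_pairs)
```

Running the same two steps on rail instead of rail_sample would show whether the one-row-per-county reading holds for the full data.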

Code
head(select(rail, 1))
# A tibble: 6 × 1
  state
  <chr>
1 AE   
2 AK   
3 AK   
4 AK   
5 AK   
6 AK   

The entries in the first column are two-letter postal abbreviations. Not every code is a state: AE and AP are military postal codes, and DC is the District of Columbia.
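One way to check which codes fall outside the 50 states is to compare them against state.abb, a vector that ships with base R. The codes vector here is typed in by hand from the state tabulation printed later in this post; on the full data, unique(rail$state) would take its place. The names codes and non_states are illustrative.

```r
# The 53 codes that appear in the state column, copied from the tabulation output.
codes <- c("AE", "AK", "AL", "AP", "AR", "AZ", "CA", "CO", "CT", "DC", "DE",
           "FL", "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA",
           "MD", "ME", "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH",
           "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD",
           "TN", "TX", "UT", "VA", "VT", "WA", "WI", "WV", "WY")

# state.abb holds the abbreviations of the 50 US states;
# setdiff() keeps the codes that are not among them.
non_states <- setdiff(codes, state.abb)
non_states
# → "AE" "AP" "DC"
```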

Code
state_table <- table(select(rail, 1))
prop.table(state_table)
state
          AE           AK           AL           AP           AR           AZ 
0.0003412969 0.0020477816 0.0228668942 0.0003412969 0.0245733788 0.0051194539 
          CA           CO           CT           DC           DE           FL 
0.0187713311 0.0194539249 0.0027303754 0.0003412969 0.0010238908 0.0228668942 
          GA           HI           IA           ID           IL           IN 
0.0518771331 0.0010238908 0.0337883959 0.0122866894 0.0351535836 0.0313993174 
          KS           KY           LA           MA           MD           ME 
0.0324232082 0.0406143345 0.0215017065 0.0040955631 0.0081911263 0.0054607509 
          MI           MN           MO           MS           MT           NC 
0.0266211604 0.0293515358 0.0392491468 0.0266211604 0.0180887372 0.0320819113 
          ND           NE           NH           NJ           NM           NV 
0.0167235495 0.0303754266 0.0034129693 0.0071672355 0.0098976109 0.0040955631 
          NY           OH           OK           OR           PA           RI 
0.0208191126 0.0300341297 0.0249146758 0.0112627986 0.0221843003 0.0017064846 
          SC           SD           TN           TX           UT           VA 
0.0156996587 0.0177474403 0.0310580205 0.0754266212 0.0085324232 0.0313993174 
          VT           WA           WI           WV           WY 
0.0047781570 0.0133105802 0.0235494881 0.0180887372 0.0075085324 

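Listed here is the proportion of rows that belong to each state. Because each row is a county, this is the share of the dataset's counties that fall in each state, not a share of employees. Texas (TX) has the largest share, about 7.5% of the rows. This tells us how widely railroad employment is spread across counties in each state, but it also reflects how many counties a state has to begin with. To see where the railroad industry is more prominent, we would need to add up total_employees by state.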
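As a sketch of that comparison, the chunk below puts the share of rows and the share of employees side by side for each state. It again uses only the first ten rows printed at the top of the post, so the numbers describe that sample and not the full dataset; rail_sample, by_state, and the new column names are my own.

```r
library(dplyr)

# First ten rows of the dataset, copied from the printed output above.
rail_sample <- tibble::tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

by_state <- rail_sample %>%
  group_by(state) %>%
  summarise(
    counties = n(),                     # number of rows (counties) per state
    employees = sum(total_employees)    # total employees per state
  ) %>%
  mutate(
    share_of_rows = counties / sum(counties),
    share_of_employees = employees / sum(employees)
  ) %>%
  arrange(desc(employees))

by_state
```

In this ten-row sample AK has the most rows (6 of 10) while AL has the most employees (246 of 351), which is exactly the kind of difference the row proportions alone would hide. Replacing rail_sample with rail gives the comparison for all 2,930 rows.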