Challenge 1 Instructions

challenge_1

railroads

faostat

wildbirds

Author

Tejaswini_Ketineni

Published

August 21, 2022

Code

library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

I am going to work with one data set: railroad_2012_clean_county.csv

Reading the Data

First, the data set (railroad_2012_clean_county) is read in.

Code

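library(readr)  # read_csv() comes from readr (part of the tidyverse), not readxl
railroad_2012_clean_county <- read_csv("_data/railroad_2012_clean_county.csv")
# View() opens the interactive data viewer, so call it only in an interactive session
if (interactive()) View(railroad_2012_clean_county)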

The head() function is used to preview the first six rows of the data.

Code

head(railroad_2012_clean_county)

# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1

Code

rows_and_columns_ds1 <- dim(railroad_2012_clean_county)
rows_and_columns_ds1

[1] 2930    3

There are 2930 rows and 3 columns in the data set.

Code

names_col <- colnames(railroad_2012_clean_county)
names_col

[1] "state"           "county"          "total_employees"

There are three columns in the data set: state, county and total_employees.

Code

sum(is.na(railroad_2012_clean_county))

[1] 0

Code

# is.null() only tests whether the whole object is NULL, so count incomplete rows instead
sum(!complete.cases(railroad_2012_clean_county))

[1] 0

There are no missing (NA) values in the data set.
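A per-column count can make this check easier to read than a single total. The following is a minimal sketch in base R, assuming a toy data frame built from the six rows printed by head() above, with one NA inserted on purpose for illustration (the real file has none). The names toy, na_per_column and incomplete_rows are hypothetical.

```r
# Toy data: the six rows shown by head(), with one NA added on purpose
toy <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR",
             "JUNEAU", "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, NA, 2, 1)
)

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Number of rows that contain at least one NA
incomplete_rows <- sum(!complete.cases(toy))
incomplete_rows

# is.null() looks at the object as a whole, not at its cells,
# so it is FALSE for any data frame that exists
is.null(toy)
```

On the real data, the same two calls applied to railroad_2012_clean_county should return zeros for every column, in line with the total of 0 above.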

Code

summary(railroad_2012_clean_county)

    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00

Code

library(data.table)
data_railroad <- data.table(railroad_2012_clean_county)
data_railroad[, .(distinct_states = length(unique(state)))]

   distinct_states
1:              53

Code

data_railroad[, .(distinct_county = length(unique(county)))]

   distinct_county
1:            1709

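There are 53 distinct state codes (the 50 states plus DC and the military codes AE and AP) and 1709 distinct county names. A county name that occurs in more than one state is counted only once here.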
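Counting unique values of the county column counts names, so a name shared by two states collapses into one. A minimal sketch of the difference, assuming a small toy data frame: the two AK rows come from the head() output, while the two LEE rows and their employee values are made up purely for illustration.

```r
# Toy data: AK rows are from head(); the LEE rows are illustrative, not from the file
toy <- data.frame(
  state = c("AK", "AK", "AL", "GA"),
  county = c("ANCHORAGE", "JUNEAU", "LEE", "LEE"),
  total_employees = c(7, 3, 10, 20)
)

# Distinct county names: "LEE" is counted once
distinct_county_names <- length(unique(toy$county))
distinct_county_names

# Distinct state/county pairs: each LEE row counts separately
distinct_pairs <- nrow(unique(toy[, c("state", "county")]))
distinct_pairs
```

Applied to railroad_2012_clean_county, comparing the second number with the row count would show whether every row is a unique state and county combination.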

Code

(table(railroad_2012_clean_county$state))


 AE  AK  AL  AP  AR  AZ  CA  CO  CT  DC  DE  FL  GA  HI  IA  ID  IL  IN  KS  KY 
  1   6  67   1  72  15  55  57   8   1   3  67 152   3  99  36 103  92  95 119 
 LA  MA  MD  ME  MI  MN  MO  MS  MT  NC  ND  NE  NH  NJ  NM  NV  NY  OH  OK  OR 
 63  12  24  16  78  86 115  78  53  94  49  89  10  21  29  12  61  88  73  33 
 PA  RI  SC  SD  TN  TX  UT  VA  VT  WA  WI  WV  WY 
 65   5  46  52  91 221  25  92  14  39  69  53  22

Description of the data

The data set was analysed and the following observations were made. It has 2930 rows and 3 columns (state, county and total_employees). A check for missing values found none, so the data set is clean. The summary statistics show that total_employees ranges from 1 to 8207, with a median of 21 and a mean of 87.18. There are 53 unique state codes and 1709 unique county names. Tabulating the state column counts rows (county records) per state, not employees: Texas (TX) and Georgia (GA) have the most records, with 221 and 152, while Armed Forces Europe (AE), Armed Forces Pacific (AP) and the District of Columbia (DC) have one record each. Only 8 of the 53 state codes have fewer than 10 records.
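To compare states by employees rather than by number of rows, total_employees has to be summed within each state. A minimal sketch in base R, assuming only the six rows printed by head() earlier, so the totals are for that toy subset and not for the full file. The names toy, records_by_state and employees_by_state are hypothetical.

```r
# Toy subset: the six rows printed by head() earlier, not the full file
toy <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR",
             "JUNEAU", "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# table() counts rows (county records) per state
records_by_state <- table(toy$state)
records_by_state

# aggregate() sums total_employees within each state
employees_by_state <- aggregate(total_employees ~ state, data = toy, FUN = sum)

# Sort so the state with the most employees comes first
employees_by_state <- employees_by_state[order(-employees_by_state$total_employees), ]
employees_by_state
```

Running the same aggregate() call on railroad_2012_clean_county would give the per-state employee totals needed to rank states by workforce.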