Challenge 1

challenge_1

railroads

faostat

wildbirds

Author

Sai Pranav Kurly

Published

March 1, 2023


library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to:

1. read in a dataset, and
2. describe the dataset using both words and any supporting information (e.g., tables).

Read in the Data

We read the railroad_2012_clean_county.csv dataset with the read_csv() function, then display its first few rows and a summary of each column.


#Read the dataset
dataframe <- read_csv("_data/railroad_2012_clean_county.csv")

#Display the first few rows of the dataset
head(dataframe)

# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1


# Summarise each column of the dataset
# (piping dataframe AND passing it again would hand summary() the data twice)
dataframe %>%
  summary()

    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00

Describe the data

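The dataset records railroad employment across the United States; judging by the file name, it covers 2012. It has 3 columns (state, county, and total_employees) and 2,930 rows, where each row gives the number of railroad employees in one county of a state. Per county, total_employees ranges from 1 to 8,207; the median is 21 while the mean is 87.18, so a small number of counties with very large workforces pull the average up.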
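The summary above shows a median of 21 but a mean of 87.18 for total_employees. A mean far above the median is the usual signature of a right-skewed distribution. The sketch below illustrates this on a small vector of made-up county counts (hypothetical values, not the railroad data), using only base R's summary().

```r
# Hypothetical employee counts for ten counties (made-up values, NOT the
# railroad data): most counties are small, one is very large.
employees <- c(1, 2, 3, 5, 7, 12, 21, 40, 65, 900)

# summary() reports Min., 1st Qu., Median, Mean, 3rd Qu. and Max.
stats <- summary(employees)
print(stats)

# A mean well above the median indicates a long right tail:
# the single large county drags the mean up but barely moves the median.
is_right_skewed <- as.numeric(stats["Mean"]) > as.numeric(stats["Median"])
cat("Right-skewed:", is_right_skewed, "\n")
```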


# Total employees per state, largest first, keeping the top 10 states
dataframe %>%
  group_by(state) %>%
  summarise(total = sum(total_employees)) %>%
  arrange(desc(total)) %>%
  slice(1:10)

# A tibble: 10 × 2
   state total
   <chr> <dbl>
 1 TX    19839
 2 IL    19131
 3 NY    17050
 4 NE    13176
 5 CA    13137
 6 PA    12769
 7 OH     9056
 8 GA     8605
 9 IN     8537
10 MO     8419

The table above lists the 10 states with the most railroad employees. Texas (TX) has the highest total with 19,839 employees, followed closely by Illinois (IL) with 19,131.
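The group, summarise, sort, and slice steps can be cross-checked without the tidyverse. Below is a minimal base R sketch on a few made-up rows in the same three-column shape (county names and counts are hypothetical); `top_n_states()` is a hypothetical helper written for this example, not a function from any package.

```r
# Made-up rows in the same shape as the railroad data (state, county,
# total_employees); county names and counts are hypothetical.
railroad <- data.frame(
  state = c("TX", "TX", "IL", "IL", "NY", "AK"),
  county = c("COUNTY_A", "COUNTY_B", "COUNTY_C", "COUNTY_D", "COUNTY_E", "COUNTY_F"),
  total_employees = c(241, 500, 900, 100, 300, 3),
  stringsAsFactors = FALSE
)

# Sum employees per state: the base R analogue of group_by() + summarise().
state_totals <- aggregate(total_employees ~ state, data = railroad, FUN = sum)
names(state_totals)[names(state_totals) == "total_employees"] <- "total"

# Hypothetical helper: sort by total, largest first (like arrange(desc(total))),
# then keep the first n rows (like slice(1:n)).
top_n_states <- function(totals, n) {
  sorted <- totals[order(totals$total, decreasing = TRUE), ]
  head(sorted, n)
}

print(top_n_states(state_totals, 2))
```

aggregate() returns the groups in alphabetical order of state, so the explicit order() call is what produces the ranking, just as arrange(desc(total)) does in the tidyverse pipeline.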


# Keep only the rows for Texas (TX)
filter(dataframe, state == "TX")

# A tibble: 221 × 3
   state county    total_employees
   <chr> <chr>               <dbl>
 1 TX    ANDERSON              241
 2 TX    ANDREWS                 3
 3 TX    ANGELINA               53
 4 TX    ARANSAS                 6
 5 TX    ARCHER                  8
 6 TX    ARMSTRONG              12
 7 TX    ATASCOSA               64
 8 TX    AUSTIN                 35
 9 TX    BAILEY                  5
10 TX    BANDERA                15
# … with 211 more rows

The output above is a filtered view of Texas: it has 221 rows (counties), of which only the first 10 are printed.
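A filtered view should agree with the grouped totals: summing total_employees over the TX rows must give the same number as the TX row of the per-state summary. The base R sketch below runs that consistency check on a few made-up rows (hypothetical counties and counts).

```r
# Made-up rows (hypothetical counties and counts) in the railroad data's shape.
railroad <- data.frame(
  state = c("TX", "TX", "TX", "IL", "AK"),
  county = c("COUNTY_A", "COUNTY_B", "COUNTY_C", "COUNTY_D", "COUNTY_E"),
  total_employees = c(10, 25, 5, 40, 2),
  stringsAsFactors = FALSE
)

# Base R analogue of filter(dataframe, state == "TX").
tx <- railroad[railroad$state == "TX", ]

# Number of TX counties and their combined employee count.
n_counties <- nrow(tx)
tx_total <- sum(tx$total_employees)

# The filtered sum must equal the TX total from a grouped summary.
grouped <- aggregate(total_employees ~ state, data = railroad, FUN = sum)
stopifnot(tx_total == grouped$total_employees[grouped$state == "TX"])
cat("TX counties:", n_counties, " TX employees:", tx_total, "\n")
```

Applied to dataframe instead of the toy rows, nrow() should return 221 and the sum should be 19,839, matching the filtered tibble and the descending table above.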


# Total employees per state, smallest first, keeping the bottom 10
dataframe %>%
  group_by(state) %>%
  summarise(total = sum(total_employees)) %>%
  arrange(total) %>%
  slice(1:10)

# A tibble: 10 × 2
   state total
   <chr> <dbl>
 1 AP        1
 2 AE        2
 3 HI        4
 4 AK      103
 5 VT      259
 6 DC      279
 7 NH      393
 8 RI      487
 9 ME      654
10 NV      746

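AP has the lowest total, with a single employee, followed by AE with 2. However, AP and AE are not states but military postal codes (Armed Forces Pacific and Armed Forces Europe), so among actual states Hawaii (HI) has the fewest railroad employees, with 4.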
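To rank only geographic states, the military postal codes can be dropped before grouping. The base R sketch below does this on made-up rows (hypothetical counts; the county values are placeholders). The exclusion vector is an assumption: AA (Armed Forces Americas) is listed defensively and may not occur in the real data, and DC would still remain, so add it too if only the 50 states are wanted.

```r
# Made-up rows (hypothetical counts) that include military postal codes.
railroad <- data.frame(
  state = c("AP", "AE", "HI", "VT", "VT"),
  county = c("APO", "APO", "COUNTY_A", "COUNTY_B", "COUNTY_C"),
  total_employees = c(1, 2, 5, 30, 20),
  stringsAsFactors = FALSE
)

# Armed Forces postal codes: Americas (AA), Europe (AE), Pacific (AP).
military_codes <- c("AA", "AE", "AP")
states_only <- railroad[!railroad$state %in% military_codes, ]

# Per-state totals, smallest first (like arrange(total) + slice(1:10)).
state_totals <- aggregate(total_employees ~ state, data = states_only, FUN = sum)
names(state_totals)[names(state_totals) == "total_employees"] <- "total"
bottom <- head(state_totals[order(state_totals$total), ], 10)
print(bottom)
```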