DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 1

challenge_1
railroads
faostat
wildbirds
Reading in data and creating a post
Author: Hezzie Phillips

Published: September 21, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Overview

Below is my overview of the 2012 railroad dataset:

  • railroad_2012_clean_county.csv ⭐

Code to read data

Code
# Read the cleaned county-level railroad employment data
railroadclean <- read_csv("_data/railroad_2012_clean_county.csv")

Summary

The table has three variables (state, county, and total_employees) and 2,930 rows.

The data appear to tabulate the number of railroad employees in each county.

The largest county total is 8,207 employees (Cook County, Illinois) and the smallest is 1, which occurs in 145 counties. With such a wide range, and a mean (87.18) well above the median, the median of 21 employees is probably the more useful summary of a typical county.
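The figures quoted above (the largest county, the number of counties at the minimum, and the median) can be checked with a few dplyr calls. The following is a minimal sketch that runs on a ten-row sample copied from the printed output later in this post, so its results differ from the full-data values. The names `railroad_sample`, `largest`, `n_smallest`, and `median_employees` are illustrative and do not appear in the original code.

```r
library(dplyr)

# Illustrative sample only: the first ten rows printed in this post,
# not the full 2,930-row dataset.
railroad_sample <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA", "SKAGWAY MUNICIPALITY",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 2, 1, 88, 102, 143, 1)
)

# Row with the largest employee count
largest <- railroad_sample %>%
  slice_max(total_employees, n = 1)

# Number of rows tied at the smallest employee count
n_smallest <- railroad_sample %>%
  filter(total_employees == min(total_employees)) %>%
  nrow()

# Median employee count
median_employees <- median(railroad_sample$total_employees)

largest
n_smallest
median_employees
```

Swapping `railroad_sample` for `railroadclean` should, assuming the file was read as shown earlier, reproduce the full-data numbers reported in the summary.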

Code
summary(railroadclean)
    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00  
Code
distinct(railroadclean)
# A tibble: 2,930 × 3
   state county               total_employees
   <chr> <chr>                          <dbl>
 1 AE    APO                                2
 2 AK    ANCHORAGE                          7
 3 AK    FAIRBANKS NORTH STAR               2
 4 AK    JUNEAU                             3
 5 AK    MATANUSKA-SUSITNA                  2
 6 AK    SITKA                              1
 7 AK    SKAGWAY MUNICIPALITY              88
 8 AL    AUTAUGA                          102
 9 AL    BALDWIN                          143
10 AL    BARBOUR                            1
# … with 2,920 more rows
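Because `distinct()` returns 2,930 rows, the same count that `summary()` reports, the table appears to contain no fully duplicated rows. A slightly stricter check is whether each state/county pair occurs only once. The sketch below shows one way to test that; `has_unique_counties` is a hypothetical helper, and the two small data frames are made up for illustration (three rows taken from the output above, plus a deliberately repeated row).

```r
library(dplyr)

# Hypothetical helper: TRUE when every state/county pair occurs exactly once.
has_unique_counties <- function(df) {
  nrow(distinct(df, state, county)) == nrow(df)
}

# Three rows taken from the printed output above
sample_ok <- data.frame(
  state = c("AK", "AK", "AL"),
  county = c("ANCHORAGE", "JUNEAU", "AUTAUGA"),
  total_employees = c(7, 3, 102)
)

# Same rows with the first one repeated, to show a failing case
sample_dup <- bind_rows(sample_ok, sample_ok[1, ])

has_unique_counties(sample_ok)
has_unique_counties(sample_dup)
```

Calling `has_unique_counties(railroadclean)` would apply the same check to the full dataset.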