DACSS 601: Data Science Fundamentals - FALL 2022
Challenge 1

Categories: challenge_1, railroads, faostat, wildbirds
Author: Michaela Bowen
Published: October 10, 2022

Code
library(tidyverse)
library(summarytools)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐

Here I read in the railroad CSV data.

Code
library(readr)
railroads <- read_csv("/Users/micha/OneDrive/Documents/Graduate School/DACSS 601/601_Fall_2022/posts/_data/railroad_2012_clean_county.csv")
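
The absolute path above only resolves on the author's machine. As a minimal sketch of the same read-then-inspect step, the block below writes a small stand-in file to a temporary location and reads it back. The three rows are made up, and the column names `county` and `total_employees` are assumptions for illustration; only `state` is confirmed by the code in this post.

```r
library(readr)

# Hypothetical stand-in rows; the real file is railroad_2012_clean_county.csv.
# Column names other than `state` are assumed for illustration.
demo_path <- tempfile(fileext = ".csv")
writeLines(
  c("state,county,total_employees",
    "AK,EXAMPLE A,5",
    "AK,EXAMPLE B,10",
    "AL,EXAMPLE C,20"),
  demo_path
)

# Column types are declared explicitly so the read does not rely on guessing
demo <- read_csv(demo_path, col_types = cols(
  state = col_character(),
  county = col_character(),
  total_employees = col_double()
))

dim(demo)       # number of rows, then number of columns
colnames(demo)  # variable names
```

Running `dim()` and `colnames()` right after the read is a quick way to confirm that the file was parsed into the expected number of cases and variables.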

Describe the data

The variables are the state and the county, and each case records the number of railroad employees for one county, as the file name (railroad_2012_clean_county.csv) suggests. The data were likely collected at each station and submitted to a state or federal database. Because the data are grouped by state and then by county, the station-level counts were probably aggregated to the county level.
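
One way to check the state-then-county structure described above is to collapse the cases to one row per state. The sketch below does this on a hypothetical three-row data frame; the column names `county` and `total_employees` and all values are assumptions for illustration, and with the real data `railroads` would take the place of `demo`.

```r
library(dplyr)

# Hypothetical rows standing in for `railroads`; `county` and
# `total_employees` are assumed column names, and the values are made up.
demo <- data.frame(
  state = c("AK", "AK", "AL"),
  county = c("EXAMPLE A", "EXAMPLE B", "EXAMPLE C"),
  total_employees = c(5, 10, 20)
)

# One row per state: number of distinct counties and the employee total
by_state <- demo %>%
  group_by(state) %>%
  summarise(
    n_counties = n_distinct(county),
    employees = sum(total_employees),
    .groups = "drop"
  )

by_state
```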

Below I list the distinct values in the state column. There are 53: the 50 states, the District of Columbia, and two military postal codes, AE (Armed Forces Europe) and AP (Armed Forces Pacific).

Code
n_distinct(railroads$state)
[1] 53
Code
unique(railroads$state)
 [1] "AE" "AK" "AL" "AP" "AR" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "HI" "IA"
[16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC"
[31] "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN"
[46] "TX" "UT" "VA" "VT" "WA" "WI" "WV" "WY"
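
To see which of the 53 codes are not among the 50 US states, the printed vector can be compared against `state.abb`, a vector of state abbreviations that ships with base R. This is a sketch on the codes copied from the output above; with the data loaded, `unique(railroads$state)` would take the place of `codes`.

```r
# The 53 codes, copied from the output of unique(railroads$state)
codes <- c(
  "AE", "AK", "AL", "AP", "AR", "AZ", "CA", "CO", "CT", "DC", "DE", "FL",
  "GA", "HI", "IA", "ID", "IL", "IN", "KS", "KY", "LA", "MA", "MD", "ME",
  "MI", "MN", "MO", "MS", "MT", "NC", "ND", "NE", "NH", "NJ", "NM", "NV",
  "NY", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VA",
  "VT", "WA", "WI", "WV", "WY"
)

# setdiff() keeps the codes that do not appear in the built-in state list
non_states <- setdiff(codes, state.abb)
non_states
# [1] "AE" "AP" "DC"
```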