DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 1



challenge_1
Reading in data and creating a post
Author

Jerin Jacob

Published

December 12, 2022

```r
library(tidyverse)
library(readxl)

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

Reading Railroad Employees Dataset

```r
# readr (also loaded by the tidyverse) provides read_csv(); haven is not needed for a CSV file
library(readr)
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
```

This dataset records the number of railroad employees in 2,930 U.S. counties in 2012. It has three variables: state, county, and the total number of employees in that county.
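The size and structure described above can be checked in code. `dim()` returns the number of rows and columns, and `glimpse()` lists each column with its type. The sketch below runs these functions on a small made-up tibble so that it works without the CSV file. The rows are hypothetical, and the column name `total_employees` is an assumption about the file's header. On the real data, the equivalent calls would be `dim(railroad)` and `glimpse(railroad)`.

```r
library(tibble)

# Hypothetical stand-in for `railroad`: made-up rows, NOT the real data.
# The column name total_employees is an assumption about the CSV header.
railroad_toy <- tibble(
  state = c("AK", "AL", "AL", "CA"),
  county = c("county_1", "county_2", "county_3", "county_4"),
  total_employees = c(10, 20, 30, 40)
)

# dim() returns c(rows, columns); the description above implies c(2930, 3) for the real file
dim(railroad_toy)

# One line per column, showing its type and first few values
glimpse(railroad_toy)
```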

Describing Railroad Data

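```r
# view() opens the data viewer in an interactive session; it has no effect when the post is rendered
view(railroad)

# Count the distinct values in the state column
railroad %>%
  pull(state) %>%
  n_distinct()
```

```
[1] 53
```

```r
# List the distinct values in the state column
railroad %>%
  select(state) %>%
  distinct()
```

```
# A tibble: 53 × 1
   state
   <chr>
 1 AE   
 2 AK   
 3 AL   
 4 AP   
 5 AR   
 6 AZ   
 7 CA   
 8 CO   
 9 CT   
10 DC   
# … with 43 more rows
```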

The variable ‘state’ has 53 distinct values. This is more than the 50 U.S. states, so the column must contain some codes that are not states. Listing the distinct values shows what these extra codes are. Besides the states, the column includes DC (the District of Columbia) and armed forces postal codes such as AE and AP.
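The non-state codes can also be found directly, without reading through the listing. Base R includes `state.abb`, a vector of the 50 state abbreviations, and `setdiff()` keeps only the codes in the data that are not in that vector. This minimal sketch uses a hypothetical toy tibble whose values are made up for illustration. On the real data, the call would be `setdiff(unique(railroad$state), state.abb)`.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data, NOT the real railroad file; codes chosen to mimic the listing above
railroad_toy <- tibble(
  state = c("AE", "AK", "AL", "AL", "AP", "CA", "DC")
)

# n_distinct() counts the unique values in a vector
n_states <- railroad_toy %>% pull(state) %>% n_distinct()

# state.abb holds the 50 U.S. state abbreviations (it excludes DC);
# setdiff() keeps the codes in the data that are not one of them
non_states <- setdiff(unique(railroad_toy$state), state.abb)

n_states
non_states
```

`state.abb` covers only the 50 states, so DC is reported together with the military codes. This matches the description above, in which DC is one of the extra values.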
