DACSS 601: Data Science Fundamentals - FALL 2022

Challenge 1 Solution Alex Gauvin-Valenta

challenge_1
railroads
faostat
wildbirds
Author

Alex Gauvin-Valenta

Published

September 26, 2022

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Challenge Overview

Today’s challenge is to

  1. read in a dataset, and

  2. describe the dataset using both words and any supporting information (e.g., tables, etc)

Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

  • railroad_2012_clean_county.csv ⭐
  • birds.csv ⭐⭐
  • FAOstat*.csv ⭐⭐
  • wild_bird_data.xlsx ⭐⭐⭐
  • StateCounty2012.xls ⭐⭐⭐⭐

Find the _data folder, located inside the posts folder. Then you can read in the data, using either one of the readr standard tidy read commands, or a specialized package such as readxl.

Code
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
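
As a minimal sketch of the read step, the example below writes a few rows to a temporary file and reads them back with `read_csv()`, declaring the column types explicitly instead of relying on type guessing. The temporary file, the `railroad_sample` name, and the `col_types` specification are assumptions for illustration; the rows are copied from the values reported in this post (with the Cook county name assumed to be stored in upper case like the others), and the post itself reads the full file with readr's defaults.

```r
library(readr)

# Illustrative stand-in for the real file: three rows taken from this post.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "state,county,total_employees",
  "AK,SITKA,1",
  "AL,BARBOUR,1",
  "IL,COOK,8207"
), tmp)

# Declare each column's type up front so a parsing surprise fails loudly.
railroad_sample <- read_csv(
  tmp,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)

railroad_sample
```

For the Excel files in the list above, `readxl::read_excel()` plays the same role as `read_csv()` does here.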

Summary

The railroad data set appears to report the total number of railroad employees in each county, by state, across 2,930 rows. Cook, IL has the most employees at 8,207, while Sitka, AK is one of several counties tied for the fewest at 1.
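
One way to check the extremes, including ties, is sketched below with `slice_max()` and `slice_min()`. The `railroad_sample` tibble is a hypothetical four-row stand-in built from values reported in this post, not the full data set; on the real `railroad` object the same calls should apply unchanged.

```r
library(dplyr)

# Illustrative rows copied from this post, not the full 2,930-row data set.
railroad_sample <- tibble(
  state = c("AK", "AL", "AL", "IL"),
  county = c("SITKA", "BARBOUR", "HENRY", "COOK"),
  total_employees = c(1, 1, 1, 8207)
)

# County (or counties) with the most employees; ties are kept by default.
most <- railroad_sample %>% slice_max(total_employees, n = 1)

# Every county tied for the fewest employees.
fewest <- railroad_sample %>% slice_min(total_employees, n = 1)

most
fewest
```

Because both functions keep ties by default, `fewest` returns every county at the minimum rather than only the first one encountered.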

Code
summary(railroad)
    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00  
Code
# Data table, sorted from fewest to most employees
railroad %>%
  arrange(total_employees)
# A tibble: 2,930 × 3
   state county   total_employees
   <chr> <chr>              <dbl>
 1 AK    SITKA                  1
 2 AL    BARBOUR                1
 3 AL    HENRY                  1
 4 AP    APO                    1
 5 AR    NEWTON                 1
 6 CA    MONO                   1
 7 CO    BENT                   1
 8 CO    CHEYENNE               1
 9 CO    COSTILLA               1
10 CO    DOLORES                1
# … with 2,920 more rows
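
To see the largest counties first instead, a descending sort can be sketched as follows. The `railroad_sample` tibble is again a hypothetical stand-in built from values in this post, and using `state` and `county` as tie-breakers is an assumption added here to make the order reproducible.

```r
library(dplyr)

# Illustrative rows copied from this post, not the full data set.
railroad_sample <- tibble(
  state = c("AL", "AK", "IL", "AL"),
  county = c("HENRY", "SITKA", "COOK", "BARBOUR"),
  total_employees = c(1, 1, 8207, 1)
)

# Most employees first; state and county break ties among equal counts.
sorted <- railroad_sample %>%
  arrange(desc(total_employees), state, county)

sorted
```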
Source Code
---
title: "Challenge 1 Solution Alex Gauvin-Valenta"
author: "Alex Gauvin-Valenta"
description: "Reading in data and creating a post"
date: "9/26/22"
format:
  html:
    toc: true
    code-fold: true
    code-copy: true
    code-tools: true
categories:
  - challenge_1
  - railroads
  - faostat
  - wildbirds
---

```{r}
#| label: setup
#| warning: false
#| message: false

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)
```

## Challenge Overview

Today's challenge is to

1)  read in a dataset, and

2)  describe the dataset using both words and any supporting information (e.g., tables, etc)

## Read in the Data

Read in one (or more) of the following data sets, using the correct R package and command.

-   railroad_2012_clean_county.csv ⭐
-   birds.csv ⭐⭐
-   FAOstat\*.csv ⭐⭐
-   wild_bird_data.xlsx ⭐⭐⭐
-   StateCounty2012.xls ⭐⭐⭐⭐

Find the `_data` folder, located inside the `posts` folder. Then you can read in the data, using either one of the `readr` standard tidy read commands, or a specialized package such as `readxl`.

```{r}
railroad <- read_csv("_data/railroad_2012_clean_county.csv")
```

## Summary

The railroad data set appears to report the total number of railroad employees in each county, by state, across 2,930 rows. Cook, IL has the most employees at 8,207, while Sitka, AK is one of several counties tied for the fewest at 1.

```{r}
#| label: summary
summary(railroad)

# Data table, sorted from fewest to most employees
railroad %>%
  arrange(total_employees)
```