Data Analytics and Computational Social Science: Roy Yoon HW #2

Roy Yoon

Packages Used

Packages loaded for Homework Assignment Two:

knitr::opts_chunk$set(echo = TRUE)  # show code in the rendered output
library(tidyverse)  # attaches dplyr, ggplot2, and readr, among others
library(readxl)     # provides read_excel(); not attached by library(tidyverse)

Dataset Used

I used the cleaned data set “railroad_2012_clean_county”.

The data is already in a tidy data format.

Tidy Data:

Each variable has its own column
Each observation has its own row
Each value has its own cell

railroad_2012 <- read_excel("railroad_2012_clean_county.xlsx")
head(railroad_2012)

# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
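
The tidy properties listed above can be spot-checked in code. The following is a minimal sketch, assuming that one observation means one state/county pair; it uses a toy tibble (railroad_toy, a name I made up) holding only the six rows printed by head(), not the full file.

```r
library(dplyr)

# Toy subset copied from the head() output above; not the full data set.
railroad_toy <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# One observation per row: no state/county pair should appear more than once.
duplicate_pairs <- railroad_toy %>%
  count(state, county) %>%
  filter(n > 1)

# One value per cell: count the missing values in each column.
missing_per_column <- colSums(is.na(railroad_toy))

nrow(duplicate_pairs)
missing_per_column
```

Running the same two checks on railroad_2012 itself would show whether the full data set meets these conditions.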

Dataset Dimension

These are the dimensions of the data set railroad_2012:

dim(railroad_2012)

[1] 2930    3

There are 2930 rows and 3 columns.
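
The row and column counts can also be pulled out separately with nrow() and ncol(), so the sentence does not have to be typed by hand. This is a small sketch on a toy data frame (railroad_toy is a hypothetical name, and its three rows are copied from the head() output), so the counts it reports are for the toy data only.

```r
# Toy subset copied from the head() output; not the full data set.
railroad_toy <- data.frame(
  state = c("AE", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "JUNEAU"),
  total_employees = c(2, 7, 3)
)

n_rows <- nrow(railroad_toy)  # number of observations
n_cols <- ncol(railroad_toy)  # number of variables

# Build the summary sentence from the computed values.
dims_sentence <- sprintf("There are %d rows and %d columns.", n_rows, n_cols)
dims_sentence
```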

Column Names

railroad_2012 has three variables (columns): state, county, and total_employees.

colnames(railroad_2012)

[1] "state"           "county"          "total_employees"

Filter

The following filter keeps only the counties with more than 100 railroad employees in 2012:

OneHundred_More <- filter(railroad_2012, total_employees > 100)

OneHundred_More

# A tibble: 530 × 3
   state county     total_employees
   <chr> <chr>                <dbl>
 1 AL    AUTAUGA                102
 2 AL    BALDWIN                143
 3 AL    BLOUNT                 154
 4 AL    COLBERT                199
 5 AL    CULLMAN                129
 6 AL    DALLAS                 122
 7 AL    ELMORE                 116
 8 AL    JEFFERSON              990
 9 AL    LAUDERDALE             117
10 AL    MOBILE                 331
# … with 520 more rows
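
The same filter can be written with the pipe and followed by arrange() to put the largest counties first. This is only a sketch on a toy tibble (railroad_toy and over_100_sorted are names I introduced; the rows are copied from the outputs shown in this post), not a replacement for the result above.

```r
library(dplyr)

# Toy subset copied from the printed outputs in this post; not the full data set.
railroad_toy <- tibble(
  state = c("AK", "AL", "AL", "AL", "AL"),
  county = c("ANCHORAGE", "AUTAUGA", "BALDWIN", "JEFFERSON", "MOBILE"),
  total_employees = c(7, 102, 143, 990, 331)
)

# Keep rows strictly above 100 employees, then sort from largest to smallest.
over_100_sorted <- railroad_toy %>%
  filter(total_employees > 100) %>%
  arrange(desc(total_employees))

over_100_sorted
```

Because the comparison is strictly greater than 100, a county with exactly 100 employees would be dropped.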

100 * (530/2930)

[1] 18.08874

Of the 2930 counties in the data set, 530 (18.09%) had more than 100 railroad employees in 2012.
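
The count and the percentage can be computed from the data instead of typing 530 and 2930 by hand. Below is a minimal sketch using summarise() on a toy tibble (railroad_toy and share_over_100 are hypothetical names; the rows are copied from the outputs shown earlier), so its numbers describe the toy data, not the full file.

```r
library(dplyr)

# Toy subset copied from the printed outputs in this post; not the full data set.
railroad_toy <- tibble(
  state = c("AE", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "JUNEAU", "AUTAUGA", "BALDWIN", "JEFFERSON"),
  total_employees = c(2, 7, 3, 102, 143, 990)
)

share_over_100 <- railroad_toy %>%
  summarise(
    n_counties = n(),                                  # total rows
    n_over_100 = sum(total_employees > 100),           # rows above the cutoff
    pct_over_100 = 100 * mean(total_employees > 100)   # share as a percentage
  )

share_over_100
```

The mean of a logical vector is the proportion of TRUE values, which is why 100 * mean(...) gives the percentage directly.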

Conclusion

Dataset name: railroad_2012_clean_county
Dimensions: 2930 rows and 3 columns
Column Names: state, county, total_employees
Counties with more than 100 employees: 530 out of 2930 (18.09%)

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Yoon (2022, Feb. 23). Data Analytics and Computational Social Science: Roy Yoon HW #2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomry0531869698/

BibTeX citation

@misc{yoon2022roy,
  author = {Yoon, Roy},
  title = {Data Analytics and Computational Social Science: Roy Yoon HW #2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomry0531869698/},
  year = {2022}
}