hw1
challenge1
my name
dataset
ggplot2
Author

Xinyang Mao

Published

February 22, 2023

Import package

Code
library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Read dataset

Code
data <- read_csv("_data/railroad_2012_clean_county.csv")
Rows: 2930 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, county
dbl (1): total_employees

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
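
The message above names two ways to quiet it. As a minimal sketch, assuming readr 2.0 or later, the following writes a small temporary file (a hypothetical stand-in with a few rows from this dataset, not the real CSV) and reads it both ways:

```r
library(readr)

# Hypothetical stand-in file; the real post reads _data/railroad_2012_clean_county.csv
path <- tempfile(fileext = ".csv")
writeLines(c(
  "state,county,total_employees",
  "AE,APO,2",
  "AK,ANCHORAGE,7",
  "AK,FAIRBANKS NORTH STAR,2"
), path)

# Option 1: declare the column types up front, so nothing is guessed
rr_typed <- read_csv(
  path,
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)

# Option 2: keep type guessing but silence the message
rr_quiet <- read_csv(path, show_col_types = FALSE)
```

Declaring the types also makes the import fail loudly if a column changes type later, which the quiet option does not.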

Shows the first 6 rows of the data frame

By default, head() prints the first 6 rows.

Code
head(data)
# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
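
The number of rows is controlled by the n argument, and tail() does the same from the bottom. A short sketch, using a stand-in data frame (named rr here for illustration) copied from the six rows printed above rather than the full dataset:

```r
# Stand-in data frame copied from the six rows printed above
rr <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR",
             "JUNEAU", "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1),
  stringsAsFactors = FALSE
)

first_three <- head(rr, n = 3)  # first 3 rows instead of the default 6
last_two <- tail(rr, n = 2)     # last 2 rows

first_three
last_two
```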

Shows the dimensions of the data frame by row and column

We can see there are 2930 rows and 3 columns in this dataset.

Code
dim(data)
[1] 2930    3

Shows the structure of the data frame

The structure of the data set also tells us the number of rows and columns, but it provides even more information: the column names, the class of each column (what kind of data is stored in it), and the first few observations of each variable.

Code
str(data)
spc_tbl_ [2,930 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ state          : chr [1:2930] "AE" "AK" "AK" "AK" ...
 $ county         : chr [1:2930] "APO" "ANCHORAGE" "FAIRBANKS NORTH STAR" "JUNEAU" ...
 $ total_employees: num [1:2930] 2 7 2 3 2 1 88 102 143 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   state = col_character(),
  ..   county = col_character(),
  ..   total_employees = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
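
When only the column classes are needed, a more compact check than str() is possible. A small sketch with base R, again on a stand-in data frame (rr, built from the first rows of this dataset for illustration):

```r
# Stand-in data frame with the same three columns as the railroad data
rr <- data.frame(
  state = c("AE", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR"),
  total_employees = c(2, 7, 2),
  stringsAsFactors = FALSE
)

# One class per column, returned as a named character vector
col_classes <- sapply(rr, class)
col_classes
```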

Shows the name of each column in the data frame

colnames() returns a character vector of the column names.

Code
colnames(data)
[1] "state"           "county"          "total_employees"

Provides summary statistics on the columns of the data frame

The summary provides descriptive statistics (min, max, mean, median, and quartiles) for each numeric column; for character columns it reports only the length, class, and mode. For example, we can see in this data set that the average number of total employees is 87.18.

Code
summary(data)
    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00  
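
The same statistics can be computed one at a time on a single column. A minimal sketch, using only the six total_employees values shown in the head() output (so the numbers will differ from the full-data summary above):

```r
# total_employees values from the six rows printed by head()
employees <- c(2, 7, 2, 3, 2, 1)

emp_mean <- mean(employees)      # arithmetic mean
emp_median <- median(employees)  # middle value
emp_quartiles <- quantile(employees, probs = c(0.25, 0.75))  # 1st and 3rd quartiles

emp_mean
emp_median
emp_quartiles
```

In the full summary the mean (87.18) is far above the median (21.00), which indicates that a small number of counties with very large counts (up to 8207) pull the average upward.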

Shows a spreadsheet-like display of the entire data frame

View() opens a viewer window with vertical and horizontal scroll bars for browsing the entire data set. It looks much like an Excel spreadsheet.

Code
View(data)