Challenge1

Author

Xinyang Mao

Published

February 22, 2023

Import package

library(tidyverse)

knitr::opts_chunk$set(echo = TRUE)

Read dataset

data <- read_csv("_data/railroad_2012_clean_county.csv")

Rows: 2930 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, county
dbl (1): total_employees

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
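
The message above suggests specifying the column types. A minimal sketch of that follows, using the three column types reported in the specification. Because the CSV file itself is not reproduced here, the example reads an inline string holding only the six rows that head() prints later in this post; the names csv_text and railroad_sample are illustrative, not part of the original analysis.

```r
library(readr)

# Stand-in for the CSV file: the six rows printed by head() in this post.
csv_text <- "state,county,total_employees
AE,APO,2
AK,ANCHORAGE,7
AK,FAIRBANKS NORTH STAR,2
AK,JUNEAU,3
AK,MATANUSKA-SUSITNA,2
AK,SITKA,1
"

# Declaring the column types up front means readr has nothing to guess,
# so it does not print the column specification message.
railroad_sample <- read_csv(
  I(csv_text),  # I() marks the string as literal data rather than a file path
  col_types = cols(
    state = col_character(),
    county = col_character(),
    total_employees = col_double()
  )
)

railroad_sample
```

With the real file, the same col_types argument could be passed alongside the file path used above.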

Shows the first 6 rows of the data frame

head() prints the first 6 rows by default.

head(data)

# A tibble: 6 × 3
  state county               total_employees
  <chr> <chr>                          <dbl>
1 AE    APO                                2
2 AK    ANCHORAGE                          7
3 AK    FAIRBANKS NORTH STAR               2
4 AK    JUNEAU                             3
5 AK    MATANUSKA-SUSITNA                  2
6 AK    SITKA                              1
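
The default of 6 rows can be changed with the n argument, and tail() does the same from the bottom of the data frame. A small sketch, assuming a stand-in tibble (railroad_sample, a hypothetical name) built from the six rows printed above rather than the full file:

```r
library(tibble)

# Stand-in data: the six rows printed above, not the full 2,930-row file.
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

first_three <- head(railroad_sample, n = 3)  # first 3 rows instead of the default 6
last_two <- tail(railroad_sample, n = 2)     # last 2 rows

first_three
last_two
```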

Shows the dimensions of the data frame by row and column

We can see that there are 2,930 rows and 3 columns in this dataset.

dim(data)

[1] 2930    3
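
When only one of the two dimensions is needed, nrow() and ncol() return them separately. A brief sketch on a stand-in tibble built from the six rows shown by head() (the name railroad_sample is illustrative), so the counts here are 6 and 3 rather than 2,930 and 3:

```r
library(tibble)

# Stand-in data: the six rows shown by head(), not the full file.
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

n_rows <- nrow(railroad_sample)  # number of rows only
n_cols <- ncol(railroad_sample)  # number of columns only

cat("rows:", n_rows, "columns:", n_cols, "\n")
```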

Shows the structure of the data frame

The structure of the data set also tells us the number of rows and columns, but it provides more information: the column names, the class of each column (what kind of data is stored in it), and the first few observations of each variable.

str(data)

spc_tbl_ [2,930 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ state          : chr [1:2930] "AE" "AK" "AK" "AK" ...
 $ county         : chr [1:2930] "APO" "ANCHORAGE" "FAIRBANKS NORTH STAR" "JUNEAU" ...
 $ total_employees: num [1:2930] 2 7 2 3 2 1 88 102 143 1 ...
 - attr(*, "spec")=
  .. cols(
  ..   state = col_character(),
  ..   county = col_character(),
  ..   total_employees = col_double()
  .. )
 - attr(*, "problems")=<externalptr>
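
Since the tidyverse is already loaded, glimpse() from dplyr is a possible alternative to str() that shows the same kind of information (dimensions, column names, types, first values) without the attribute listing. A sketch on a stand-in tibble built from the six rows shown by head(); railroad_sample and column_classes are hypothetical names:

```r
library(dplyr)
library(tibble)

# Stand-in data: the six rows shown by head(), not the full file.
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# One line per column: name, type, and the first few values.
glimpse(railroad_sample)

# The class of each column as a named character vector.
column_classes <- sapply(railroad_sample, class)
column_classes
```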

Shows the name of each column in the data frame

colnames() returns a character vector of the column names.

colnames(data)

[1] "state"           "county"          "total_employees"

Provides summary statistics on the columns of the data frame

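For numeric columns, the summary provides descriptive statistics: the minimum, maximum, mean, median, and first and third quartiles. For character columns such as state and county, it reports only the length, class, and mode. For example, we can see that the mean of total_employees across the 2,930 rows is 87.18, well above the median of 21, which suggests that a few counties have very large counts (the maximum is 8,207).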

summary(data)

    state              county          total_employees  
 Length:2930        Length:2930        Min.   :   1.00  
 Class :character   Class :character   1st Qu.:   7.00  
 Mode  :character   Mode  :character   Median :  21.00  
                                       Mean   :  87.18  
                                       3rd Qu.:  65.00  
                                       Max.   :8207.00
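
Individual statistics from the summary can also be computed directly, which is handy when they feed into later steps. A sketch using dplyr's summarise() on a stand-in tibble built from the six rows shown by head() (railroad_sample and employee_stats are hypothetical names); because it uses only those six rows, its results will not match the full-data values such as 87.18:

```r
library(dplyr)
library(tibble)

# Stand-in data: the six rows shown by head(), not the full file.
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

# Compute the same statistics that summary() reports, one column each.
employee_stats <- railroad_sample %>%
  summarise(
    mean_employees = mean(total_employees),
    median_employees = median(total_employees),
    max_employees = max(total_employees)
  )

employee_stats
```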

Shows a spreadsheet-like display of the entire data frame

In an interactive session such as RStudio, View() opens a viewer window with vertical and horizontal scroll bars for browsing the entire data set. It looks much like an Excel spreadsheet.

View(data)
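
View() is meant for interactive use, so it may not open a viewer when the post is rendered non-interactively. One way to guard against that is sketched below on a stand-in data frame built from the six rows shown by head() (railroad_sample is a hypothetical name); falling back to print() is an assumption about what is wanted in that case.

```r
# Stand-in data: the six rows shown by head(), not the full file.
railroad_sample <- data.frame(
  state = c("AE", "AK", "AK", "AK", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "MATANUSKA-SUSITNA", "SITKA"),
  total_employees = c(2, 7, 2, 3, 2, 1)
)

if (interactive()) {
  # Open the spreadsheet-like viewer only when a user is at the console.
  View(railroad_sample)
} else {
  # When rendering or running a script, print the data instead.
  print(railroad_sample)
}
```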