Code
library(tidyverse)
::opts_chunk$set(echo = TRUE) knitr
Xinyang Mao
February 22, 2023
Rows: 2930 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, county
dbl (1): total_employees
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
This function defaults to printing the first 6 row.
We can see there are 2930 rows and 3 colums in this dataset.
The structure of the data set also tells us the number of rowsand columns, but it provides even more information. It tells us the column names, the class of each column (what kind of data is stored in it), and the first few observations of each variable.
spc_tbl_ [2,930 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ state : chr [1:2930] "AE" "AK" "AK" "AK" ...
$ county : chr [1:2930] "APO" "ANCHORAGE" "FAIRBANKS NORTH STAR" "JUNEAU" ...
$ total_employees: num [1:2930] 2 7 2 3 2 1 88 102 143 1 ...
- attr(*, "spec")=
.. cols(
.. state = col_character(),
.. county = col_character(),
.. total_employees = col_double()
.. )
- attr(*, "problems")=<externalptr>
This function prints a vector of the column names.
The summary provides descriptive statistics including the min, max, mean, median, and quartiles of each column. For example, we can see in this data set that the average number of total employees is 87.18.
This window provides vertical and horizontal scroll bars to browse the entire data set. It looks exactly like an Excel spreadsheet.
---
title: "Challenge1"
author: "Xinyang Mao"
desription: "Reading in data and describing"
date: "02/22/2023"
format:
html:
toc: true
code-fold: true
code-copy: true
code-tools: true
categories:
- hw1
- challenge1
- my name
- dataset
- ggplot2
---
## Import package
```{r}
#| label: setup
#| warning: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE)
```
## Read dataset
```{r}
data <- read_csv("_data/railroad_2012_clean_county.csv")
```
## Shows the first 6 rows of the data frame
This function defaults to printing the first 6 row.
```{r}
head(data)
```
## Shows the dimensions of the data frame by row and column
We can see there are 2930 rows and 3 colums in this dataset.
```{r}
dim(data)
```
## Shows the structure of the data frame
The structure of the data set also tells us the number of rowsand columns, but it provides even more information. It tells us the column names, the class of each column (what kind of data is stored in it), and the first few observations of each variable.
```{r}
str(data)
```
## Shows the name of each column in the data frame
This function prints a vector of the column names.
```{r}
colnames(data)
```
## Provides summary statistics on the columns of the data frame
The summary provides descriptive statistics including the min, max, mean, median, and quartiles of each column. For example, we can see in this data set that the average number of total employees is 87.18.
```{r}
summary(data)
```
## Shows a spreadsheet-like display of the entire data frame
This window provides vertical and horizontal scroll bars to browse the entire data set. It looks exactly like an Excel spreadsheet.
```{r}
View(data)
```