Practice getting the dataset into RStudio and beginning to understand it.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
The first step is getting everything up and running.
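The post does not show the import call itself. Below is a minimal sketch, assuming the data is stored as a CSV file. The file name `birds.csv` and the helper `read_birds()` are hypothetical. To keep the example self-contained, it first writes a tiny toy CSV to a temporary file. The toy file uses placeholder values, not real data. The key argument is `check.names = FALSE`, which keeps names such as `Domain Code` intact instead of turning them into `Domain.Code`.

```r
# Hypothetical helper: read a CSV while keeping column names such as
# "Domain Code" exactly as written (read.csv() would otherwise turn
# spaces into dots).
read_birds <- function(path) {
  read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
}

# Self-contained demo: write a tiny toy CSV (placeholder values,
# not real data) to a temporary file and read it back.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Domain Code,Area,Year,Value",
  "QA,Afghanistan,1961,100",
  "QA,Afghanistan,1962,200"
), toy_path)

birds_demo <- read_birds(toy_path)
dim(birds_demo)        # 2 rows, 4 columns
colnames(birds_demo)   # "Domain Code" keeps its space

# With the real file (hypothetical name):
# birds <- read_birds("birds.csv")
```

The tidyverse is already loaded, so `readr::read_csv()` is the natural alternative. It returns a tibble like the one `head()` prints later in this post and keeps names with spaces by default.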
To start, what are the dimensions and column names?
dim(birds) # dim() returns dimensions of dataset
[1] 30977 14
colnames(birds) # colnames() returns column names
[1] "Domain Code" "Domain" "Area Code"
[4] "Area" "Element Code" "Element"
[7] "Item Code" "Item" "Year Code"
[10] "Year" "Unit" "Value"
[13] "Flag" "Flag Description"
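Several of these column names contain spaces (for example `Domain Code`), so R code has to wrap them in backticks. The sketch below uses a small toy data frame with placeholder values, not real data. It also shows that `nrow()` and `ncol()` return the two numbers from `dim()` separately. The snake_case renaming at the end is an optional illustration, not a step the post takes.

```r
# Toy data frame (placeholder values, not real data) whose names
# contain spaces, like several of the birds columns.
birds_demo <- data.frame(
  "Area Code" = c(2, 2),
  "Area" = c("Afghanistan", "Afghanistan"),
  "Year" = c(1961, 1962),
  check.names = FALSE
)

# Names with spaces must be wrapped in backticks when accessed with $.
birds_demo$`Area Code`

# nrow() and ncol() give the two parts of dim() separately.
nrow(birds_demo)
ncol(birds_demo)

# Optional: snake_case versions of the names, which avoid backticks later.
clean_names <- tolower(gsub(" ", "_", colnames(birds_demo)))
clean_names   # "area_code" "area" "year"
```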
Okay, and what does this data look like?
head(birds) # head() shows the first six rows of the dataset
# A tibble: 6 x 14
`Domain Code` Domain `Area Code` Area `Element Code` Element
<chr> <chr> <dbl> <chr> <dbl> <chr>
1 QA Live Animals 2 Afgha~ 5112 Stocks
2 QA Live Animals 2 Afgha~ 5112 Stocks
3 QA Live Animals 2 Afgha~ 5112 Stocks
4 QA Live Animals 2 Afgha~ 5112 Stocks
5 QA Live Animals 2 Afgha~ 5112 Stocks
6 QA Live Animals 2 Afgha~ 5112 Stocks
# ... with 8 more variables: Item Code <dbl>, Item <chr>,
# Year Code <dbl>, Year <dbl>, Unit <chr>, Value <dbl>, Flag <chr>,
# Flag Description <chr>
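The six rows above repeat the same Domain, Area and Element values. This suggests, though it does not prove, a long layout where each row is one combination of area, item and year. A quick way to see which columns actually vary is to count the distinct values in each column. Below is a sketch on a toy data frame with placeholder values, not real data. The helper name `n_distinct_values()` is hypothetical. `dplyr::n_distinct()` does the same job.

```r
# Hypothetical helper: count how many distinct values a column holds.
n_distinct_values <- function(x) length(unique(x))

# Toy data (placeholder values, not real data) in a long layout.
birds_demo <- data.frame(
  "Area" = c("Afghanistan", "Afghanistan", "Albania", "Albania"),
  "Element" = c("Stocks", "Stocks", "Stocks", "Stocks"),
  "Year" = c(1961, 1962, 1961, 1962),
  "Value" = c(100, 110, 50, 55),
  check.names = FALSE
)

# sapply() applies the helper to every column and returns a named vector.
distinct_counts <- sapply(birds_demo, n_distinct_values)
distinct_counts

# Columns with a single distinct value carry no information for comparisons.
names(distinct_counts)[distinct_counts == 1]   # "Element"
```

On the real data, the same call would be `sapply(birds, n_distinct_values)`.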
So, overall the data doesn’t seem incredibly messy, just very big: 30,977 rows across 14 columns!
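One quick way to back up the "not very messy" impression is to count missing values in each column. Below is a sketch on a toy data frame with placeholder values, not real data. The toy data deliberately includes one `NA` and one blank string so the check has something to report.

```r
# Toy data (placeholder values, not real data) with one missing Value
# and one blank Flag.
birds_demo <- data.frame(
  "Area" = c("Afghanistan", "Afghanistan", "Albania"),
  "Value" = c(100, NA, 50),
  "Flag" = c("A", "", "A"),
  check.names = FALSE
)

# is.na() returns a logical matrix; colSums() counts the NAs per column.
na_counts <- colSums(is.na(birds_demo))
na_counts

# Empty strings are not NA, so count them separately.
blank_counts <- sapply(birds_demo, function(x) sum(x == "", na.rm = TRUE))
blank_counts
```

The blank-string count matters mostly for data read with base `read.csv()`. By default, `readr::read_csv()` already treats empty strings as `NA`, so for a tibble read that way the first count covers both cases.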
The next step is to wrangle the data. Because the dataset is so large, the key will be figuring out what we actually need (which columns, and so on) for the analysis we want to do!
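The column list shows that most columns come in pairs: a numeric code and a label (`Area Code`/`Area`, `Item Code`/`Item`, and so on). That makes the code columns an obvious first candidate to drop. Below is a minimal sketch on a toy data frame with placeholder values, not real data. The set of columns kept is a hypothetical choice for illustration, not the post's final selection.

```r
# Toy data (placeholder values, not real data) with paired code/label columns.
birds_demo <- data.frame(
  "Area Code" = c(2, 2, 3),
  "Area" = c("Afghanistan", "Afghanistan", "Albania"),
  "Year Code" = c(1961, 1962, 1961),
  "Year" = c(1961, 1962, 1961),
  "Value" = c(100, 110, 50),
  check.names = FALSE
)

# Hypothetical choice of analysis columns: keep labels and values by name.
keep <- c("Area", "Year", "Value")
birds_small <- birds_demo[, keep]
dim(birds_small)

# The same idea driven by the names: drop every column ending in " Code".
birds_no_codes <- birds_demo[, !grepl(" Code$", colnames(birds_demo))]
colnames(birds_no_codes)
```

With the tidyverse loaded, `dplyr::select(birds, -ends_with(" Code"))` expresses the same column drop.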
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Geeslin (2021, Sept. 26). DACSS 601 Fall 2021: Reading in the Data (Homework 2): Birds. Retrieved from https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-26-geeslin-hw-2-read-in-data/
BibTeX citation
@misc{geeslin2021reading, author = {Geeslin, Eliza}, title = {DACSS 601 Fall 2021: Reading in the Data (Homework 2): Birds}, url = {https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-26-geeslin-hw-2-read-in-data/}, year = {2021} }