Homework 2 - Reading and pulling data.
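You can read data into R in a couple of ways. The first is to use a dataset that ships with R. The datasets package, which is attached by default but can also be loaded explicitly with library(datasets), lets you refer to its built-in datasets by name. The iris dataset is the one most often seen in examples. Calling head(iris) prints its first six rows. (table(iris) does not display the data: it cross-tabulates every column, which is rarely what you want.)
library(datasets)
head(iris)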
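For a quick look at a built-in dataset, a few base R functions are usually enough. The sketch below is an illustration rather than part of the original homework; the name species_counts is made up for the example. It also shows what table() is meant for: counting the values of a single column.

```r
library(datasets)

# Preview the first six rows of the built-in iris data frame.
head(iris)

# Number of rows and columns.
dim(iris)

# Column names and types at a glance.
str(iris)

# table() counts values; applied to one column it gives a frequency table.
species_counts <- table(iris$Species)
print(species_counts)
```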
Reading your own dataset into R is a different process. The simplest approach is to read the file from your working directory, or from a path relative to it. To find this directory you can use the getwd() function. If you’re unsure what a working directory is: it is one of the folders on your computer (e.g. Downloads, Documents), and while R is running it works from that folder. Hence the name working directory.
library(tidyverse)
library(readxl)
# Path is relative to the working directory (no leading slash).
StateCounty2012 <- read_excel("_data/StateCounty2012.xls")
View(StateCounty2012)
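Before reading a file by relative path, it can help to confirm where R is working and whether the file is actually there. This is a minimal base R sketch, assuming the _data folder and file name used in this post; the variable names wd and data_path are made up for the example.

```r
# Show the folder R is currently working in.
wd <- getwd()
print(wd)

# Build the path relative to the working directory.
data_path <- file.path("_data", "StateCounty2012.xls")

# Check that the file exists before trying to read it.
if (file.exists(data_path)) {
  message("Found: ", normalizePath(data_path))
} else {
  message("Not found from ", wd, ". Files here: ",
          paste(list.files(), collapse = ", "))
}
```

If the file is not found, the list of files in the message is usually enough to tell whether R is working in the wrong folder.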
You can also use the here package to make file paths easier to manage. You can read more about here at https://github.com/jennybc/here_here. One reason to use here is that it lets you avoid setwd(), which changes your working directory and can cause problems when a script is moved or run on another computer. here() builds paths starting from the project root directory rather than from whatever the working directory currently is.
library(here)
library(tidyverse)
library(readxl)
StateCounty2012 <- read_excel(here("_data", "StateCounty2012.xls"))
View(StateCounty2012)
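To see what here() actually produces, you can print the path it builds before passing it to a reader function. This is a small sketch, assuming the here package is installed; data_path is a made-up name, and the printed path will differ from machine to machine.

```r
library(here)

# here() with no arguments returns the directory it treats as the project root.
print(here())

# Each argument is one path component; here() joins them onto the project root.
data_path <- here("_data", "StateCounty2012.xls")
print(data_path)

# TRUE only if the file exists at that location on this machine.
file.exists(data_path)
```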
Using head() you can preview the data. You will notice that it’s not very tidy.
library(here)
library(readxl)
StateCounty2012 <- read_excel(here("_data", "StateCounty2012.xls"))
head(StateCounty2012)
# A tibble: 6 x 6
`TOTAL RAILROAD EMPLOYMENT BY STA~ ...2 ...3 ...4 ...5 ...6
<chr> <chr> <lgl> <chr> <lgl> <chr>
1 CALENDAR YEAR 2012 <NA> NA <NA> NA <NA>
2 <NA> <NA> NA <NA> NA <NA>
3 <NA> STATE NA COUNTY NA TOTAL
4 <NA> AE NA APO NA 2
5 <NA> AE Tot~ NA <NA> NA 2
6 <NA> AK NA ANCHOR~ NA 7
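One way to see why this output is untidy is to clean a small copy of it by hand. The sketch below is not the author's method: it rebuilds the first four rows of the printed output as a toy data frame and uses base R to promote the STATE / COUNTY / TOTAL row to column names. The names raw, header_row, keep_cols and tidy are made up for the example.

```r
# Toy copy of the first four rows shown above (columns named c1..c6 for brevity).
raw <- data.frame(
  c1 = c("CALENDAR YEAR 2012", NA, NA, NA),
  c2 = c(NA, NA, "STATE", "AE"),
  c3 = c(NA, NA, NA, NA),
  c4 = c(NA, NA, "COUNTY", "APO"),
  c5 = c(NA, NA, NA, NA),
  c6 = c(NA, NA, "TOTAL", "2"),
  stringsAsFactors = FALSE
)

# Find the row that holds the real column names.
header_row <- which(raw$c2 == "STATE")

# Keep only the columns that have a name in that row.
keep_cols <- !is.na(unlist(raw[header_row, ]))

# Keep the rows below the header row and the named columns.
tidy <- raw[seq_len(nrow(raw)) > header_row, keep_cols, drop = FALSE]

# Use the header row as (lower-case) column names.
names(tidy) <- unname(tolower(unlist(raw[header_row, keep_cols])))

# TOTAL was read as text because of the title rows; convert it to a number.
tidy$total <- as.integer(tidy$total)
rownames(tidy) <- NULL

print(tidy)
```

On the real spreadsheet the same idea applies, though the subtotal rows (such as the "AE Tot~" row above) would also need to be filtered out.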
Here is an example of tidy data.
library(here)
library(readr)
Eggs <- read_csv(here("_data", "eggs_tidy.csv"))
head(Eggs)
# A tibble: 6 x 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
# ... with 1 more variable: extra_large_dozen <dbl>
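You can also use the tibble package, which is part of the tidyverse, directly. as_tibble() converts a data frame to a tibble, which prints only the first ten rows along with each column’s type. (read_csv() already returns a tibble, so here the call simply prints it.)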
library(tidyverse)
library(here)
Eggs <- read_csv(here("_data", "eggs_tidy.csv"))
as_tibble(Eggs)
# A tibble: 120 x 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# ... with 110 more rows, and 1 more variable:
# extra_large_dozen <dbl>
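The effect of as_tibble() is easier to see on an object that is not already a tibble. This is a minimal sketch using the built-in iris data frame, assuming the tibble package is installed; iris_tbl is a made-up name.

```r
library(tibble)

# iris ships with R as a plain data.frame.
class(iris)

# as_tibble() converts it to a tibble without changing the data.
iris_tbl <- as_tibble(iris)
class(iris_tbl)

# Printing a tibble shows the first ten rows and the type of each column.
iris_tbl
```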
If you would like to show the full data table, you can use kable() from the knitr package. Below you will see the kable output for Eggs and StateCounty2012 (first rows shown).
library(knitr)
kable(Eggs, caption = "Here is the tidy data of Eggs")
kable(StateCounty2012, caption = "Here is the untidy data of StateCounty2012")
month | year | large_half_dozen | large_dozen | extra_large_half_dozen | extra_large_dozen |
---|---|---|---|---|---|
January | 2004 | 126.0 | 230.00 | 132.0 | 230.0 |
February | 2004 | 128.5 | 226.25 | 134.5 | 230.0 |
March | 2004 | 131.0 | 225.00 | 137.0 | 230.0 |
April | 2004 | 131.0 | 225.00 | 137.0 | 234.5 |

TOTAL RAILROAD EMPLOYMENT BY STATE AND COUNTY | …2 | …3 | …4 | …5 | …6 |
---|---|---|---|---|---|
CALENDAR YEAR 2012 | NA | NA | NA | NA | NA |
NA | NA | NA | NA | NA | NA |
NA | STATE | NA | COUNTY | NA | TOTAL |
NA | AE | NA | APO | NA | 2 |
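For large datasets a full kable table can get long, so it is common to pass only a few rows. The sketch below is an illustration, assuming the knitr package is installed; it uses the built-in iris data so that it runs without the files from this post, and preview is a made-up name.

```r
library(knitr)

# Combine head() with kable() to render only the first four rows.
preview <- kable(head(iris, 4), caption = "First four rows of iris")

# Printing the object outputs the formatted table.
print(preview)
```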
You can use rmarkdown for paged tables.
library(rmarkdown)
paged_table(Eggs)
paged_table(StateCounty2012)
To edit data interactively, you can install the editData package (install.packages("editData")). You can read more about it here: https://cran.r-project.org/web/packages/editData/vignettes/editData.html
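library(editData)
# Print the data before editing.
tibble(StateCounty2012)
# Opens an interactive editor; the edited data is returned when you finish.
result <- editData(StateCounty2012)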
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Hackbarth (2021, Sept. 16). DACSS 601 Fall 2021: Homework 2. Retrieved from https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-16-homework-2-molly-hackbarth/
BibTeX citation
@misc{hackbarth2021homework, author = {Hackbarth, Molly}, title = {DACSS 601 Fall 2021: Homework 2}, url = {https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-16-homework-2-molly-hackbarth/}, year = {2021} }