DACSS 601 Fall 2021: Homework 2

Molly Hackbarth

Reading Data into R

You can read data into R in a couple of ways. The first is to use a dataset that ships with R. Calling library(datasets) attaches the datasets package, after which you can refer to any of its datasets by name. Examples often use the iris dataset. If you call head(iris) it will show the first six rows of the iris dataset.

library(datasets)
head(iris)
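As a small sketch of how you might look over a built-in dataset before working with it, the lines below use only base R and the datasets package. The choice of iris simply follows the example in this post.

```r
library(datasets)

# Size of the dataset: number of rows, then number of columns.
iris_dim <- dim(iris)
print(iris_dim)

# Column names and the type stored in each column.
str(iris)

# Names of a few other datasets that ship in the same package.
head(ls("package:datasets"))
```

Printing the dimensions and structure first is a quick way to confirm you have the dataset you expect before running anything heavier on it.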

Reading Data into R through your own datasets

Reading your own dataset into R is a different process. R looks for files relative to your working directory, which you can find with the getwd() function. If you’re confused about what a working directory is, it is one of the folders on your computer (e.g. Downloads, Documents), and when R is running it’s working in one of these folders. Hence the name working directory.

library(tidyverse)
library(readxl)
StateCounty2012 <- read_excel("/_data/StateCounty2012.xls")
View(StateCounty2012)
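Before reading a file, it can help to confirm where R is working and whether the file is visible from there. This is a minimal sketch in base R. The relative path "_data/StateCounty2012.xls" is an assumption about where the file sits in your project, so adjust it to your own layout.

```r
# The folder R is currently working in.
wd <- getwd()
print(wd)

# Folders directly inside the working directory.
print(list.dirs(wd, recursive = FALSE))

# Assumed location of the data file, relative to the working directory.
data_path <- file.path("_data", "StateCounty2012.xls")

if (file.exists(data_path)) {
  message("Found: ", normalizePath(data_path))
} else {
  message("No file at '", data_path, "' relative to ", wd)
}
```

If the file is not found, the printed working directory usually shows why: R is looking in a different folder than the one you expected.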

You can also use HERE

You can also use HERE to make file paths easier to write. You can read more about HERE here: https://github.com/jennybc/here_here. One reason to use HERE is that it lets you avoid setwd(), which changes your working directory and can cause issues! here() always builds the path starting from the project root directory.

library(here)
library(tidyverse)
library(readxl)
StateCounty2012 <- read_excel(here("_data", "StateCounty2012.xls"))
View(StateCounty2012)
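To see what here() is doing, it can be useful to print the paths it builds rather than passing them straight to a reader function. This sketch assumes the here package is installed. It does not need the file to exist, because it only constructs the path.

```r
library(here)

# With no arguments, here() returns the project root it detected.
project_root <- here()
print(project_root)

# Each argument becomes one folder (or file) name appended to the root.
data_path <- here("_data", "StateCounty2012.xls")
print(data_path)

# Whether a file actually exists at that location on this computer.
print(file.exists(data_path))
```

Because the path always starts at the project root, the same call gives a sensible result whether the code runs from the console or while knitting.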

Notes

If you are having trouble reading a file, open the Knit menu, go to Knit Directory, and make sure Project Directory is checked!
If you are having trouble viewing the worksheet when you are knitting it on a Mac, you may need to download XQuartz, found here: https://www.xquartz.org/
To render a file you may need to use ../_data/StateCounty2012.xls. However, when you submit it to GitHub you’ll want to remove the periods so it reads /_data/StateCounty2012.xls, so that it will work on the instructor’s computer.
If there are more subfolders on the way to your file, add them in HERE before the worksheet/image/project (e.g. here("images", "best-dogs", "goodestboy.jpg")).
A great rundown on how HERE works is here: http://jenrichmond.rbind.io/post/how-to-use-the-here-package/
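The notes about the knit directory and ../ come down to one idea: a relative path is resolved from whatever folder R is working in. The sketch below demonstrates this in a throwaway folder under tempdir(). The folder names demo_project and posts and the file example.csv are hypothetical, made up only for this illustration.

```r
# Build a tiny pretend project: demo_project/_data and demo_project/posts.
project <- file.path(tempdir(), "demo_project")
dir.create(file.path(project, "_data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project, "posts"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3),
          file.path(project, "_data", "example.csv"),
          row.names = FALSE)

# Case 1: working from the document's own folder (posts).
old_wd <- setwd(file.path(project, "posts"))
from_posts <- c(up_one = file.exists("../_data/example.csv"),
                direct = file.exists("_data/example.csv"))

# Case 2: working from the project root.
setwd(project)
from_root <- c(direct = file.exists("_data/example.csv"))

# Put the working directory back the way it was.
setwd(old_wd)

print(from_posts)
print(from_root)
```

From posts only the ../ form finds the file, while from the project root the direct form works. setwd() appears here only to imitate the two knit settings; in a real project, switching Knit Directory to Project Directory or using here() avoids the need for it.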

Preview the data

An example of untidy data

Using head() you can preview the data. You will notice that it’s not very tidy.

library(here)
library(readxl)
StateCounty2012 <- read_excel(here("_data", "StateCounty2012.xls"))
head(StateCounty2012)

# A tibble: 6 x 6
  `TOTAL RAILROAD EMPLOYMENT BY STA~ ...2    ...3  ...4    ...5  ...6 
  <chr>                              <chr>   <lgl> <chr>   <lgl> <chr>
1 CALENDAR YEAR 2012                 <NA>    NA    <NA>    NA    <NA> 
2 <NA>                               <NA>    NA    <NA>    NA    <NA> 
3 <NA>                               STATE   NA    COUNTY  NA    TOTAL
4 <NA>                               AE      NA    APO     NA    2    
5 <NA>                               AE Tot~ NA    <NA>    NA    2    
6 <NA>                               AK      NA    ANCHOR~ NA    7
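One way to put a number on "not very tidy" is to check which columns contain nothing but missing values. The sketch below rebuilds the first four preview rows by hand so it runs without the Excel file. The column names title and x2 to x6 are hypothetical stand-ins for the real header and the ...2 to ...6 names in the preview.

```r
# Hand-built copy of the first four preview rows (not read from the file).
raw <- data.frame(
  title = c("CALENDAR YEAR 2012", NA, NA, NA),
  x2    = c(NA, NA, "STATE", "AE"),
  x3    = c(NA, NA, NA, NA),
  x4    = c(NA, NA, "COUNTY", "APO"),
  x5    = c(NA, NA, NA, NA),
  x6    = c(NA, NA, "TOTAL", "2"),
  stringsAsFactors = FALSE
)

# TRUE for every column in which all values are missing.
empty_cols <- vapply(raw, function(col) all(is.na(col)), logical(1))
print(empty_cols)

# Names of the columns that carry no information in this sample.
print(names(raw)[empty_cols])
```

In this sample the third and fifth columns are entirely empty, and the real column headers (STATE, COUNTY, TOTAL) sit in the third row rather than at the top, which are both typical signs that a spreadsheet was laid out for reading rather than for analysis.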

An example of tidy data

Here is an example of tidy data.

library(here)
library(readr)
Eggs <- read_csv(here("_data", "eggs_tidy.csv"))
head(Eggs)

# A tibble: 6 x 6
  month     year large_half_dozen large_dozen extra_large_half_dozen
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>
1 January   2004             126         230                    132 
2 February  2004             128.        226.                   134.
3 March     2004             131         225                    137 
4 April     2004             131         225                    137 
5 May       2004             131         225                    137 
6 June      2004             134.        231.                   137 
# ... with 1 more variable: extra_large_dozen <dbl>
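A practical benefit of a table like this is that every column has a usable name and a single type, so column-wise summaries work without any cleanup. As a rough sketch, the data frame below copies the first four rows of Eggs by hand (values taken from the tables in this post) so the example runs without the CSV file.

```r
# First four rows of the eggs data, typed in by hand for illustration.
eggs_sample <- data.frame(
  month                  = c("January", "February", "March", "April"),
  year                   = c(2004, 2004, 2004, 2004),
  large_half_dozen       = c(126, 128.5, 131, 131),
  large_dozen            = c(230, 226.25, 225, 225),
  extra_large_half_dozen = c(132, 134.5, 137, 137),
  extra_large_dozen      = c(230, 230, 230, 234.5)
)

# Every column except month and year holds a numeric value.
price_cols <- setdiff(names(eggs_sample), c("month", "year"))

# Average of each numeric column across the four months.
avg_prices <- colMeans(eggs_sample[price_cols])
print(avg_prices)
```

Trying the same thing on the StateCounty2012 preview would fail or give nonsense, because its numbers are stored as text and its headers are buried in the rows.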

Tibble

You can also use tibble directly, which is part of tidyverse, to create a table for your data.

library(tidyverse)
library(here)
Eggs <- read_csv(here("_data", "eggs_tidy.csv"))
as_tibble(Eggs)

# A tibble: 120 x 6
   month      year large_half_dozen large_dozen extra_large_half_dozen
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>
 1 January    2004             126         230                    132 
 2 February   2004             128.        226.                   134.
 3 March      2004             131         225                    137 
 4 April      2004             131         225                    137 
 5 May        2004             131         225                    137 
 6 June       2004             134.        231.                   137 
 7 July       2004             134.        234.                   137 
 8 August     2004             134.        234.                   137 
 9 September  2004             130.        234.                   136.
10 October    2004             128.        234.                   136.
# ... with 110 more rows, and 1 more variable:
#   extra_large_dozen <dbl>
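The difference between a tibble and a plain data frame is easier to see on a dataset that does not start out as a tibble. This sketch assumes the tibble package is installed and uses the built-in iris dataset rather than Eggs, since read_csv() already returns a tibble.

```r
library(tibble)

# iris ships with R as an ordinary data frame.
print(class(iris))

# as_tibble() converts it; the data stays the same, the class changes.
iris_tbl <- as_tibble(iris)
print(class(iris_tbl))

# is_tibble() reports which kind of object you have.
print(is_tibble(iris))
print(is_tibble(iris_tbl))

# Printing a tibble shows only the first ten rows plus column types.
print(iris_tbl)
```

The shorter printout is the main thing you notice day to day: a tibble will not flood the console the way a large data frame can.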

Using Kable

If you would like to show the full data table, you can use kable() from the knitr package. Below you will see the kable version for StateCounty2012 and Eggs.

library(knitr)
kable(Eggs, caption = "Here is the tidy data of Eggs")
kable(StateCounty2012, caption = "Here is the untidy data of StateCounty2012")

Table 1: Here is the tidy data of Eggs
month	year	large_half_dozen	large_dozen	extra_large_half_dozen	extra_large_dozen
January	2004	126.0	230.00	132.0	230.0
February	2004	128.5	226.25	134.5	230.0
March	2004	131.0	225.00	137.0	230.0
April	2004	131.0	225.00	137.0	234.5

Table 1: Here is the untidy data of StateCounty2012
TOTAL RAILROAD EMPLOYMENT BY STATE AND COUNTY	…2	…3	…4	…5	…6
CALENDAR YEAR 2012	NA	NA	NA	NA	NA
NA	NA	NA	NA	NA	NA
NA	STATE	NA	COUNTY	NA	TOTAL
NA	AE	NA	APO	NA	2
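Since kable() prints every row it is given, it is often worth passing it a slice of the data instead of the whole table. The sketch below assumes the knitr package is installed and uses the built-in iris dataset so it runs without the course files. The caption text is just an example.

```r
library(knitr)

# Keep only the first four rows so the table stays short.
preview <- head(iris, 4)

# digits controls how many decimal places numeric columns show.
tbl <- kable(preview,
             caption = "First four rows of iris",
             digits = 1)

print(tbl)
```

The same pattern, kable(head(Eggs, 4), ...), would give a short table of the eggs data without changing anything else in the call.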

Using rmarkdown

You can use rmarkdown for paged tables.

library(rmarkdown)
paged_table(Eggs)
paged_table(StateCounty2012)
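paged_table() also accepts an options list that controls how much is shown per page. This is a minimal sketch, assuming the rmarkdown package is installed. It uses the built-in iris dataset so it runs on its own, and the value of five rows per page is an arbitrary choice.

```r
library(rmarkdown)

# Show five rows per page instead of the default.
paged <- paged_table(iris, options = list(rows.print = 5))

# In a knitted HTML document this renders as a table with page controls.
paged
```

The paging controls only appear in HTML output; in the console the object prints like a regular data frame.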

Editing a file

To edit data you can install editData (install.packages("editData")). You can read more about it here: https://cran.r-project.org/web/packages/editData/vignettes/editData.html

library(editData)
tibble(StateCounty2012)
result <- editData(StateCounty2012)
