Homework 2

Homework 2 - Reading and pulling data.

Molly Hackbarth
09-16-2021

Reading Data into R

You can read data into R in a couple of ways. The first is to use a dataset that ships with R. The datasets package, which you can load explicitly with library(datasets), provides built-in datasets such as iris, which appears in many examples. Calling head(iris) prints the first six rows of the dataset. (Calling table(iris) would not display the data; it builds a contingency table across all five columns.)

library(datasets)
head(iris)
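
As a minimal sketch of exploring a built-in dataset, the lines below use only base R functions (head(), str(), dim()). The variable names first_rows and iris_dim are just illustrative.

```r
# Built-in datasets come from the 'datasets' package, which R attaches by default.
library(datasets)

# Preview the first six rows of iris.
first_rows <- head(iris)
print(first_rows)

# Inspect column names and types without printing the whole dataset.
str(iris)

# Check the overall size: number of rows and number of columns.
iris_dim <- dim(iris)
print(iris_dim)
```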

Reading Data into R through your own datasets

Reading your own dataset into R is a different process. R resolves relative file paths against your working directory, so the file must be reachable from there (or you must give its full path). To find the working directory, use the getwd() function. If you’re unsure what a working directory is: it is one of the folders on your computer (e.g. Downloads, Documents), and while R is running it works inside one such folder. Hence the name working directory.

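library(tidyverse)
library(readxl)
# Path is relative to the working directory; a leading "/" would point at the filesystem root instead.
StateCounty2012 <- read_excel("_data/StateCounty2012.xls")
View(StateCounty2012)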
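
To make the idea of a working directory concrete, here is a small base R sketch. It assumes the file sits in a _data folder as in the example above; whether it is actually found depends on where your own R session is running.

```r
# Where is R currently working?
wd <- getwd()
print(wd)

# A relative path such as "_data/StateCounty2012.xls" is interpreted
# relative to the working directory, i.e. as this full path:
rel_path <- file.path("_data", "StateCounty2012.xls")
full_path <- file.path(wd, rel_path)
print(full_path)

# Check that the file is really there before trying to read it.
if (file.exists(rel_path)) {
  message("Found: ", rel_path)
} else {
  message("Not found from this working directory: ", rel_path)
}
```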

You can also use here

You can also use the here package to make file paths easier. You can read more about it here: https://github.com/jennybc/here_here. One reason to use here is that it avoids setwd(), which changes your working directory and can cause issues! here() always builds the path starting from the project root directory, so the path does not depend on the current working directory.

library(here)
library(tidyverse)
library(readxl)
StateCounty2012 <- read_excel(here("_data", "StateCounty2012.xls"))
View(StateCounty2012)
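
To see what here() does with its arguments, the following sketch prints the detected project root and the full path built from it. It assumes the here package is installed and that the code is run from somewhere inside a project. Whether the file exists at that path depends on your own project layout.

```r
library(here)

# With no arguments, here() returns the project root it detected.
project_root <- here()
print(project_root)

# Each argument is one path component, joined onto the project root.
xls_path <- here("_data", "StateCounty2012.xls")
print(xls_path)

# The result is a full path, so it does not depend on getwd().
print(file.exists(xls_path))
```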

Notes

Preview the data

An example of untidy data

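Using head() you can preview the data. You will notice that it’s not very tidy: the sheet title has become the first column name, the real headers (STATE, COUNTY, TOTAL) sit in the third row, and columns ...3 and ...5 appear to be empty.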

library(here)
library(readxl)
StateCounty2012 <- read_excel(here("_data", "StateCounty2012.xls"))
head(StateCounty2012)

# A tibble: 6 x 6
  `TOTAL RAILROAD EMPLOYMENT BY STA~ ...2    ...3  ...4    ...5  ...6 
  <chr>                              <chr>   <lgl> <chr>   <lgl> <chr>
1 CALENDAR YEAR 2012                 <NA>    NA    <NA>    NA    <NA> 
2 <NA>                               <NA>    NA    <NA>    NA    <NA> 
3 <NA>                               STATE   NA    COUNTY  NA    TOTAL
4 <NA>                               AE      NA    APO     NA    2    
5 <NA>                               AE Tot~ NA    <NA>    NA    2    
6 <NA>                               AK      NA    ANCHOR~ NA    7    
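
One possible way to tidy a preview like this is sketched below in base R. It does not read the real file: the toy data frame raw only copies the first four rows printed above, and the column names c1 to c6 are placeholders. Treat it as an illustration of the idea (find the header row, keep the columns that have a header, drop the rows above it), not as the cleaning used for this homework.

```r
# Toy copy of the first four rows shown in the preview (placeholder column names).
raw <- data.frame(
  c1 = c("CALENDAR YEAR 2012", NA, NA, NA),
  c2 = c(NA, NA, "STATE", "AE"),
  c3 = c(NA, NA, NA, NA),
  c4 = c(NA, NA, "COUNTY", "APO"),
  c5 = c(NA, NA, NA, NA),
  c6 = c(NA, NA, "TOTAL", "2"),
  stringsAsFactors = FALSE
)

# Locate the row that holds the real column headers.
header_row <- which(raw$c2 == "STATE")
header <- unlist(raw[header_row, ])

# Keep only the columns that actually have a header.
keep <- !is.na(header)

# Keep only the rows below the header row, then apply the real names.
clean <- raw[(header_row + 1):nrow(raw), keep, drop = FALSE]
names(clean) <- unname(header[keep])
rownames(clean) <- NULL

# TOTAL was read as text; convert it to a number.
clean$TOTAL <- as.numeric(clean$TOTAL)
print(clean)
```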

An example of tidy data

Here is an example of tidy data.

library(here)
library(readr)
Eggs <- read_csv(here("_data", "eggs_tidy.csv"))
head(Eggs)

# A tibble: 6 x 6
  month     year large_half_dozen large_dozen extra_large_half_dozen
  <chr>    <dbl>            <dbl>       <dbl>                  <dbl>
1 January   2004             126         230                    132 
2 February  2004             128.        226.                   134.
3 March     2004             131         225                    137 
4 April     2004             131         225                    137 
5 May       2004             131         225                    137 
6 June      2004             134.        231.                   137 
# ... with 1 more variable: extra_large_dozen <dbl>

Tibble

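You can also use tibble directly, which is part of the tidyverse, to store your data as a table. Note that read_csv() already returns a tibble, so as_tibble() leaves Eggs unchanged here; it is most useful for converting a base R data frame.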

library(tidyverse)
library(here)
Eggs <- read_csv(here("_data", "eggs_tidy.csv"))
as_tibble(Eggs)

# A tibble: 120 x 6
   month      year large_half_dozen large_dozen extra_large_half_dozen
   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>
 1 January    2004             126         230                    132 
 2 February   2004             128.        226.                   134.
 3 March      2004             131         225                    137 
 4 April      2004             131         225                    137 
 5 May        2004             131         225                    137 
 6 June       2004             134.        231.                   137 
 7 July       2004             134.        234.                   137 
 8 August     2004             134.        234.                   137 
 9 September  2004             130.        234.                   136.
10 October    2004             128.        234.                   136.
# ... with 110 more rows, and 1 more variable:
#   extra_large_dozen <dbl>
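
To see as_tibble() change something, it helps to start from a plain data frame. The sketch below uses the built-in iris dataset rather than Eggs so that it runs without any files; it assumes the tibble package is installed.

```r
library(tibble)

# iris is a base R data frame, not a tibble.
print(class(iris))

# as_tibble() converts it; the data stay the same, only the class changes.
iris_tbl <- as_tibble(iris)
print(class(iris_tbl))

# Printing a tibble shows only the first rows plus the column types.
print(iris_tbl)
```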

Using Kable

If you would like to show the full data table, you can use kable() from the knitr package. Below you will see the kable version for Eggs and StateCounty2012.

library(knitr)
kable(Eggs, caption = "Here is the tidy data of Eggs")
kable(StateCounty2012, caption = "Here is the untidy data of StateCounty2012")

Table 1: Here is the tidy data of Eggs

month     year  large_half_dozen  large_dozen  extra_large_half_dozen  extra_large_dozen
January   2004  126.0             230.00       132.0                   230.0
February  2004  128.5             226.25       134.5                   230.0
March     2004  131.0             225.00       137.0                   230.0
April     2004  131.0             225.00       137.0                   234.5

Table 1: Here is the untidy data of StateCounty2012

TOTAL RAILROAD EMPLOYMENT BY STATE AND COUNTY  ...2   ...3  ...4    ...5  ...6
CALENDAR YEAR 2012                             NA     NA    NA      NA    NA
NA                                             NA     NA    NA      NA    NA
NA                                             STATE  NA    COUNTY  NA    TOTAL
NA                                             AE     NA    APO     NA    2
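
For a version you can run without the course files, here is a small sketch of kable() on the built-in iris dataset. It assumes the knitr package is installed; the captions are made up for illustration.

```r
library(knitr)

# kable() turns a data frame into a formatted table for R Markdown output.
iris_table <- kable(head(iris), caption = "First six rows of iris")
print(iris_table)

# The digits argument controls rounding of numeric columns.
rounded_table <- kable(head(iris), digits = 0, caption = "Rounded values")
print(rounded_table)
```

Passing head(iris) instead of the full data frame keeps the table short, which is handy when the full data would run to many rows.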

Using rmarkdown

You can use rmarkdown for paged tables.

library(rmarkdown)
paged_table(Eggs)
paged_table(StateCounty2012)

Editing a file

To edit data you can install editData (install.packages("editData")). You can read more about it here: https://cran.r-project.org/web/packages/editData/vignettes/editData.html

require(editData)
tibble(StateCounty2012)
result <- editData(StateCounty2012)

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Hackbarth (2021, Sept. 16). DACSS 601 Fall 2021: Homework 2. Retrieved from https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-16-homework-2-molly-hackbarth/

BibTeX citation

@misc{hackbarth2021homework,
  author = {Hackbarth, Molly},
  title = {DACSS 601 Fall 2021: Homework 2},
  url = {https://mrolfe.github.io/DACSS601Fall21/posts/2021-09-16-homework-2-molly-hackbarth/},
  year = {2021}
}