Homework 2, Data Import

Second Assignment

Justin Meade
2022-02-08

I am using StateCounty2012.xls from the example datasets.

1. Read dataset

Load tidyverse and readxl (needed to import the .xls file), set the working directory to the folder containing the R script, and import the dataset.

library(tidyverse)
library(readxl)
setwd('G:/My Drive/School/UMASS/DACSS/DACSS_601/assignment2')
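# Skip the first 3 rows of the sheet, keep three columns, drop rows with no county name
exampleCounties <- read_xls('./StateCounty2012.xls', skip = 3) %>%
  select(STATE, COUNTY, TOTAL) %>%
  filter(!is.na(COUNTY))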
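The select-then-filter step can be tried without the spreadsheet. The following is a minimal sketch on a small stand-in tibble; the name rawSheet, the FILLER column, and the "AK Total" row are hypothetical, and it assumes (the source does not confirm this) that the rows removed by !is.na(COUNTY) are subtotal-style rows with an empty COUNTY cell.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the raw sheet: made-up rows, with an extra
# column and a subtotal-style row whose COUNTY cell is empty (NA).
rawSheet <- tribble(
  ~STATE,     ~FILLER, ~COUNTY,     ~TOTAL,
  "AK",       NA,      "ANCHORAGE", 7,
  "AK",       NA,      "JUNEAU",    3,
  "AK Total", NA,      NA,          10
)

cleaned <- rawSheet %>%
  select(STATE, COUNTY, TOTAL) %>%  # keep only the three columns of interest
  filter(!is.na(COUNTY))            # drop rows without a county name

print(cleaned)
```

Only the two rows that carry a county name remain, and the FILLER column is gone.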

Display First 10 Records of exampleCounties

head(exampleCounties,n=10L)
# A tibble: 10 x 3
   STATE COUNTY               TOTAL
   <chr> <chr>                <dbl>
 1 AE    APO                      2
 2 AK    ANCHORAGE                7
 3 AK    FAIRBANKS NORTH STAR     2
 4 AK    JUNEAU                   3
 5 AK    MATANUSKA-SUSITNA        2
 6 AK    SITKA                    1
 7 AK    SKAGWAY MUNICIPALITY    88
 8 AL    AUTAUGA                102
 9 AL    BALDWIN                143
10 AL    BARBOUR                  1

2. Explain the variables in exampleCounties:

str(exampleCounties)
tibble [2,930 x 3] (S3: tbl_df/tbl/data.frame)
 $ STATE : chr [1:2930] "AE" "AK" "AK" "AK" ...
 $ COUNTY: chr [1:2930] "APO" "ANCHORAGE" "FAIRBANKS NORTH STAR" "JUNEAU" ...
 $ TOTAL : num [1:2930] 2 7 2 3 2 1 88 102 143 1 ...

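The ‘STATE’ column is a character column containing two-letter codes such as ‘AK’ and ‘AL’ (along with non-state postal codes such as ‘AE’). The ‘COUNTY’ column is a character column containing county names. The ‘TOTAL’ column is numeric and holds the total recorded for each STATE and COUNTY pair. The tibble has 2,930 rows.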
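As a complement to str(), the column classes and the number of distinct values per column can be checked directly. This is a minimal sketch on three rows copied from the head() output above, not on the full dataset; the names sampleCounties, columnTypes, and distinctCounts are introduced here for illustration only.

```r
library(dplyr)
library(tibble)

# Small sample copied from the head() output in this post (not the full data).
sampleCounties <- tribble(
  ~STATE, ~COUNTY,                ~TOTAL,
  "AE",   "APO",                  2,
  "AK",   "ANCHORAGE",            7,
  "AK",   "FAIRBANKS NORTH STAR", 2
)

# Class of each column, as a named character vector
columnTypes <- sapply(sampleCounties, class)
print(columnTypes)

# Number of distinct values in each column
distinctCounts <- sampleCounties %>%
  summarise(across(everything(), n_distinct))
print(distinctCounts)
```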

3. Data Wrangling Operations!!

1st Data-Wrangling operation

Create a data frame, exampleCounties2, containing only the records where STATE starts with ‘MA’, arranged by descending TOTAL.

exampleCounties2 <- exampleCounties %>%
  filter(startsWith(STATE, 'MA')) %>%
  arrange(desc(TOTAL))
head(exampleCounties2, n=20L)
# A tibble: 12 x 3
   STATE COUNTY     TOTAL
   <chr> <chr>      <dbl>
 1 MA    MIDDLESEX    673
 2 MA    SUFFOLK      558
 3 MA    PLYMOUTH     429
 4 MA    NORFOLK      386
 5 MA    ESSEX        314
 6 MA    WORCESTER    310
 7 MA    BRISTOL      232
 8 MA    HAMPDEN      202
 9 MA    FRANKLIN     113
10 MA    HAMPSHIRE     68
11 MA    BERKSHIRE     50
12 MA    BARNSTABLE    44
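Since the STATE values shown in this post are two-letter codes, a prefix match on ‘MA’ and an exact match should select the same rows. The sketch below compares the two on a few rows copied from the outputs above; sampleCounties, byPrefix, and byEquality are names introduced here for illustration, and the comparison has not been run against the full dataset.

```r
library(dplyr)
library(tibble)

# Small sample copied from the outputs in this post (not the full dataset).
sampleCounties <- tribble(
  ~STATE, ~COUNTY,     ~TOTAL,
  "MA",   "BERKSHIRE", 50,
  "TX",   "BELL",      413,
  "MA",   "MIDDLESEX", 673,
  "MA",   "SUFFOLK",   558
)

# Prefix match on the state code, largest TOTAL first
byPrefix <- sampleCounties %>%
  filter(startsWith(STATE, "MA")) %>%
  arrange(desc(TOTAL))

# Exact match on the state code, largest TOTAL first
byEquality <- sampleCounties %>%
  filter(STATE == "MA") %>%
  arrange(desc(TOTAL))

print(byPrefix)
print(identical(byPrefix$COUNTY, byEquality$COUNTY))
```

The prefix match would only differ from the exact match if the column held longer values beginning with "MA".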

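2nd Data-Wrangling operation

Keep only the records where STATE == ‘TX’ and TOTAL >= 300, arranged alphabetically by COUNTY. Because head() returns six rows by default, only the first six matches are shown.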

exampleCounties %>%
  filter(STATE == 'TX', TOTAL >= 300) %>%
  arrange(COUNTY) %>%
  head()
# A tibble: 6 x 3
  STATE COUNTY  TOTAL
  <chr> <chr>   <dbl>
1 TX    BELL      413
2 TX    BEXAR     950
3 TX    DALLAS    406
4 TX    DENTON    394
5 TX    EL PASO   863
6 TX    HARRIS   2535
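The two conditions can be given either as two chained filter() calls or as one call with comma-separated conditions, which dplyr combines with AND. The following sketch checks that both forms agree on a small sample; most rows are copied from the outputs above, while the TX / "EXAMPLE" row is hypothetical and added only so that the TOTAL condition excludes something.

```r
library(dplyr)
library(tibble)

# Sample rows copied from the outputs in this post, plus one clearly
# hypothetical row (TX / "EXAMPLE") so that the TOTAL condition matters.
sampleCounties <- tribble(
  ~STATE, ~COUNTY,     ~TOTAL,
  "TX",   "HARRIS",    2535,
  "MA",   "MIDDLESEX", 673,
  "TX",   "BELL",      413,
  "TX",   "EXAMPLE",   10,     # hypothetical, not taken from the dataset
  "AL",   "AUTAUGA",   102,
  "TX",   "BEXAR",     950
)

# Two chained filter() calls
chained <- sampleCounties %>%
  filter(STATE == "TX") %>%
  filter(TOTAL >= 300) %>%
  arrange(COUNTY)

# One filter() call; comma-separated conditions are combined with AND
combined <- sampleCounties %>%
  filter(STATE == "TX", TOTAL >= 300) %>%
  arrange(COUNTY)

print(combined)
print(identical(chained$COUNTY, combined$COUNTY))
```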

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Meade (2022, Feb. 13). Data Analytics and Computational Social Science: Homework 2, Data Import. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommeade68863819/

BibTeX citation

@misc{meade2022homework,
  author = {Meade, Justin},
  title = {Data Analytics and Computational Social Science: Homework 2, Data Import},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscommeade68863819/},
  year = {2022}
}