HW2

This assignment reads in a dataset, describes its variables, and then demonstrates at least two basic data-wrangling operations.

Laura Collazo
2022-02-09

The dataset used for this assignment is workshop_masterlist. It holds data on faculty who have participated in training offered by my office, and it is shared here with permission and with identifiers removed.

The variables include:

variable              data type
id                    char
campus                char
workshop              char
workshop_status_id*   char
workshop_date         date
semester              char
workshop_year         char

*The variable workshop_status_id was intentionally left as an id in the CSV, to practice using mutate to tidy the data.

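#load the tidyverse, which provides read_csv, as_tibble, the pipe and the dplyr verbs used below

library(tidyverse)

#read in dataset from csv and save as object

wm_csv <- read_csv("workshop_masterlist_2022-02-08.csv")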

#create tibble

workshop_masterlist <- as_tibble(wm_csv)
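
The type changes made further down with as.character could also be declared when the file is read. The block below is a minimal sketch of that idea, not the approach used in this assignment. It writes a small stand-in file to a temporary path, because the real csv is not bundled here; the three rows and their values are hypothetical, and only the column names come from the variable list above.

```r
library(readr)

# Hypothetical stand-in for the workshop csv (illustrative values only).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,campus,workshop,workshop_status_id,workshop_date,semester,workshop_year",
  "1,Brooklyn,OTE,1,2020-07-09,Summer 2020,2020",
  "2,York,OTE,2,2020-07-09,Summer 2020,2020",
  "3,Baruch,PTO,4,2011-06-01,Summer 2011,2011"
), tmp)

# Declare each column type up front so id and workshop_year
# arrive as character instead of being guessed as numeric.
demo <- read_csv(
  tmp,
  col_types = cols(
    id = col_character(),
    campus = col_character(),
    workshop = col_character(),
    workshop_status_id = col_integer(),
    workshop_date = col_date(format = "%Y-%m-%d"),
    semester = col_character(),
    workshop_year = col_character()
  )
)

str(demo)
```

Declaring types at read time keeps the later pipeline shorter, at the cost of a longer read_csv call. Either route gives the same column types.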

#create new column workshop_status

wm <- workshop_masterlist %>%
  mutate(workshop_status = case_when(
    workshop_status_id == 1 ~ "Pass",
    workshop_status_id == 2 ~ "No Pass",
    workshop_status_id == 3 ~ "Withdraw",
    workshop_status_id == 4 ~ "No Show",
    workshop_status_id == 5 ~ "Audit"
  ))
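
One thing worth checking after a case_when recode is whether any id fell outside the listed cases, since case_when returns NA for rows that match no condition. The sketch below illustrates this on hypothetical toy data (the id 9 is invented to force a miss) and tabulates each id against its label.

```r
library(dplyr)

# Hypothetical toy data; the id 9 is deliberately outside the 1-5 mapping.
toy <- tibble(workshop_status_id = c(1, 2, 4, 9))

toy_labeled <- toy %>%
  mutate(workshop_status = case_when(
    workshop_status_id == 1 ~ "Pass",
    workshop_status_id == 2 ~ "No Pass",
    workshop_status_id == 3 ~ "Withdraw",
    workshop_status_id == 4 ~ "No Show",
    workshop_status_id == 5 ~ "Audit",
    TRUE ~ NA_character_  # explicit fallback; same result as leaving it out
  ))

# One row per id/label pair; an NA label flags an unmapped id.
toy_labeled %>% count(workshop_status_id, workshop_status)
```

Running the same count on wm would show whether every workshop_status_id in the real data received a label.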

#check variable types

str(wm)
tibble [5,844 x 8] (S3: tbl_df/tbl/data.frame)
 $ id                : num [1:5844] 3955 3956 3957 3958 3959 ...
 $ campus            : chr [1:5844] "Brooklyn" "Brooklyn" "York" "Brooklyn" ...
 $ workshop          : chr [1:5844] "OTE" "OTE" "OTE" "OTE" ...
 $ workshop_status_id: num [1:5844] 1 2 1 1 1 1 1 2 1 4 ...
 $ workshop_date     : Date[1:5844], format: "2020-07-09" ...
 $ semester          : chr [1:5844] "Summer 2020" "Summer 2020" "Summer 2020" "Summer 2020" ...
 $ workshop_year     : num [1:5844] 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
 $ workshop_status   : chr [1:5844] "Pass" "No Pass" "Pass" "Pass" ...

#select desired columns in new order, and change incorrect variable types

wm_tidy <- wm %>%
  select(id:workshop, workshop_status, workshop_year) %>%
  mutate(id = as.character(id),
         workshop_year = as.character(workshop_year))

#check variable types

str(wm_tidy)
tibble [5,844 x 5] (S3: tbl_df/tbl/data.frame)
 $ id             : chr [1:5844] "3955" "3956" "3957" "3958" ...
 $ campus         : chr [1:5844] "Brooklyn" "Brooklyn" "York" "Brooklyn" ...
 $ workshop       : chr [1:5844] "OTE" "OTE" "OTE" "OTE" ...
 $ workshop_status: chr [1:5844] "Pass" "No Pass" "Pass" "Pass" ...
 $ workshop_year  : chr [1:5844] "2020" "2020" "2020" "2020" ...

#filter for all participants who passed, and arrange by campus and then year

wm_tidy %>%
  filter(workshop_status == "Pass") %>%
  arrange(campus, workshop_year)
# A tibble: 4,740 x 5
   id    campus workshop workshop_status workshop_year
   <chr> <chr>  <chr>    <chr>           <chr>        
 1 239   Baruch PTO      Pass            2011         
 2 364   Baruch PTO      Pass            2011         
 3 430   Baruch PTO      Pass            2011         
 4 636   Baruch PTO      Pass            2011         
 5 684   Baruch PTO      Pass            2011         
 6 864   Baruch PTO      Pass            2011         
 7 959   Baruch PTO      Pass            2011         
 8 1048  Baruch PTO      Pass            2011         
 9 1061  Baruch PTO      Pass            2011         
10 1063  Baruch PTO      Pass            2011         
# ... with 4,730 more rows
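
Because id and workshop_year are now character columns, arrange sorts them as text. For four-digit years that gives the same order as a numeric sort, but for ids of different lengths it does not. The sketch below shows the difference on hypothetical toy rows; the ids echo a few from the output above, while the pairing with years is invented.

```r
library(dplyr)

# Hypothetical toy rows with character columns, as in wm_tidy.
toy <- tibble(
  id = c("959", "1048", "239"),
  workshop_year = c("2012", "2011", "2011")
)

# Text sort: "1048" comes before "239" because "1" sorts before "2".
by_id_chr <- toy %>% arrange(id)

# Numeric sort: convert inside arrange without changing the stored type.
by_id_num <- toy %>% arrange(as.numeric(id))

by_id_chr$id
by_id_num$id
```

The pipeline above arranges only by campus and workshop_year, so its result is unaffected; the conversion matters only if id is ever used as a sort key.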

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Collazo (2022, Feb. 9). Data Analytics and Computational Social Science: HW2. Retrieved from https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlcollazohw2/

BibTeX citation

@misc{collazo2022hw2,
  author = {Collazo, Laura},
  title = {Data Analytics and Computational Social Science: HW2},
  url = {https://github.com/DACSS/dacss_course_website/posts/httpsrpubscomlcollazohw2/},
  year = {2022}
}