library(tidyverse)
library(googlesheets4)
::opts_chunk$set(echo = TRUE) knitr
UMass Admissions Data
UMass Admissions Data
Historical UMass admissions data, and data on the number of majors, are available in excel format from the University Analytics and Institutional Research Office. For this example, I transferred the excel workbooks into a google spreadsheet to read in using googlesheets4
.
First, we will set up the base URLs to use to access the data from google sheets. You will need to change this to access your own googlesheet or excel spreadsheet. (and modify subsequent “read” commands.)
<- gs4_get("https://docs.google.com/spreadsheets/d/1t-oEWdYCJndOSSW8dtn1dBC9U5-2KzjmL8cgbcPwDYs/edit?usp=sharing") admissions.url
ℹ Suitable tokens found in the cache, associated with these emails:
• 'mrolfe@umass.edu'
• 'mrrolfe@gmail.com'
Defaulting to the first email.
! Using an auto-discovered, cached token.
To suppress this message, modify your code or options to clearly consent to
the use of a cached token.
See gargle's "Non-interactive auth" vignette for more details:
<https://gargle.r-lib.org/articles/non-interactive-auth.html>
ℹ The googlesheets4 package is using a cached token for 'mrolfe@umass.edu'.
<-gs4_get("https://docs.google.com/spreadsheets/d/1sezSHYMsNqjJSzteVD1cogRFb0o0RdnuBilgZ5Ul4c0/edit?usp=sharing") majors.url
Create list of colleges and depts.
As can be viewed above, each workbook contains multiple sheets with the abbreviated name of the college or dept as the sheet name. Therefore, as a starting point for reading in the sheets, we need a list of them - much as we have done in the past using sheet_names
. The difference this time is that we also need to recover which college each department belongs to, which we can guess based on the order in which the departments appear (after the colleges, whose abbreviations all end in “X”.) We need to remove the overall Campus totals and only keep the individual Department pages, but also be sure to retain the additional College column indicated which college each department belongs to.
<-sheet_names(admissions.url)
dept_names
<- c(dept_names[str_ends(dept_names, "X")])
colleges colleges
[1] "HFAX" "ICSX" "CNSX" "SBSX" "EDUX" "ENGX" "SOMX" "NURX" "PUBX" "OTHX"
<-dept_names[dept_names!="Campus"]
dept_names
#find the position of colleges (depts follow)
<-which(str_ends(dept_names, "X"))
temp<-temp[-1]-temp[-length(temp)]
temp<- c(temp, 5)
temp
<-tibble(college = unlist(map2(colleges, temp, rep)),
deptsdept = dept_names)%>%
filter(!str_ends(dept, "X"))
depts
Read in dept. information
The general format of a department page can vary based on which programs are offered by the department. For example, we can see from the screenshot above that there are four categories of State-Supported Undergraduate admissions: First-Year, Stockbridge (A.S.), Transfer, Post-graduate; two categories of State-Supported Graduate admissions: Master’s and Doctoral; and three categories of University Without Walls (online or CPE) admissions: Undergraduate (Transfer and Post-graduate), Master’s, and Doctoral. Essentially, we have four critical pieces of information in the first column of each department sheet.
- mode: Campus or Online (UWW)
- degree: Undergraduate, Master’s or Doctoral
- ug_type: First-Year, Transfer, Stockbridge, or Post-graduate
- count (Applications, Acceptances, and Enrollments)
Note that both acceptance rate and yield could be easily calculated from the first three statistics provided, or could be kept. Our goal now will be to develop a function to read in data from a single sheet, and reshape it so that the counts are variables while the years (currently presented as column names) are in a column. This reshaping will make it easier to caluculate growth rates and other overtime information.
<-read_sheet(admissions.url$spreadsheet_id,
dept_infosheet = depts$dept[1], skip=6,
col_names = c("X1", 2011:2020))
<- grep("Definitions", dept_info$X1)-1
end <- dept_info[1:end,]%>%
dept_info filter(!is.na(X1))
dept_info
We now have a condensed data set read in with the correct column names, but need to break X1 into multiple columns with the correct information.
<-c("State-Supported Undergraduate",
mode_terms "State-Supported Graduate",
"University Without Walls")
<-c("State-Supported Undergraduate",
degree_terms"Master's", "Doctoral")
<-c("First-Year", "Stockbridge (A.S.)", "Transfer",
ug_cats"Post-graduate")
<-c("Applications", "Enrollments", "Acceptances")
counts
<-dept_info%>%
dept_infomutate(dept = depts$dept[1],
mode = ifelse(X1 %in% mode_terms, X1, NA),
degree = ifelse(X1 %in% degree_terms, X1, NA),
ug_cats = ifelse(X1 %in% ug_cats, X1, NA)
%>%
)fill(mode, degree, ug_cats)%>%
mutate(mode = ifelse(str_detect(mode, "State"),
"Campus", "Online"),
degree = ifelse(str_detect(degree, "State"),
"Undergraduate", degree),
ug_cats = ifelse(str_detect(degree, "Undergraduate"),
NA),
ug_cats, counts = ifelse(X1%in%counts, X1, NA))%>%
filter(!is.na(counts))%>%
select(-X1)
dept_info
Now, this is looking much better! Now our final step is to swap the rows and columns by using both pivot_longer
and pivot_wider
.
<-dept_info%>%
dept_infopivot_longer(cols=starts_with("20"),
names_to = "year",
values_to = "value")%>%
pivot_wider(id_cols = c(dept, mode, degree, ug_cats, year),
names_from = "counts",
values_from = "value")
dept_info
Ta da! We have all the information read in and the categorical variables are working and in the correct order. Now we can write a quick function using the steps above, and purrr:map
it into a dataframe!
<-function(sheet.url= admissions.url,
read.programs
dept.name){<-read_sheet(admissions.url$spreadsheet_id,
dept_infosheet = dept.name,
range="A7:K85",
col_names = c("X1", 2011:2020))
<- grep("Definitions", dept_info$X1)-1
end <- dept_info[1:end,]%>%
dept_info filter(!is.na(X1))
<-dept_info%>%
dept_infomutate(dept = dept.name,
mode = ifelse(X1 %in% mode_terms, X1, NA),
degree = ifelse(X1 %in% degree_terms, X1, NA),
ug_cats = ifelse(X1 %in% ug_cats, X1, NA)
%>%
)fill(mode, degree, ug_cats)%>%
mutate(mode = ifelse(str_detect(mode, "State"),
"Campus", "Online"),
degree = ifelse(str_detect(degree, "State"),
"Undergraduate", degree),
ug_cats = ifelse(str_detect(degree, "Undergraduate"),
NA),
ug_cats, counts = ifelse(X1%in%counts, X1, NA))%>%
filter(!is.na(counts))%>%
select(-X1)
%>%
dept_infopivot_longer(cols=starts_with("20"),
names_to = "year",
values_to = "value")%>%
pivot_wider(id_cols = c(dept, mode, degree, ug_cats, year),
names_from = "counts",
values_from = "value")
}
<-map_dfr(depts$dept,
admissions~read.programs(admissions.url, .x))
✔ Reading from "Admissions".
✔ Range ''AFRO'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ARCH'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ART'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CLSI'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ENGL'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ETHFA'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''HIST'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ARTH'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''JUDA'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''LANG'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''LING'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MUS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PHIL'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''SVES'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''THEA'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''WOST'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CSCI'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''INFO'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ASTR'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''BIOC'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''BIOL'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CHEM'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CNSO'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ENVC'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ENVS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ETCNS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''FD S'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''GEO'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MATH'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MICB'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MC-B'!A7:K85'.
✖ Request failed [429]. Retry 1 happens in 100.9 seconds ...
✔ Reading from "Admissions".
✔ Range ''NSB'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''OEB'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PHYS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PB'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PS E'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PSYC'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''SCI'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''STOC'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''VETS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ANTH'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''COMM'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''DACS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ECON'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ETSBS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''JOUR'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''LARP'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''POLS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PUBA'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''RESE'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''STPE'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''SOCI'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''EDUC'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''EPRA'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ETEDU'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''SDPS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''TECS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''BME'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CH E'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CEE'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ECE'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ENGI'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MIE'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ACCT'!A7:K85'.
✖ Request failed [429]. Retry 1 happens in 100.9 seconds ...
✔ Reading from "Admissions".
✔ Range ''HTM'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''FIN'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MGT'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''MKTG'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''OIM'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''SOM'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''SPOR'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''BIEP'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''COMD'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''EHS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''ETPUB'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''HPP'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''KIN'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''NUTR'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PUB'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''PUBS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''BGS'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''BDIC'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''UWW'!A7:K85'.
✔ Reading from "Admissions".
✔ Range ''CA S'!A7:K85'.
<-admissions%>%
admissions_origleft_join(depts)%>%
relocate(college, .before = dept)
Joining, by = "dept"
admissions_orig
And now we have all of the admissions data in a single R object, with the ability to easily filter and group by the various categorical information from the original sheets - including the type of program, mode, degree, department, and college.
Yield rate and accept rate can be recalulated easily, as can lags or growth using the following information.
<-admissions_orig%>%
admissionsmutate(enroll_chg3yr = Enrollments - lag(Enrollments, n=3),
admit_rate = Acceptances/Applications,
yield_rate = Enrollments/Acceptances)
admissions