Final Project Assignment #1: Michele Carlin

UMass Chan Medical School - Clerkship Grades
Author: Michele Carlin

Published: April 12, 2023

Introduction

The data sets I will be exploring for my final project are from UMass Chan Medical School. Third-year medical students complete seven different clerkships/rotations throughout the academic year. Within each clerkship, they are assessed in various ways. Student performance evaluations (SPEs) are completed by the physicians they work with in the clinical setting. Each student is evaluated by multiple physicians, and an average is then calculated across all SPEs. The students also complete an Objective Structured Clinical Examination (OSCE) at the end of each clerkship, in which they are assessed by Standardized Patients (SPs) in various simulated patient encounters. Each SP is trained to portray a patient in a clinical setting with a specific chief complaint. The SPs then complete checklists on a variety of history and physical exam skills; scores are calculated for each encounter and summarized as an overall OSCE score. In addition, students complete a multiple-choice National Board of Medical Examiners (NBME) exam at the end of each clerkship. Scores from these grading components are used to calculate final grades.

Describe the data set(s).

The data are currently in two different Excel files: one containing final grades from AY1718-AY2122, and another containing component grades from AY1920-AY2122. I will merge these data sets into a new ‘Grades’ data set, where each row will contain one student’s scores for one clerkship. Each student will then have seven rows of data (one for each clerkship).
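The expectation of seven rows per student can be checked once the data are merged. Below is a minimal sketch on made-up data; the toy data frame and its values are assumptions for illustration, and only the column names ID and Subject come from the join used later in this post.

```r
library(dplyr)

# Toy data (illustrative only): student 1 has all seven clerkships, student 2 has six
toy <- data.frame(
  ID = c(rep(1, 7), rep(2, 6)),
  Subject = c("FC", "ME", "NU", "OB", "PE", "PS", "SU",
              "FC", "ME", "NU", "OB", "PE", "PS")
)

# Count rows per student and keep only students without exactly seven rows
incomplete <- toy %>%
  count(ID, name = "n_clerkships") %>%
  filter(n_clerkships != 7)

incomplete
```

Any student listed in `incomplete` would need a closer look, for example for a missing clerkship or a duplicated row.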

Load the following libraries.

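library(tidyverse)     # attaches dplyr, tidyr, ggplot2, and other core packages
library(summarytools)  # dfSummary()
library(dplyr)         # data manipulation (already attached by tidyverse)
library(readxl)        # excel_sheets(), read_excel()
library(tidyr)
library(ggplot2)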


knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE)

Dataset #1: Read in and view summary of ‘FinalGrades’ dataset. This Excel file has five worksheets, one for each year (AY1718 - AY2122), that will be combined into one dataset.

setwd("C:/Users/CarlinML/DACSS-601/601_Spring_2023/Posts/")
# Path to the workbook, relative to the working directory set above
final_path <- "MicheleCarlin_FinalProjectData/AY1718-AY2122_FinalGrades.xlsx"
# Worksheet names, one per academic year
sheet <- excel_sheets(final_path)
sheet
[1] "AY1718" "AY1819" "AY1920" "AY2021" "AY2122"
# Read every worksheet into a named list
FinalGrades <- lapply(setNames(sheet, sheet), function(x) read_excel(final_path, sheet = x))
# Stack the worksheets; the new "Sheet" column records the source worksheet
FinalGrades <- bind_rows(FinalGrades, .id = "Sheet")
View(FinalGrades)
view(dfSummary(FinalGrades))

The ‘FinalGrades’ dataset contains 5081 rows and 8 columns (including the course subject area, site, and session).
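To illustrate how the worksheets are combined, here is a minimal sketch of what bind_rows() does with a named list and the .id argument. The two toy data frames stand in for worksheets; their columns and values are made up for illustration and are not the real file contents.

```r
library(dplyr)

# Toy stand-ins for two worksheets (illustrative columns and values only)
sheets <- list(
  AY1718 = data.frame(ID = c(1, 2), Grade = c("Honors", "Pass")),
  AY1819 = data.frame(ID = 3, Grade = "High Pass")
)

# bind_rows() stacks the list elements and stores each element's name in "Sheet"
combined <- bind_rows(sheets, .id = "Sheet")

# Number of rows contributed by each worksheet
rows_per_sheet <- count(combined, Sheet)
rows_per_sheet
```

Running the same count on the real FinalGrades data would show how many rows each academic year contributes to the combined total.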

Dataset #2: Read in and view summary of ‘ComponentGrades’ dataset. This Excel file contains one worksheet.

ComponentGrades <- read_excel("MicheleCarlin_FinalProjectData/AY1920-AY2122_ComponentGrades.xlsx")
View(ComponentGrades)
view(dfSummary(ComponentGrades))

The ‘ComponentGrades’ dataset contains 3259 rows and 9 columns (including course subject, term, as well as SPE, OSCE, and NBME scores).

Join the ‘FinalGrades’ and ‘ComponentGrades’ datasets into one, matching on ID and Subject.

Grades <- FinalGrades %>% full_join(ComponentGrades, 
                              by=c('ID','Subject'))
View(Grades)
view(dfSummary(Grades))

The new ‘Grades’ dataset contains 5149 rows and 15 columns. Because I performed a full join, every row from both files is kept. The merged dataset has 68 more rows than the FinalGrades dataset (5149 vs. 5081), which indicates that some students have component scores but no final grade, assuming each ID/Subject pair appears only once in each file.
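One way to confirm where the extra rows come from is to list the unmatched rows with anti_join() and to check for repeated join keys. This is a minimal sketch on toy data; the data frames and the Final and NBME values are assumptions for illustration, and only the join keys ID and Subject match the real join.

```r
library(dplyr)

# Toy stand-ins for the two datasets (illustrative values only)
final_toy <- data.frame(ID = c(1, 1, 2), Subject = c("FC", "ME", "FC"),
                        Final = c("Honors", "Pass", "Pass"))
component_toy <- data.frame(ID = c(1, 2, 3), Subject = c("FC", "FC", "FC"),
                            NBME = c(80, 75, 70))

# Component rows that have no matching final grade
no_final <- anti_join(component_toy, final_toy, by = c("ID", "Subject"))

# Final-grade rows that have no matching component scores
no_component <- anti_join(final_toy, component_toy, by = c("ID", "Subject"))

# ID/Subject pairs that occur more than once; duplicates would also add rows to a join
dup_keys <- component_toy %>%
  count(ID, Subject) %>%
  filter(n > 1)
```

If `dup_keys` is empty for both real datasets, the number of rows in `no_final` should account for the difference between the merged and FinalGrades row counts.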

The first thing I plan to review is whether final grades and component scores are stable from year to year. In addition, multiple sites are needed to accommodate the number of students enrolled in each clerkship, so I am also interested in examining the data at the site level to determine whether students at particular sites perform better than those at other sites.
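A site-level comparison could start from a grouped summary like the sketch below. The column name Site and the toy values are assumptions for illustration; the real column name for site in the FinalGrades file may differ.

```r
library(dplyr)

# Toy data (illustrative only): one clerkship in one year at two hypothetical sites
toy <- data.frame(
  Year = "AY2122",
  Subject = "FC",
  Site = c("A", "A", "B", "B"),
  NBME = c(80, 90, 70, 74)
)

# Number of students and average NBME score per year, clerkship, and site
site_summary <- toy %>%
  group_by(Year, Subject, Site) %>%
  summarise(n_students = n(),
            Avg_NBME = mean(NBME),
            .groups = "drop")

site_summary
```

Keeping the student count next to each average makes it easier to judge whether a difference between sites rests on only a handful of students.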

Below are some basic summary statistics of the grading component variables.

Grades %>%
  filter(!is.na(Year)) %>%
  group_by(Year, Subject) %>%
  summarise(Avg_SPE = mean(SPE),
            Min_SPE = min(SPE),
            Max_SPE = max(SPE),
            Avg_OSCE = mean(OSCE),
            Min_OSCE = min(OSCE),
            Max_OSCE = max(OSCE),
            Avg_NBME = mean(NBME),
            Min_NBME = min(NBME),
            Max_NBME = max(NBME)) %>%
  print(n=21)
# A tibble: 21 × 11
# Groups:   Year [3]
   Year  Subject Avg_SPE Min_SPE Max_SPE Avg_O…¹ Min_O…² Max_O…³ Avg_N…⁴ Min_N…⁵
   <chr> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 AY19… FC        37.6    28.6    40      12.7     10.5      15    87.7    75.3
 2 AY19… ME         3.50    2.82    3.95    3.20     2         4    77.5    60  
 3 AY19… NU         3.43    2.08    4      15.4     10        20    80.8    65  
 4 AY19… OB        NA      NA      NA      21.1      0        24    78.3    60  
 5 AY19… PE         3.24    2.28    3.93   85.7     73        96    79.3    64  
 6 AY19… PS        84.8    62     100      84.4     70        94    83.8    69  
 7 AY19… SU         3.54    2.33    4      NA       NA        NA    76.5    58  
 8 AY20… FC        37.9    32.6    40      17.7     14.1      20    13.2    10.7
 9 AY20… ME         3.56    2.97    3.92    3.11     0         4    NA      NA  
10 AY20… NU         3.47    2.33    4      15.1      0        20    80.9    50  
11 AY20… OB        NA      NA      NA      21.4     19        23    77.3    58  
12 AY20… PE         3.25    2.02    3.97   87.4     73        95    79.7    63  
13 AY20… PS        89.5    70     100      84.4     72        94    NA      NA  
14 AY20… SU         3.62    2.76    4      75.5     65        86    75.3    55  
15 AY21… FC        43.1    37.9    45      17.9      0        20    17.4    15.0
16 AY21… ME         3.53    3.08    3.92   NA       NA        NA    78.1    58  
17 AY21… NU         3.47    2.75    4      15.3      0        20    81.6    62  
18 AY21… OB        39.9     4      45      21.2     19        23    79.0    58  
19 AY21… PE         3.45    2.85    4      89.5     76        96    80.8    64  
20 AY21… PS        92.6    74     100      83.1     69        93    85.2    72  
21 AY21… SU         3.63    2.76    4      76.2     60        89    75.4    50  
# … with 1 more variable: Max_NBME <dbl>, and abbreviated variable names
#   ¹​Avg_OSCE, ²​Min_OSCE, ³​Max_OSCE, ⁴​Avg_NBME, ⁵​Min_NBME
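One possible explanation for the NA entries above is that mean(), min(), and max() return NA whenever any value in the group is missing, unless na.rm = TRUE is set. The sketch below shows this behavior on toy data; the values are made up, and whether missing scores are actually the cause in the real data still needs to be checked.

```r
library(dplyr)

# Toy data (illustrative only): one missing SPE score in the OB group
toy <- data.frame(
  Subject = c("OB", "OB", "PE", "PE"),
  SPE = c(NA, 3.5, 3.2, 3.4)
)

na_check <- toy %>%
  group_by(Subject) %>%
  summarise(n_missing  = sum(is.na(SPE)),            # count of missing scores
            Avg_strict = mean(SPE),                  # NA if any score is missing
            Avg_na_rm  = mean(SPE, na.rm = TRUE),    # average of non-missing scores
            .groups = "drop")

na_check
```

Counting the missing values per group first shows whether a component is missing for a few students or for everyone. If every score in a group is missing, mean() with na.rm = TRUE returns NaN, and min() and max() return Inf and -Inf with a warning.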

Tentative Plan for Visualization.

Based on the above descriptive statistics, I will need to determine why some components have NA for a given year, and how to handle missing data if that is what’s causing the NAs. I also see that some of the scores are reported on a 0-100% scale, others on a 4-point scale, and still others as the weighted number of points (e.g., if the OSCE is worth 20% of the final grade, the scores are entered on a scale of 0-20). I will convert all scores to a 0-100% scale.
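The conversion to a 0-100% scale could be done by dividing each score by the maximum possible score for that component. The sketch below is a minimal illustration: the lookup table osce_scale and its maximum values are hypothetical and would need to be confirmed for each clerkship (and possibly each year, since the scale appears to change for some clerkships).

```r
library(dplyr)

# Hypothetical lookup of the maximum possible OSCE score per clerkship
osce_scale <- data.frame(Subject = c("ME", "NU", "PE"),
                         OSCE_max = c(4, 20, 100))

# Toy scores (illustrative only)
toy <- data.frame(ID = 1:3,
                  Subject = c("ME", "NU", "PE"),
                  OSCE = c(3, 15, 85))

# Attach the maximum score, then express each score as a percentage of it
rescaled <- toy %>%
  left_join(osce_scale, by = "Subject") %>%
  mutate(OSCE_pct = OSCE / OSCE_max * 100)

rescaled
```

Using a lookup table keeps the scale assumptions in one place, so they are easy to review and correct. The same approach would apply to the SPE and NBME columns.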