Using ensembles and distillation to optimize the deployment of deep learning models for the classification of electronic cancer pathology reports

Kevin De Angeli; Shang Gao; Andrew Blanchard; Eric B Durbin; Xiao-Cheng Wu; Antoinette Stroup; Jennifer Doherty; Stephen M Schwartz; Charles Wiggins; Linda Coyle; Lynne Penberthy; Georgia Tourassi; Hong-Jun Yoon

doi:10.1093/jamiaopen/ooac075

JAMIA Open. 2022 Sep 13;5(3):ooac075. doi: 10.1093/jamiaopen/ooac075. eCollection 2022 Oct.

Affiliations

¹ Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
² University of Tennessee, Knoxville, Tennessee, USA.
³ College of Medicine, University of Kentucky, Lexington, Kentucky, USA.
⁴ Louisiana Tumor Registry, Louisiana State University Health Sciences Center School of Public Health, New Orleans, Louisiana, USA.
⁵ Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey, USA.
⁶ Utah Cancer Registry, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA.
⁷ Fred Hutchinson Cancer Center, Epidemiology Program, Seattle, Washington, USA.
⁸ University of New Mexico, Albuquerque, New Mexico, USA.
⁹ Information Management Services Inc., Calverton, Maryland, USA.
¹⁰ National Cancer Institute, Bethesda, Maryland, USA.

Abstract

Objective: We aim to reduce overfitting and model overconfidence by distilling the knowledge of an ensemble of deep learning models into a single model for the classification of cancer pathology reports.

Materials and methods: We consider a text classification problem comprising 5 individual tasks. The baseline model is a multitask convolutional neural network (MtCNN), and the implemented ensemble (teacher) consists of 1000 MtCNNs. We performed knowledge transfer by training a single model (student) on soft labels derived by aggregating the ensemble predictions. We evaluated performance in terms of accuracy and abstention rates, using softmax thresholding.
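The soft-label construction described above can be illustrated with a minimal sketch. It assumes that "aggregation" means averaging the softmax outputs of the ensemble members and that the student is trained with a cross-entropy loss against the resulting distribution; the abstract specifies neither detail, and the function names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label(ensemble_logits):
    """Build a soft label for one document.

    Assumption: the teacher's prediction is the mean of the softmax
    outputs of all ensemble members (one logit list per member).
    """
    probs = [softmax(member) for member in ensemble_logits]
    n_members = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_members for c in range(n_classes)]

def soft_cross_entropy(student_logits, target):
    """Cross-entropy between the student's softmax output and a soft label."""
    predicted = softmax(student_logits)
    return -sum(t * math.log(q) for t, q in zip(target, predicted) if t > 0)

# Two ensemble members that disagree on the class of one document.
label = soft_label([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
loss = soft_cross_entropy([1.0, 1.0, 0.0], label)
```

When the members disagree, the soft label spreads probability mass over the competing classes, so a student that is overconfident in any single class incurs a higher loss than one that reproduces the ensemble's uncertainty.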

Results: The student model outperforms the baseline MtCNN in terms of abstention rates and accuracy, thereby allowing the model to be used with a larger volume of documents when deployed. The largest improvement was observed for subsite and histology, for which the student model classified an additional 1.81% of reports and 3.33% of reports, respectively.
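To make the two reported metrics concrete, the following sketch shows one way selective classification by softmax thresholding might be evaluated. It assumes the model abstains whenever its top softmax probability falls below a fixed threshold; the threshold value, the toy data, and the function names are illustrative and do not come from the study.

```python
def selective_predict(probs, threshold):
    """Return the index of the most probable class, or None to abstain."""
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best if probs[best] >= threshold else None

def evaluate(prob_rows, true_labels, threshold):
    """Compute accuracy on retained documents and the abstention rate."""
    preds = [selective_predict(p, threshold) for p in prob_rows]
    kept = [(p, y) for p, y in zip(preds, true_labels) if p is not None]
    abstention_rate = 1 - len(kept) / len(preds)
    # Accuracy is undefined if the model abstains on every document.
    accuracy = sum(p == y for p, y in kept) / len(kept) if kept else float("nan")
    return accuracy, abstention_rate

# Toy softmax outputs for three documents and their true classes.
prob_rows = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
true_labels = [0, 1, 1]
accuracy, abstention_rate = evaluate(prob_rows, true_labels, threshold=0.7)
```

Under this scheme, a lower abstention rate at a fixed accuracy target means that more reports can be classified automatically rather than routed to manual review.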

Discussion: Ensemble predictions provide a useful strategy for quantifying the uncertainty inherent in labeled data and thereby enable the construction of soft labels with estimated probabilities for multiple classes for a given document. Training models with the derived soft labels reduces model confidence on difficult-to-classify documents, thereby reducing the number of highly confident wrong predictions.

Conclusions: Ensemble model distillation is a simple tool to reduce model overconfidence in problems with extreme class imbalance and noisy datasets. These methods can facilitate the deployment of deep learning models in high-risk domains with low computational resources where minimizing inference time is required.

Keywords: CNN; NLP; deep learning; ensemble distillation; selective classification.

Grants and funding

P30 CA177558/CA/NCI NIH HHS/United States