Machine learning to predict notes for chart review in the oncology setting: a proof of concept strategy for improving clinician note-writing

Sharon Jiang; Barbara D Lam; Monica Agrawal; Shannon Shen; Nicholas Kurtzman; Steven Horng; David R Karger; David Sontag

doi:10.1093/jamia/ocae092


J Am Med Inform Assoc. 2024 Jun 20;31(7):1578-1582. doi: 10.1093/jamia/ocae092.

Authors

Sharon Jiang¹,²; Barbara D Lam³,⁴; Monica Agrawal¹,²; Shannon Shen¹,²; Nicholas Kurtzman⁵; Steven Horng⁴,⁵; David R Karger¹,²; David Sontag¹,²

Affiliations

¹ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.
² Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.
³ Division of Hematology and Oncology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.
⁴ Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.
⁵ Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States.

PMID: 38700253
PMCID: PMC11187428 (available on 2025-05-03)
DOI: 10.1093/jamia/ocae092

Abstract

Objective: To leverage electronic health record (EHR) audit logs to develop a machine learning (ML) model that predicts which notes a clinician wants to review when seeing oncology patients.
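Audit logs record which notes a clinician actually opened, so they can serve as relevance labels without manual annotation. The sketch below illustrates one way such labels could be derived. The `AuditEvent` schema, the visit time window, and the rule "a note is relevant if this clinician opened it during the visit" are all hypothetical assumptions for illustration, not the authors' labeling procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical audit-log row: which clinician opened which note, and when (epoch seconds).
    clinician_id: str
    note_id: str
    timestamp: int


def label_notes(candidate_note_ids, events, clinician_id, visit_start, visit_end):
    """Return {note_id: 1 or 0}; 1 means this clinician opened the note inside the visit window.

    The labeling rule is an illustrative assumption, not the published method.
    """
    viewed = {
        e.note_id
        for e in events
        if e.clinician_id == clinician_id and visit_start <= e.timestamp <= visit_end
    }
    return {note_id: int(note_id in viewed) for note_id in candidate_note_ids}
```

Views by other clinicians, and views outside the window, do not count as positives under this rule.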

Materials and methods: We trained logistic regression models using note metadata and a term frequency-inverse document frequency (TF-IDF) text representation. We evaluated performance with precision, recall, F1 score, area under the receiver operating characteristic curve (AUC), and a clinical qualitative assessment.
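A minimal, dependency-free sketch of a "metadata plus TF-IDF" logistic regression is given below. The metadata features, the whitespace tokenizer, and the hyperparameters (`lr`, `epochs`, `l2`) are hypothetical choices for illustration; the abstract does not specify the authors' features or training setup. The IDF formula matches the smoothed default of scikit-learn's `TfidfVectorizer`.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from training notes."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))  # document frequency
    vocab = sorted(df)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = [math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab]
    return vocab, idf


def tfidf_transform(doc, vocab, idf):
    """Raw term counts times IDF, then L2-normalized; tokens outside the vocabulary are ignored."""
    tf = Counter(doc.lower().split())
    vec = [tf[tok] * w for tok, w in zip(vocab, idf)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def featurize(metadata, doc, vocab, idf):
    """Concatenate numeric metadata features (hypothetical) with the note's TF-IDF vector."""
    return list(metadata) + tfidf_transform(doc, vocab, idf)


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500, l2=1e-3):
    """L2-regularized logistic regression fit by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the clinician will open this note."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Pure Python keeps the sketch self-contained; a real pipeline over many notes would more likely use a library implementation such as scikit-learn's `TfidfVectorizer` and `LogisticRegression`.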

Results: The metadata-only model achieved an AUC of 0.930, and the model using metadata plus TF-IDF achieved an AUC of 0.937. Qualitative assessment revealed a need for better text representation and for further customizing predictions to the individual user.
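Under the standard definition, ROC AUC equals the probability that a randomly chosen relevant note receives a higher score than a randomly chosen non-relevant one, with ties counting one half. The quadratic-time sketch below computes it directly from that definition. Whether the reported AUCs were pooled across all visits or averaged per visit is not stated in the abstract, so this is only an illustration of the metric itself.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Read this way, an AUC of 0.930 means a relevant note outranks a non-relevant one in about 93% of such pairs.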

Discussion: Our model effectively surfaces the top 10 notes a clinician wants to review when seeing an oncology patient. Further studies can characterize different types of clinician users and better tailor the task for different care settings.
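Surfacing "the top 10 notes" amounts to ranking a patient's candidate notes by predicted probability and keeping the highest-scoring ones, which can then be compared with the notes the clinician actually opened. The sketch below shows that step with set-based precision, recall, and F1. The function names and the `(note_id, probability)` input format are hypothetical, and the exact evaluation protocol behind the reported precision, recall, and F1 is not described in the abstract.

```python
def top_k_notes(scored_notes, k=10):
    """scored_notes: list of (note_id, probability); return ids of the k highest-scoring notes."""
    ranked = sorted(scored_notes, key=lambda pair: pair[1], reverse=True)
    return [note_id for note_id, _ in ranked[:k]]


def precision_recall_f1(predicted_ids, relevant_ids):
    """Set-based precision, recall, and F1 of the surfaced notes against the notes actually viewed."""
    predicted, relevant = set(predicted_ids), set(relevant_ids)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With `k=10`, a patient with fewer than ten candidate notes simply gets all of them, ranked.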

Conclusion: EHR audit logs can provide important relevance data for training ML models that assist with note-writing in the oncology setting.

Keywords: electronic health record; machine learning; natural language processing; note writing.

MeSH terms

Electronic Health Records*
Humans
Logistic Models
Machine Learning*
Medical Audit
Medical Oncology*
Metadata
Proof of Concept Study
