Large Language Models to Identify Advance Care Planning in Patients With Advanced Cancer

J Pain Symptom Manage. 2024 Nov 24:S0885-3924(24)01128-X. doi: 10.1016/j.jpainsymman.2024.11.016. Online ahead of print.

Abstract

Context: Efficiently tracking Advance Care Planning (ACP) documentation in electronic health records (EHRs) is essential for quality improvement and research efforts. The use of large language models (LLMs) offers a novel approach to this task.

Objectives: To evaluate the ability of LLMs to identify ACP in EHRs for patients with advanced cancer and compare performance to gold-standard manual chart review and natural language processing (NLP).

Methods: We analyzed EHRs from patients with advanced cancer followed at seven Dana-Farber Cancer Institute (DFCI) clinics in June 2024. We utilized GPT-4o-2024-05-13 within DFCI's HIPAA-secure digital infrastructure. We designed LLM prompts to identify ACP domains: goals of care, limitation of life-sustaining treatment, hospice, and palliative care. We developed a novel hallucination index to measure production of factually incorrect evidence by the LLM. Performance was compared to gold-standard manual chart review and NLP.

Results: The gold-standard data set comprised 528 notes from 60 unique patients. LLM prompts had sensitivity ranging from 0.85 to 1.0, specificity ranging from 0.80 to 0.91, and accuracy ranging from 0.81 to 0.91 across domains. The LLM had better sensitivity than NLP for identifying complex topics such as goals of care. The average hallucination index for notes identified by the LLM was less than 0.5, indicating a low probability of hallucination. Although the LLM had lower precision than NLP, the false-positive documentation it identified was clinically relevant and useful for guiding management.
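
The performance measures reported above (sensitivity, specificity, accuracy, and precision) follow from a per-note comparison of model output against the gold-standard chart review. The Python sketch below illustrates the standard definitions only; it is not the study's code. The function name `evaluate_domain`, the `DomainMetrics` container, and the example labels are hypothetical, and it assumes each note carries one binary label per ACP domain.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DomainMetrics:
    """Hypothetical container for one ACP domain's metrics."""
    sensitivity: float
    specificity: float
    accuracy: float
    precision: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined
    # (for example, no positive notes in the gold standard).
    return numerator / denominator if denominator else float("nan")


def evaluate_domain(gold: Sequence[bool], predicted: Sequence[bool]) -> DomainMetrics:
    """Compare predicted labels with gold-standard labels, note by note."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives

    return DomainMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=_ratio(tp, tp + fp),
    )


# Made-up labels for eight notes in a single domain (illustration only).
example_gold = [True, True, True, False, False, False, False, False]
example_pred = [True, True, False, True, False, False, False, False]
example_metrics = evaluate_domain(example_gold, example_pred)
```

With these made-up labels there are 2 true positives, 1 false negative, 1 false positive, and 4 true negatives, so sensitivity and precision are both 2/3, specificity is 0.8, and accuracy is 0.75. A method can trade precision for sensitivity in this way: flagging more notes catches more true ACP documentation at the cost of more false positives.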

Conclusion: LLMs can capture ACP domains from EHRs, with sensitivity exceeding NLP methods for complex domains such as goals of care. Future studies should explore approaches for scaling this methodology.

Keywords: Advance care planning; Artificial intelligence; Goals of care; Large language models.