Importance: Serial functional status assessments are critical to heart failure (HF) management but are often described narratively in documentation, limiting their use in quality improvement or patient selection for clinical trials.
Objective: To develop and validate a deep learning natural language processing (NLP) strategy for extracting functional status assessments from unstructured clinical documentation.
Design, setting, and participants: This diagnostic study used electronic health record data collected from January 1, 2013, through June 30, 2022, from patients diagnosed with HF seeking outpatient care within 3 large practice networks in Connecticut (Yale New Haven Hospital [YNHH], Northeast Medical Group [NMG], and Greenwich Hospital [GH]). Expert-annotated notes were used for NLP model development and validation. Data were analyzed from February to April 2024.
Exposures: Development and validation of NLP models to detect explicit New York Heart Association (NYHA) classification, HF symptoms during activity or rest, and frequency of functional status assessments.
Main outcomes and measures: Outcomes of interest were model performance metrics, including area under the receiver operating characteristic curve (AUROC), and frequency of NYHA class documentation and HF symptom descriptions in unannotated notes.
Results: This study included 34 070 patients with HF (mean [SD] age 76.1 [12.6] years; 17 728 [52.0%] female). Among 3000 expert-annotated notes (2000 from YNHH and 500 each from NMG and GH), 374 notes (12.4%) mentioned NYHA class and 1190 notes (39.7%) described HF symptoms. The NYHA class detection model achieved a class-weighted AUROC of 0.99 (95% CI, 0.98-1.00) at YNHH, the development site. At the 2 validation sites, NMG and GH, the model achieved class-weighted AUROCs of 0.98 (95% CI, 0.96-1.00) and 0.98 (95% CI, 0.92-1.00), respectively. The model for detecting activity- or rest-related symptoms achieved an AUROC of 0.94 (95% CI, 0.89-0.98) at YNHH, 0.94 (95% CI, 0.91-0.97) at NMG, and 0.95 (95% CI, 0.92-0.99) at GH. Deploying the NYHA model among 182 308 unannotated notes from the 3 sites identified 23 830 (13.1%) notes with NYHA mentions, specifically 10 913 notes (6.0%) with class I, 12 034 notes (6.6%) with classes II or III, and 883 notes (0.5%) with class IV. An additional 19 730 encounters (10.8%) could be classified into functional status groups based on activity- or rest-related symptoms, resulting in a total of 43 560 medical notes (23.9%) categorized by NYHA, an 83% increase compared with explicit mentions alone.
Conclusions and relevance: In this diagnostic study of 34 070 patients with HF, the NLP approach accurately extracted a patient's NYHA symptom class and activity- or rest-related HF symptoms from clinical notes, enhancing the ability to track optimal care delivery and identify patients eligible for clinical trial participation from unstructured documentation.
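The deployment figures in the Results can be reproduced with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable names are illustrative, and it assumes each of the 19 730 symptom-classified encounters corresponds to one note not already counted among the explicit NYHA mentions, as the reported total implies.

```python
# Counts copied from the Results; names are illustrative, not the authors'.
total_notes = 182_308
explicit_by_class = {"I": 10_913, "II or III": 12_034, "IV": 883}
symptom_based = 19_730  # assumed: one note per encounter, no overlap with explicit mentions

# Notes with an explicit NYHA class, and the total after adding symptom-based classification.
explicit_total = sum(explicit_by_class.values())
combined_total = explicit_total + symptom_based


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


explicit_pct = pct(explicit_total, total_notes)
combined_pct = pct(combined_total, total_notes)

# Relative gain in classified notes over explicit mentions alone.
relative_increase = round(100 * symptom_based / explicit_total)

print(explicit_total, explicit_pct)    # 23830 13.1
print(combined_total, combined_pct)    # 43560 23.9
print(relative_increase)               # 83
```

The 83% figure is therefore the ratio of symptom-classified notes to explicitly classified notes (19 730 / 23 830), not a change in percentage points.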