Derivation and external validation of a portable method to identify patients with pulmonary embolism from radiology reports: The READ-PE algorithm

Thromb Res. 2024 Sep:241:109105. doi: 10.1016/j.thromres.2024.109105. Epub 2024 Jul 26.

Abstract

Background: Identification of pulmonary embolism (PE) across a cohort currently requires burdensome manual review. Previous approaches to automate capture of PE diagnosis have either been too complex for widespread use or have lacked external validation. We sought to develop and validate the Regular Expression Aided Determination of PE (READ-PE) algorithm, which uses a portable text-matching approach to identify PE in reports from computed tomography with angiography (CTA).

Methods: We identified derivation and validation cohorts of final radiology reports for CTAs obtained in adults (≥ 18 years) at two independent, quaternary academic emergency departments (EDs) in the United States. All reports were in the English language. We manually reviewed CTA reports for PE as a reference standard. In the derivation cohort, we developed the READ-PE algorithm by iteratively combining regular expressions to identify PE. We validated the READ-PE algorithm in an independent cohort and compared its performance against three prior algorithms using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the F1 score.
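The general idea of combining regular expressions to flag PE in report text can be sketched as follows. This is a minimal illustration only: the patterns, the sentence-splitting rule, and the function names (`sentence_flags_pe`, `report_flags_pe`) are hypothetical and are not the published READ-PE expressions, which the abstract does not list.

```python
import re

# Hypothetical patterns for illustration; NOT the published READ-PE expressions.
PE_MENTION = re.compile(
    r"\b(?:pulmonary\s+embol(?:ism|us|i)|filling\s+defects?)\b",
    re.IGNORECASE,
)
# A negation cue appearing anywhere earlier in the same sentence.
NEGATION = re.compile(
    r"\b(?:no|without|negative\s+for)\b[^.]*$",
    re.IGNORECASE,
)


def sentence_flags_pe(sentence: str) -> bool:
    """Return True if any PE mention in the sentence is not preceded by a negation cue."""
    for match in PE_MENTION.finditer(sentence):
        prefix = sentence[: match.start()]
        if NEGATION.search(prefix) is None:
            return True
    return False


def report_flags_pe(report: str) -> bool:
    """Return True if at least one sentence of the report flags PE."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", report)
    return any(sentence_flags_pe(s) for s in sentences)


print(report_flags_pe("No evidence of pulmonary embolism."))
print(report_flags_pe("No pneumothorax. Acute pulmonary embolus in the right lower lobe."))
```

A real rule set would likely need many more patterns (abbreviations, chronic or resolved findings, hedged language), which is presumably why the authors describe an iterative derivation against manually reviewed reports.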

Results: Among 2948 CTAs in the derivation cohort, 10.8 % had PE, and the READ-PE algorithm reached 93 % sensitivity, 99 % specificity, 94 % PPV, 99 % NPV, and an F1 score of 0.93, compared with F1 scores ranging from 0.50 to 0.85 for three prior algorithms. Among 1206 CTAs in the validation cohort, 9.2 % had PE, and the algorithm had 98 % sensitivity, 98 % specificity, 85 % PPV, 100 % NPV, and an F1 score of 0.91.
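The F1 score is the harmonic mean of PPV (precision) and sensitivity (recall), so the reported F1 values can be checked against the reported PPV and sensitivity. The short sketch below does this using the rounded percentages from the abstract; the function name `f1_score` is chosen here for illustration, and small discrepancies would be expected because the inputs are rounded.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Rounded values as reported in the abstract.
derivation_f1 = f1_score(ppv=0.94, sensitivity=0.93)
validation_f1 = f1_score(ppv=0.85, sensitivity=0.98)

print(f"{derivation_f1:.2f}")  # 0.93
print(f"{validation_f1:.2f}")  # 0.91
```

Both values agree with the F1 scores reported for the derivation and validation cohorts.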

Conclusions: The externally validated READ-PE algorithm identifies PE in English-language reports from CTAs obtained in the ED with high accuracy. This algorithm may be used in the electronic health record to accurately identify PE for research or surveillance. If implemented at other EDs, it should first undergo local validation and may require maintenance over time.
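A local validation of the kind recommended above could, in outline, compare the algorithm's output against manual review on a sample of local reports and compute the same five metrics. The sketch below assumes two parallel lists of booleans (manual reference label and algorithm flag per report); the function name `validation_metrics` and the toy data are hypothetical and do not come from the study.

```python
import math
from typing import Dict, Sequence


def validation_metrics(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, NPV, and F1 from paired labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # PE present, flagged
    fn = sum(1 for r, p in pairs if r and not p)      # PE present, missed
    fp = sum(1 for r, p in pairs if not r and p)      # no PE, flagged
    tn = sum(1 for r, p in pairs if not r and not p)  # no PE, not flagged

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (empty denominator) are returned as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy data, made up purely to exercise the function.
manual_review = [True, True, False, False, False]
algorithm_flag = [True, False, False, False, True]
print(validation_metrics(manual_review, algorithm_flag))
```

Repeating such a check periodically is one way to detect drift in local reporting language, in line with the note that the algorithm may require maintenance over time.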

Keywords: Cohort studies; Information storage and retrieval; Natural language processing; Pulmonary embolism; Validity; Venous thromboembolism.

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Cohort Studies
  • Computed Tomography Angiography / methods
  • Female
  • Humans
  • Male
  • Middle Aged
  • Pulmonary Embolism* / diagnosis
  • Pulmonary Embolism* / diagnostic imaging
  • Tomography, X-Ray Computed / methods