Background: Colorectal cancer is the most commonly occurring cancer in Germany, and the second and third most commonly diagnosed cancer in women and men, respectively. The therapy for this disease is based primarily on the tumor stages, which are usually documented in an unstructured form in medical information systems. In order to re-use this knowledge, the information must be extracted and annotated using the correct terminology.
Methods: In this study, a natural language processing pipeline is developed to identify specific guideline-based patient information and to annotate it with Unified Medical Language System concepts for manual evaluation by a physician. The gold standard for one-time evaluation is determined using the human abstraction of 2513 German clinical notes from electronic health records.
Results: Using this approach to process the narrative clinical notes on colorectal cancer for retrospective evaluation of the therapy recommendation, the algorithm achieves a precision value of 96.64% for tumor stage detection and 97.95% for diagnosis recognition with recall values of 94.89% and 99.54%, respectively. The average precision value across all concepts relevant to treatment decisions for patients with known cancer diagnoses (11 concept groups) achieved a precision value of 82.05% with a recall value of 82.45% and an F1-score of 81.81%, respectively.
Conclusions: The identification of guideline-based information from narrative clinical notes has the potential for implementation as clinical decision support tools.
Keywords: Electronic health records; Information extraction; Knowledge management; Natural language processing.
Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.