Tumor reference resolution and characteristic extraction in radiology reports for liver cancer stage prediction

J Biomed Inform. 2016 Dec:64:179-191. doi: 10.1016/j.jbi.2016.10.005. Epub 2016 Oct 8.

Abstract

Background: Anaphoric references are ubiquitous in clinical narrative text. However, resolving them, still very much an open challenge, typically receives less attention in clinical text applications. Furthermore, existing research on reference resolution is often conducted in isolation from the real-world tasks that motivate it.

Objective: In this paper, we present a machine-learning system that automatically performs reference resolution and a rule-based system that extracts tumor characteristics, with both component-based and end-to-end evaluations. Specifically, our goal was to build an algorithm that takes in tumor templates and outputs the tumor characteristics, e.g., tumor number and largest tumor size, necessary for identifying patient liver cancer stage phenotypes.
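
To make the template-to-characteristic step concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified template with only a mention identifier and an optional size, and it assumes coreference links arrive as pairs of mention identifiers. The names `TumorMention`, `size_cm`, and `summarize_tumors` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TumorMention:
    """Hypothetical, simplified tumor template for one mention in a report."""
    mention_id: str
    size_cm: Optional[float] = None  # None when the mention carries no measurement


def summarize_tumors(mentions, coref_pairs):
    """Return (tumor_number, largest_size_cm) after merging coreferent mentions.

    Assumption: each coreference cluster corresponds to exactly one tumor.
    """
    # Union-find over mention ids: every mention starts in its own cluster.
    parent = {m.mention_id: m.mention_id for m in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Merge the clusters of every linked pair.
    for a, b in coref_pairs:
        parent[find(a)] = find(b)

    # Count distinct clusters and take the largest reported size.
    tumor_number = len({find(m.mention_id) for m in mentions})
    sizes = [m.size_cm for m in mentions if m.size_cm is not None]
    largest_size = max(sizes) if sizes else None
    return tumor_number, largest_size


mentions = [
    TumorMention("m1", 2.3),  # e.g. "a 2.3 cm lesion in segment 7"
    TumorMention("m2"),       # e.g. "the lesion", referring back to m1
    TumorMention("m3", 1.1),  # a second, distinct lesion
]
with_coref = summarize_tumors(mentions, [("m1", "m2")])
without_coref = summarize_tumors(mentions, [])
```

In this toy input, omitting the coreference link counts the same lesion twice, which illustrates how upstream reference resolution can change a downstream variable such as tumor number.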

Results: Our reference resolution system reached a modest performance of 0.66 F1 for the averaged MUC, B-cubed, and CEAF scores for coreference resolution and 0.43 F1 for particularization relations. However, even this modest performance substantially improved automatic tumor characteristic annotation compared with using no reference resolution.
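
For readers unfamiliar with the coreference metrics named above, the sketch below computes B-cubed precision, recall, and F1, one of the three scores that are averaged. It is an illustration under a simplifying assumption, not the evaluation code used in the study: the gold (key) and system (response) clusterings are assumed to cover exactly the same set of mentions. The function name `b_cubed` and the toy clusters are hypothetical.

```python
def b_cubed(key_clusters, response_clusters):
    """Return B-cubed (precision, recall, f1) for two clusterings.

    Assumption: both clusterings partition the same set of mentions.
    """
    # Map each mention to the cluster that contains it in each clustering.
    key_of = {m: set(c) for c in key_clusters for m in c}
    resp_of = {m: set(c) for c in response_clusters for m in c}
    mentions = list(key_of)

    precision_sum = 0.0
    recall_sum = 0.0
    for m in mentions:
        overlap = len(key_of[m] & resp_of[m])
        precision_sum += overlap / len(resp_of[m])  # share of the system cluster that is correct
        recall_sum += overlap / len(key_of[m])      # share of the gold cluster that was recovered

    precision = precision_sum / len(mentions)
    recall = recall_sum / len(mentions)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the gold standard groups a, b, c together; the system splits c off with d.
key = [{"a", "b", "c"}, {"d"}]
response = [{"a", "b"}, {"c", "d"}]
precision, recall, f1 = b_cubed(key, response)
```

For this toy input the precision is 0.75, the recall is 2/3, and the F1 is 12/17 (about 0.71).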

Conclusion: Experiments revealed the benefit of reference resolution even for relatively simple tumor characteristic variables such as largest tumor size. However, we found that different overall variables had different tolerances to upstream reference resolution errors, highlighting the need to characterize systems by end-to-end evaluations.

Keywords: Cancer stages; Information extraction; Liver cancer; Natural language processing; Radiology report; Reference resolution.

MeSH terms

  • Algorithms
  • Data Mining*
  • Electronic Health Records
  • Humans
  • Liver Neoplasms / classification
  • Liver Neoplasms / diagnosis*
  • Liver Neoplasms / diagnostic imaging
  • Natural Language Processing*
  • Semantics