Natural Language Processing-Assisted Literature Retrieval and Analysis for Combination Therapy in Cancer

JCO Clin Cancer Inform. 2022 Jan:6:e2100109. doi: 10.1200/CCI.21.00109.

Abstract

Purpose: Despite advances in molecular therapeutics, few anticancer agents achieve durable responses. Rational combinations of two or more anticancer drugs can potentially achieve synergistic effects and overcome drug resistance, thereby enhancing antitumor efficacy. A publicly accessible biomedical literature search engine dedicated to this domain would facilitate knowledge discovery and reduce the burden of manual searching and review.

Methods: We developed RetriLite, an information retrieval and extraction framework that leverages natural language processing and a domain-specific knowledgebase to computationally identify highly relevant papers and extract key information. Its modular architecture lets RetriLite combine information retrieval and natural language processing techniques while remaining flexible to customization. We customized the application and built an informatics pipeline that identifies papers describing the efficacy of combination therapies in clinical or preclinical studies.
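The abstract does not describe RetriLite's internal rules. Purely as an illustration, the following Python sketch shows how a knowledgebase-driven extraction step might flag candidate combination-therapy papers and tag them as clinical or preclinical. The toy dictionary `DRUG_KB`, the cue-word lists, and the `annotate` function are hypothetical stand-ins, not the published method.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy knowledgebase: lowercase synonym -> canonical agent name.
DRUG_KB = {
    "olaparib": "olaparib",
    "lynparza": "olaparib",
    "niraparib": "niraparib",
    "cisplatin": "cisplatin",
    "temozolomide": "temozolomide",
}

# Hypothetical cue phrases; a production system would rely on richer NLP.
COMBINATION_CUES = ("in combination with", "combined with", "combination", "synerg")
# "phase i" also matches "phase ii" and "phase iii" as a substring.
CLINICAL_CUES = ("phase i", "patients", "clinical trial")


@dataclass
class Annotation:
    agents: set = field(default_factory=set)
    is_combination: bool = False
    study_type: str = "preclinical"


def find_agents(text: str) -> set:
    """Return canonical agent names for every knowledgebase synonym found as a whole word."""
    lowered = text.lower()
    return {
        canonical
        for synonym, canonical in DRUG_KB.items()
        if re.search(rf"\b{re.escape(synonym)}\b", lowered)
    }


def annotate(abstract: str) -> Annotation:
    """Tag an abstract with its agents, a combination flag, and a study-type label."""
    lowered = abstract.lower()
    agents = find_agents(abstract)
    # Require at least two distinct agents plus an explicit combination cue.
    is_combination = len(agents) >= 2 and any(c in lowered for c in COMBINATION_CUES)
    # Crude keyword rule: any clinical cue marks the paper as clinical.
    study_type = "clinical" if any(c in lowered for c in CLINICAL_CUES) else "preclinical"
    return Annotation(agents, is_combination, study_type)


if __name__ == "__main__":
    print(annotate("Olaparib combined with temozolomide showed synergy in xenografts."))
```

Keeping the knowledgebase lookup, the combination filter, and the study-type tagger as separate functions mirrors, in miniature, the kind of modularity the abstract describes: each stage could be swapped for a stronger model without touching the others.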

Results: In a small pilot study, RetriLite achieved an F1 score of 0.93. A more extensive validation experiment was conducted to identify agents that, combined with poly(ADP-ribose) polymerase (PARP) inhibitors, enhance antitumor efficacy in vitro or in vivo: 95.9% of the papers the application determined to be relevant were true positives, and its classification of papers as clinical or preclinical achieved an accuracy of 97.6%. An interobserver assessment yielded 100% concordance. The data derived from the informatics pipeline have also been made publicly accessible through a dedicated online search engine with an intuitive user interface.
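For readers less familiar with these metrics, the following minimal Python sketch gives the standard definitions behind the reported figures: precision (the share of papers flagged as relevant that are true positives, which is what the 95.9% figure measures), recall, their harmonic mean F1, and accuracy. The counts in the usage line are hypothetical and are not taken from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of papers flagged relevant that truly are relevant."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly relevant papers that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(correct: int, total: int) -> float:
    """Share of papers whose label (e.g. clinical vs. preclinical) was correct."""
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the study.
    print(precision(45, 5), recall(45, 5), f1_score(45, 5, 5))
```

With these made-up counts, precision and recall are both 0.9, so F1 is also about 0.9; F1 drops below both whenever precision and recall diverge.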

Conclusion: RetriLite is a framework that can be applied to build domain-specific information retrieval and extraction systems. Its extensive, high-quality metadata tags and keyword highlighting help information seekers discover knowledge in the combination therapy domain more effectively and efficiently.
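Keyword highlighting of the kind mentioned in the conclusion can be sketched in a few lines of Python. This is a hypothetical illustration, not RetriLite's implementation: the `highlight` function and the use of HTML `<mark>` tags are assumptions. It escapes the surrounding text and wraps whole-word, case-insensitive keyword matches, trying longer keywords first so that a multiword term such as "PARP inhibitor" is highlighted as one unit.

```python
import html
import re


def highlight(text: str, keywords: list[str]) -> str:
    """Return HTML-escaped text with whole-word keyword matches wrapped in <mark> tags."""
    if not keywords:
        return html.escape(text)
    # Try longer keywords first so "PARP inhibitor" beats a shorter "PARP".
    alternatives = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the unmatched text between hits, then the hit itself.
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(highlight("Olaparib plus a PARP inhibitor", ["olaparib", "PARP", "PARP inhibitor"]))
```

Escaping each segment separately, rather than escaping the whole string first, keeps the keyword pattern matching against the original text, so characters such as `&` cannot break a match.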

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Information Storage and Retrieval
  • Natural Language Processing*
  • Neoplasms* / drug therapy
  • Pilot Projects
  • Search Engine