Text-mining clinically relevant cancer biomarkers for curation into the CIViC database

Jake Lever; Martin R Jones; Arpad M Danos; Kilannin Krysiak; Melika Bonakdar; Jasleen K Grewal; Luka Culibrk; Obi L Griffith; Malachi Griffith; Steven J M Jones

doi:10.1186/s13073-019-0686-y


Genome Med. 2019 Dec 3;11(1):78. doi: 10.1186/s13073-019-0686-y.

Authors

Jake Lever^{1,2}, Martin R Jones^{1}, Arpad M Danos^{3}, Kilannin Krysiak^{3,4}, Melika Bonakdar^{1}, Jasleen K Grewal^{1,2}, Luka Culibrk^{1,2}, Obi L Griffith^{5,6,7,8}, Malachi Griffith^{9,10,11,12}, Steven J M Jones^{13,14,15}

Affiliations

¹ Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada.
² University of British Columbia, Vancouver, BC, Canada.
³ McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
⁴ Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
⁵ McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
⁶ Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
⁷ Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
⁸ Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA.
⁹ McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
¹⁰ Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
¹¹ Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
¹² Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA.
¹³ Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada.
¹⁴ University of British Columbia, Vancouver, BC, Canada.
¹⁵ Simon Fraser University, Burnaby, BC, Canada.

Abstract

Background: Precision oncology involves analyzing individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several groups have created knowledgebases to collate evidence for these associations, including the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation by skilled experts who read and interpret the relevant biomedical literature.

Methods: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase.

Results: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications.

Conclusions: Through integration with CIViC, we provide a prioritized list of curatable, clinically relevant cancer biomarkers, as well as a resource that is valuable to other knowledgebases and to precision cancer analysts in general. All data are publicly available and distributed under a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/.

Keywords: Cancer biomarkers; Information extraction; Machine learning; Precision oncology; Text mining.

Publication types

Meta-Analysis
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Biomarkers, Tumor*
Data Mining*
Databases, Factual*
Disease Management
Humans
Machine Learning
Medical Informatics / methods
Neoplasms / etiology*
Neoplasms / therapy*
Precision Medicine / methods
User-Computer Interface

Substances

Biomarkers, Tumor
