Exploring Integrative Analysis Using the BioMedical Evidence Graph

Adam Struck; Brian Walsh; Alexander Buchanan; Jordan A Lee; Ryan Spangler; Joshua M Stuart; Kyle Ellrott

doi:10.1200/CCI.19.00110

JCO Clin Cancer Inform. 2020 Feb:4:147-159. doi: 10.1200/CCI.19.00110.

Authors

Adam Struck¹, Brian Walsh¹, Alexander Buchanan¹, Jordan A Lee¹, Ryan Spangler¹, Joshua M Stuart²,³, Kyle Ellrott¹

Affiliations

¹ Biomedical Engineering, Oregon Health and Science University, Portland, OR.
² Biomolecular Engineering Department, University of California, Santa Cruz, Santa Cruz, CA.
³ University of California Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA.

Abstract

Purpose: The analysis of cancer biology data involves extremely heterogeneous data sets, including information from RNA sequencing, genome-wide copy number, DNA methylation data reporting on epigenetic regulation, somatic mutations from whole-exome or whole-genome analyses, pathology estimates from imaging sections or subtyping, drug response or other treatment outcomes, and various other clinical and phenotypic measurements. Bringing these different resources into a common framework, with a data model that allows for complex relationships as well as dense vectors of features, will unlock integrated data set analysis.

Methods: We introduce the BioMedical Evidence Graph (BMEG), a graph database and query engine for discovery and analysis of cancer biology. The BMEG differs from other biologic data graphs in that sample-level molecular and clinical information is connected to reference knowledge bases. It combines gene expression and mutation data with drug-response experiments, pathway information databases, and literature-derived associations.

Results: The construction of the BMEG has resulted in a graph containing > 41 million vertices and 57 million edges. The BMEG system provides a graph query-based application programming interface to enable analysis, with client code available for Python, JavaScript, and R, and a server online at bmeg.io. Using this system, we have demonstrated several forms of cross-data set analysis to show the utility of the system.

Conclusion: The BMEG is an evolving resource dedicated to enabling integrative analysis. We have demonstrated queries on the system that illustrate mutation significance analysis, drug-response machine learning, patient-level knowledge-base queries, and pathway-level analysis. We have compared the resulting graph to other available integrated graph systems and demonstrated that the BMEG is unique in the scale of the graph and the type of data it makes available.
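
Illustrative sketch (not part of the published abstract): the Methods paragraph describes sample-level data connected to reference knowledge such as pathways. The toy in-memory graph below shows what a traversal from a sample to its associated pathways might look like. It is not the BMEG client or its query API. The class, the vertex and edge labels, and all identifiers (`TinyGraph`, `has_mutation_in`, `member_of`, `sample:S1`, and so on) are hypothetical names chosen for this example.

```python
from collections import defaultdict


class TinyGraph:
    """Minimal in-memory property graph (illustrative only, not the BMEG API)."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> (label, properties)
        self.out_edges = defaultdict(list)   # vertex id -> [(edge label, target id)]

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)

    def add_edge(self, src, edge_label, dst):
        self.out_edges[src].append((edge_label, dst))

    def out(self, vids, edge_label):
        """Follow outgoing edges with the given label from each vertex in vids."""
        return [dst
                for v in vids
                for (lbl, dst) in self.out_edges[v]
                if lbl == edge_label]


def build_example():
    """Build a tiny graph with hypothetical samples, genes, and one pathway."""
    g = TinyGraph()
    g.add_vertex("sample:S1", "Sample", project="hypothetical-cohort")
    g.add_vertex("sample:S2", "Sample", project="hypothetical-cohort")
    g.add_vertex("gene:G1", "Gene", symbol="GENE_A")
    g.add_vertex("gene:G2", "Gene", symbol="GENE_B")
    g.add_vertex("pathway:P1", "Pathway", name="hypothetical pathway")
    # Sample-level observations (hypothetical).
    g.add_edge("sample:S1", "has_mutation_in", "gene:G1")
    g.add_edge("sample:S2", "has_mutation_in", "gene:G2")
    # Reference knowledge (hypothetical).
    g.add_edge("gene:G1", "member_of", "pathway:P1")
    return g


def pathways_for_sample(g, sample_id):
    """Traverse sample -> mutated genes -> pathways, returning unique pathway ids."""
    genes = g.out([sample_id], "has_mutation_in")
    pathways = g.out(genes, "member_of")
    return sorted(set(pathways))


if __name__ == "__main__":
    graph = build_example()
    print(pathways_for_sample(graph, "sample:S1"))
```

The two-hop traversal is the point of the sketch: once sample-level observations and reference knowledge share one graph, a cross-data set question becomes a chain of edge-following steps rather than a join across separate files.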

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Antineoplastic Agents / therapeutic use*
Biomarkers, Tumor / genetics*
Computational Biology / methods*
Computer Graphics
Databases, Factual
Gene Expression Regulation, Neoplastic / drug effects*
Gene Regulatory Networks
Humans
Medical Informatics*
Neoplasms / diagnosis*
Neoplasms / drug therapy*
Neoplasms / genetics
Signal Transduction

Substances

Antineoplastic Agents
Biomarkers, Tumor
