A Multiomics Graph Database System for Biological Data Integration and Cancer Informatics

J Comput Biol. 2021 Feb;28(2):209-219. doi: 10.1089/cmb.2020.0231. Epub 2020 Aug 12.

Abstract

Multiomics data are heterogeneous and come from different biological levels, such as epigenetics, genomics, transcriptomics, and proteomics. The development of high-throughput technologies has enabled researchers not only to study all of these entities together but also to use information from different levels, spanning DNA methylation, copy number variation (CNV), mutation, gene expression, and miRNA expression. With recent advances in image informatics, the field of radiomics is rapidly emerging, and information from microscopic images of tissue can be expected to soon become part of many multiomics studies. Meanwhile, integrating different kinds of multiomics data to extract relevant biological information remains a major challenge. This study is part of our ongoing effort to develop a model that properly integrates multiomics data and allows easy retrieval of information relevant to biological processes. In this article, we have extended our previous graph database model to store gene expression, miRNA expression, DNA methylation, mutation, CNV, and clinical data, including information from images of tissue slides. To demonstrate that the model works, we used data from The Cancer Genome Atlas (TCGA) for three cancer types.

Keywords: cancer informatics; data integration; graph database; multiomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Computational Biology / methods*
  • DNA Copy Number Variations
  • DNA Methylation*
  • Databases, Factual
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic
  • Gene Regulatory Networks*
  • Genetic Variation*
  • Humans
  • MicroRNAs / genetics
  • Middle Aged
  • Mutation
  • Neoplasms / genetics*
  • Neoplasms / pathology

Substances

  • MicroRNAs