Multi-omic data integration enables discovery of hidden biological regularities

Nat Commun. 2016 Oct 26;7:13091. doi: 10.1038/ncomms13091.

Abstract

Rapid growth in size and complexity of biological data sets has led to the 'Big Data to Knowledge' challenge. We develop advanced data integration methods for multi-level analysis of genomic, transcriptomic, ribosomal profiling, proteomic and fluxomic data. First, we show that pairwise integration of primary omics data reveals regularities that tie cellular processes together in Escherichia coli: the number of protein molecules made per mRNA transcript and the number of ribosomes required per translated protein molecule. Second, we show that genome-scale models, based on genomic and bibliomic data, enable quantitative synchronization of disparate data types. Integrating omics data with models enabled the discovery of two novel regularities: condition-invariant in vivo turnover rates of enzymes and the correlation of protein structural motifs and translational pausing. These regularities can be formally represented in a computable format allowing for coherent interpretation and prediction of fitness and selection that underlies cellular physiology.
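
The pairwise integration described in the abstract can be illustrated with a minimal sketch, assuming per-gene abundances are available as simple mappings. The function name `pairwise_ratio`, the gene labels and all numeric values below are hypothetical and chosen only for illustration; they are not the authors' method or measurements from the paper.

```python
from typing import Dict


def pairwise_ratio(numerator: Dict[str, float],
                   denominator: Dict[str, float]) -> Dict[str, float]:
    """Return a per-gene ratio for genes measured in both data sets.

    Genes missing from either data set, or with a non-positive
    denominator, are skipped rather than imputed.
    """
    shared = numerator.keys() & denominator.keys()
    return {
        gene: numerator[gene] / denominator[gene]
        for gene in sorted(shared)
        if denominator[gene] > 0
    }


# Hypothetical toy abundances (copies per cell), NOT data from the paper.
protein_copies = {"geneA": 1200.0, "geneB": 300.0, "geneC": 50.0}
mrna_copies = {"geneA": 2.0, "geneB": 0.5, "geneD": 1.0}

# Protein molecules per mRNA transcript, for genes present in both sets.
proteins_per_mrna = pairwise_ratio(protein_copies, mrna_copies)
print(proteins_per_mrna)
```

The same helper could in principle be applied to other pairs of data types, for example ribosome occupancy against protein abundance, provided both are expressed in compatible units.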

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism
  • Datasets as Topic*
  • Enzymes / metabolism
  • Escherichia coli / physiology*
  • Gene Expression Profiling / methods*
  • Models, Biological*
  • Proteomics / methods*
  • RNA, Bacterial / genetics
  • RNA, Bacterial / metabolism
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Ribosomes / genetics
  • Ribosomes / metabolism

Substances

  • Bacterial Proteins
  • Enzymes
  • RNA, Bacterial
  • RNA, Messenger