PhysioSpace: relating gene expression experiments from heterogeneous sources using shared physiological processes

PLoS One. 2013 Oct 17;8(10):e77627. doi: 10.1371/journal.pone.0077627. eCollection 2013.

Abstract

Relating expression signatures from different sources such as cell lines, in vitro cultures from primary cells and biopsy material is an important task in drug development and translational medicine as well as for tracking of cell fate and disease progression. In particular, the comparison of large-scale gene expression changes to tissue- or cell-type-specific signatures is of high interest for tracking cell fate in (trans-)differentiation experiments and for cancer research, which increasingly focuses on shared processes and the involvement of the microenvironment. These signature relation approaches require robust statistical methods to account for the high biological heterogeneity in clinical data and must cope with small sample sizes in lab experiments and common patterns of co-expression in ubiquitous cellular processes. We describe a novel method, called PhysioSpace, to position dynamics of time series data derived from cellular differentiation and disease progression in a genome-wide expression space. The PhysioSpace is defined by a compendium of publicly available gene expression signatures representing a large set of biological phenotypes. The mapping of gene expression changes onto the PhysioSpace leads to a robust ranking of physiologically relevant signatures, as rigorously evaluated via sample-label permutations. A spherical transformation of the data improves the performance, leading to stable results even in the case of small sample sizes. Using PhysioSpace with clinical cancer datasets reveals that such data exhibit large heterogeneity in the number of significant signature associations. This behavior was closely associated with the classification endpoint and cancer type under consideration, indicating shared biological functionalities in disease-associated processes. Even though the time series data of cell line differentiation exhibited responses in larger clusters covering several biologically related patterns, top-scoring patterns were highly consistent with a priori known biological information and separated from the rest of the response patterns.
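
As a rough illustration of the general idea of ranking reference signatures and checking the ranking with sample-label permutations, the following minimal Python sketch may help. It is not the PhysioSpace implementation: the Pearson-correlation score, the function names (`fold_change`, `signature_scores`, `permutation_pvalues`), and the simulated data are all assumptions made for this example, and the spherical transformation mentioned in the abstract is not covered.

```python
import numpy as np

def fold_change(expr, is_case):
    """Per-gene difference of mean log-expression (case minus control)."""
    is_case = np.asarray(is_case, dtype=bool)
    return expr[:, is_case].mean(axis=1) - expr[:, ~is_case].mean(axis=1)

def signature_scores(change, compendium):
    """Pearson correlation of one change vector with each signature (column).

    Assumption: correlation is a stand-in score chosen for illustration.
    """
    c = change - change.mean()
    s = compendium - compendium.mean(axis=0)
    return (s.T @ c) / (np.linalg.norm(s, axis=0) * np.linalg.norm(c))

def permutation_pvalues(expr, is_case, compendium, n_perm=200, seed=0):
    """Empirical two-sided p-values obtained by shuffling the sample labels."""
    rng = np.random.default_rng(seed)
    is_case = np.asarray(is_case, dtype=bool)
    observed = signature_scores(fold_change(expr, is_case), compendium)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        shuffled = rng.permutation(is_case)  # group sizes are preserved
        null = signature_scores(fold_change(expr, shuffled), compendium)
        exceed += np.abs(null) >= np.abs(observed)
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

# Simulated toy data (hypothetical): 200 genes, 3 reference signatures,
# 4 control and 4 case samples; case samples carry signature 0.
rng = np.random.default_rng(42)
compendium = rng.normal(size=(200, 3))
is_case = np.array([False] * 4 + [True] * 4)
expr = rng.normal(scale=0.5, size=(200, 8))
expr[:, is_case] += compendium[:, [0]]

observed, pvals = permutation_pvalues(expr, is_case, compendium)
ranking = np.argsort(-observed)  # best-matching signature first
print(ranking, observed.round(2), pvals.round(3))
```

With only a handful of samples per group, the number of distinct label permutations is small, so the empirical p-values in such a sketch are coarse; this is one reason small sample sizes are difficult for permutation-based evaluation.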

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cell Line
  • Gene Expression / genetics*
  • Gene Expression Profiling / methods*
  • Genome-Wide Association Study
  • Humans

Grants and funding

M.L. is supported by the German federal state of North Rhine-Westphalia (NRW) and the European Union (European Regional Development Fund: Investing in Your Future) via the StemCellFactory project (stemcellfactory.de). B.M.S. was partially supported by the MedSys project of the Federal Ministry of Education and Research (BMBF, www.bmbf.de, Grant number 0315416A). F-J.M. is supported by an Else Kröner-Fresenius-Stiftung (www.ekfs.de) fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.