Confero: an integrated contrast data and gene set platform for computational analysis and biological interpretation of omics data

Leandro Hermida; Carine Poussin; Michael B Stadler; Sylvain Gubian; Alain Sewer; Dimos Gaidatzis; Hans-Rudolf Hotz; Florian Martin; Vincenzo Belcastro; Stéphane Cano; Manuel C Peitsch; Julia Hoeng

doi:10.1186/1471-2164-14-514

BMC Genomics. 2013 Jul 29:14:514. doi: 10.1186/1471-2164-14-514.

Authors

Leandro Hermida¹, Carine Poussin, Michael B Stadler, Sylvain Gubian, Alain Sewer, Dimos Gaidatzis, Hans-Rudolf Hotz, Florian Martin, Vincenzo Belcastro, Stéphane Cano, Manuel C Peitsch, Julia Hoeng

Affiliation

¹ Philip Morris International Research & Development, Quai Jeanrenaud 5, CH-2000 Neuchatel, Switzerland.

Abstract

Background: High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store, in a comparable manner, the full list of genes with their associated statistical analysis results (differential expression, t-statistic, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed "contrast data" for short), and to extract gene sets from them, in order to efficiently support downstream analyses and leverage the data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental).

Results: To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated as an analysis module, along with additional tools to support biological interpretation. Confero is a standalone system that also integrates with Galaxy, an open-source workflow management and data integration system. To illustrate the functionality of the Confero platform, we walk through major aspects of the Confero workflow and results using the Bioconductor estrogen package dataset.
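
As a rough illustration of the two ideas named above (extracting a gene set from contrast data, then testing it for over-representation), the following minimal Python sketch may help. It is not Confero's implementation or data format: the field names, the thresholds, the toy values, and the helper functions `extract_gene_set` and `ora_p_value` are all hypothetical, and the ORA step is assumed here to be a plain upper-tail hypergeometric test.

```python
from math import comb

# Hypothetical contrast data: one record per gene with its statistics.
# Field names and values are illustrative only, not Confero's format.
contrast = [
    {"gene": "G1", "log_fc": 2.1, "p_value": 0.001},
    {"gene": "G2", "log_fc": -1.8, "p_value": 0.004},
    {"gene": "G3", "log_fc": 0.2, "p_value": 0.6},
    {"gene": "G4", "log_fc": 1.5, "p_value": 0.03},
    {"gene": "G5", "log_fc": -0.1, "p_value": 0.9},
    {"gene": "G6", "log_fc": 0.4, "p_value": 0.2},
]


def extract_gene_set(rows, p_cutoff=0.05, min_abs_log_fc=1.0):
    """Return genes passing both (assumed) significance and effect-size cutoffs."""
    return {
        row["gene"]
        for row in rows
        if row["p_value"] <= p_cutoff and abs(row["log_fc"]) >= min_abs_log_fc
    }


def ora_p_value(universe_size, reference_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of reference-set genes found when drawing query_size
    genes without replacement from a universe of universe_size genes.
    """
    max_overlap = min(reference_size, query_size)
    favourable = sum(
        comb(reference_size, k) * comb(universe_size - reference_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(universe_size, query_size)


# Gene set extracted from the contrast, tested against a made-up reference set.
query = extract_gene_set(contrast)
reference = {"G1", "G2", "G5"}
p = ora_p_value(len(contrast), len(reference), len(query), len(query & reference))
print(sorted(query), p)
```

Keeping the full per-gene statistics, rather than only the thresholded list, is what allows a different cutoff or a rank-based method such as GSEA to be applied later without repeating the upstream analysis.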

Conclusion: Confero provides a unique and flexible platform to support downstream computational analysis, facilitating biological interpretation. The system has been designed to provide the researcher with a simple, innovative, and extensible solution to store and exploit analyzed data in a sustainable and reproducible manner, thereby accelerating knowledge-driven research. Confero source code is freely available from http://sourceforge.net/projects/confero/.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Computational Biology / methods*
Data Interpretation, Statistical
Database Management Systems
Databases, Genetic*
Estrogens / metabolism
Humans
Information Storage and Retrieval
Mice
Software*

Substances

Estrogens