Algebraic stability indicators for ranked lists in molecular profiling

Giuseppe Jurman; Stefano Merler; Annalisa Barla; Silvano Paoli; Antonio Galea; Cesare Furlanello

doi:10.1093/bioinformatics/btm550

Bioinformatics. 2008 Jan 15;24(2):258-64. doi: 10.1093/bioinformatics/btm550. Epub 2007 Nov 16.

Authors

Giuseppe Jurman¹, Stefano Merler, Annalisa Barla, Silvano Paoli, Antonio Galea, Cesare Furlanello

Affiliation

¹ FBK, via Sommarive 18, I-38100 Povo (Trento), Italy.

PMID: 18024475
DOI: 10.1093/bioinformatics/btm550

Abstract

Motivation: We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. Resampling methods are commonly adopted to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but it leads to alternative sets of solutions. In microarray studies, the differences between lists may be bewildering, partly because of the presence of modules of functionally related genes. Methods for assessing stability help in understanding how the markers depend on the data or on the type of predictor, and help in selecting solutions.

Results: A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearman's footrule. We also consider distances between partial lists and the aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The issues addressed are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques and enrichment analysis.
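
As an illustration of the two main ingredients named above, the following Python sketch computes a Canberra distance between two complete ranked lists and a simple Borda-style aggregate list. It is a minimal sketch under stated assumptions, not the authors' software: the function names (`canberra_distance`, `mean_pairwise_canberra`, `borda_aggregate`) are hypothetical, every list is assumed to rank the same set of items, ranks are 1-based, and the aggregate simply orders items by total rank with alphabetical tie-breaking. Partial lists, normalization and the exact Borda variant used in the paper are not covered.

```python
from itertools import combinations


def ranks(ordered_items):
    """Map each item to its 1-based position in a ranked list."""
    return {item: pos for pos, item in enumerate(ordered_items, start=1)}


def canberra_distance(list_a, list_b):
    """Canberra distance between two complete ranked lists.

    Sums |r_a - r_b| / (r_a + r_b) over all items, so disagreements near
    the top of the lists (small ranks) weigh more than those at the bottom.
    Assumption: both lists rank exactly the same set of items.
    """
    ra, rb = ranks(list_a), ranks(list_b)
    if ra.keys() != rb.keys():
        raise ValueError("both lists must rank the same items")
    return sum(abs(ra[g] - rb[g]) / (ra[g] + rb[g]) for g in ra)


def mean_pairwise_canberra(lists):
    """Average Canberra distance over all unordered pairs of lists.

    Lower values indicate a more stable set of lists (illustrative
    indicator; requires at least two lists).
    """
    pairs = list(combinations(lists, 2))
    return sum(canberra_distance(a, b) for a, b in pairs) / len(pairs)


def borda_aggregate(lists):
    """Aggregate complete ranked lists by total rank (Borda-style count).

    Items with the smallest total rank come first; ties are broken
    alphabetically (an assumption made here for determinism).
    """
    totals = {}
    for lst in lists:
        for item, r in ranks(lst).items():
            totals[item] = totals.get(item, 0) + r
    return sorted(totals, key=lambda g: (totals[g], g))


if __name__ == "__main__":
    # Hypothetical gene rankings from three resampling runs.
    runs = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
    print(canberra_distance(runs[0], runs[1]))
    print(mean_pairwise_canberra(runs))
    print(borda_aggregate(runs))
```

Weighting each rank difference by the sum of the two ranks is what distinguishes this distance from the plain Spearman footrule, which sums the absolute rank differences without any weighting.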

Availability: Supplementary Material and software are available at the address http://biodcv.fbk.eu/listspy.html

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Biomarkers, Tumor / analysis*
Data Interpretation, Statistical
Gene Expression Profiling / methods*
Humans
Neoplasm Proteins / analysis*
Neoplasms / metabolism*
Oligonucleotide Array Sequence Analysis / methods*
Reproducibility of Results
Sensitivity and Specificity

Substances

Biomarkers, Tumor
Neoplasm Proteins