Motivation: Mass spectra of low-molecular-weight (LMW)-enriched sera contain a large number of peaks, so a systematic method is needed to select a parsimonious subset of peaks that facilitates biomarker identification. We present computational methods for preprocessing matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) spectral data and for peak selection. In particular, we propose a novel method that combines ant colony optimization (ACO) with support vector machines (SVMs) to select a small set of useful peaks.
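To make the idea of ACO-driven peak selection concrete, the following is a minimal sketch of a generic ACO wrapper for feature subset selection. It is not the authors' algorithm: the pheromone update rule, the parameter names (`n_ants`, `n_iter`, `rho`), and the toy data are assumptions made for illustration, and a leave-one-out nearest-centroid rule is swapped in for the SVM so that the example needs only the Python standard library.

```python
import random

def nearest_centroid_accuracy(X, y, subset):
    """Stand-in fitness (assumption): leave-one-out accuracy of a
    nearest-centroid rule using only the features in `subset`.
    The paper evaluates subsets with an SVM instead."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        # Class centroids computed without the held-out sample i.
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(subset))
            for k, f in enumerate(subset):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((X[i][f] - acc[k] / counts[label]) ** 2
                       for k, f in enumerate(subset))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def sample_subset(pheromone, n_select, rng):
    """One ant picks n_select distinct features, each draw weighted
    by the current pheromone level of the remaining features."""
    candidates = list(range(len(pheromone)))
    chosen = []
    for _ in range(n_select):
        weights = [pheromone[c] for c in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)
    return sorted(chosen)

def aco_select(X, y, n_select, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Generic ACO feature selection loop (hypothetical parameters)."""
    rng = random.Random(seed)
    pheromone = [1.0] * len(X[0])
    best_subset, best_score = None, -1.0
    for _ in range(n_iter):
        trails = []
        for _ in range(n_ants):
            subset = sample_subset(pheromone, n_select, rng)
            score = nearest_centroid_accuracy(X, y, subset)
            trails.append((score, subset))
            if score > best_score:
                best_subset, best_score = subset, score
        # Evaporate, then deposit pheromone in proportion to fitness.
        pheromone = [(1.0 - rho) * p for p in pheromone]
        for score, subset in trails:
            for f in subset:
                pheromone[f] += score / n_ants
    return best_subset, best_score

# Toy data (not from the study): 12 samples, 6 features.
# Features 0 and 3 separate the two classes; the rest are bounded noise.
X = [[(10.0 if i % 2 else 0.0) if f in (0, 3)
      else ((i * 7 + f * 3) % 5) / 5 for f in range(6)]
     for i in range(12)]
y = [i % 2 for i in range(12)]

subset, score = aco_select(X, y, n_select=2)
print(subset, score)
```

In the setting described here, the fitness function would be replaced by an SVM-based performance estimate, and the subset size and feature count would match the peak list at hand.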
Results: The proposed hybrid ACO-SVM algorithm selected a panel of eight peaks out of 228 candidate peaks from MALDI-TOF spectra of LMW-enriched sera. An SVM classifier built with these peaks achieved 94% sensitivity and 100% specificity in distinguishing hepatocellular carcinoma from cirrhosis in a blind validation set of 69 samples. The area under the receiver operating characteristic (ROC) curve was 0.996. The classification capability of these peaks is compared with that of peaks selected by the SVM recursive feature elimination (SVM-RFE) method.
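For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and the area under the ROC curve are conventionally computed from labels, predictions, and classifier scores. The function names and the eight-sample toy data are assumptions for illustration only; they are not the study's validation data, whose per-class counts are not given in this abstract.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy example: 1 = carcinoma, 0 = cirrhosis.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)
```

The pairwise definition of the AUC used here is equivalent to the area under the empirical ROC curve and is convenient for small validation sets.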
Availability: Supplementary material and MATLAB scripts to implement the methods described in this article are available at http://microarray.georgetown.edu/web/files/bioinf.htm.
Supplementary information: Supplementary data are available at Bioinformatics online.