Motivation: Analysis of high-throughput proteomic/genomic data, in particular surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) data and microarray data, has led to a multitude of techniques aimed at identifying potential biomarkers. Most statistical techniques for comparing two groups rely on qualitative measures such as the P-value. A quantitative approach, such as interval estimation for the contrasts between two groups, is more appealing.
Results: We have devised a simultaneous confidence bands method, based on a permutation scheme, that detects potential biomarkers discriminating two treatment groups in high-dimensional datasets while controlling the overall confidence coverage level. For example, for the SELDI-TOF MS data, we deal with the entire spectrum simultaneously and construct (1 - alpha) confidence bands for the mean differences between groups. Peaks are then identified based on the maximal differences between the groups as determined by the confidence bands. The method described here provides both qualitative (P-value) and quantitative (magnitude of difference) information. The Clinical Proteomics Programs Databank's ovarian cancer dataset and data from in-house samples containing known spiked-in proteins were analyzed. We were able to identify potential biomarkers similar to those described in previous analyses of the ovarian cancer data; however, although these markers are highly significant between the cancer and normal groups, our analysis indicated that the absolute difference between the two groups was minimal. In addition, we found markers beyond those previously described that show greater differences in average intensities. The proposed confidence bands method successfully detected the spiked-in peaks, as well as secondary peaks generated by adducts and doubly charged species. We also illustrate our method using paired gene expression data from a prostate cancer microarray experiment by constructing confidence bands for the fold changes between cancer and normal samples.
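As a rough illustration of the idea of permutation-based simultaneous confidence bands for mean differences, the following minimal sketch may help. It is not the authors' implementation and is not taken from the 'seie' package. The function names, the centring step, the unequal-variance standard error and the max-statistic critical value are all assumptions chosen for illustration, and the data are synthetic.

```python
import random
from statistics import mean, variance


def column_means(rows):
    """Mean of each feature (column) across samples (rows)."""
    return [mean(col) for col in zip(*rows)]


def simultaneous_band(group_a, group_b, alpha=0.05, n_perm=500, seed=0):
    """Return (diff, lower, upper) lists, one entry per feature.

    Assumed scheme (not taken from the paper): centre each group at its own
    feature means, permute group labels on the centred samples, and use the
    (1 - alpha) quantile of the maximum standardized absolute mean difference
    as a common critical value for all features. Every feature is assumed to
    have non-zero variance in at least one group.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    mean_a, mean_b = column_means(group_a), column_means(group_b)
    diff = [a - b for a, b in zip(mean_a, mean_b)]

    # Standard error of each feature's mean difference (unequal variances).
    var_a = [variance(col) for col in zip(*group_a)]
    var_b = [variance(col) for col in zip(*group_b)]
    se = [(va / n_a + vb / n_b) ** 0.5 for va, vb in zip(var_a, var_b)]

    # Centre samples so the permutation distribution reflects noise only.
    centred = [[x - m for x, m in zip(row, mean_a)] for row in group_a]
    centred += [[x - m for x, m in zip(row, mean_b)] for row in group_b]

    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(centred)  # random reassignment of samples to groups
        perm_a = column_means(centred[:n_a])
        perm_b = column_means(centred[n_a:])
        max_stats.append(
            max(abs(a - b) / s for a, b, s in zip(perm_a, perm_b, se))
        )

    # Empirical (1 - alpha) quantile of the maximum statistic.
    max_stats.sort()
    crit = max_stats[min(n_perm - 1, int((1 - alpha) * n_perm))]
    lower = [d - crit * s for d, s in zip(diff, se)]
    upper = [d + crit * s for d, s in zip(diff, se)]
    return diff, lower, upper


# Synthetic example: 20 features, feature 3 shifted upward in group A.
data_rng = random.Random(1)
n_features = 20
group_a = [
    [data_rng.gauss(0, 1) + (5.0 if j == 3 else 0.0) for j in range(n_features)]
    for _ in range(15)
]
group_b = [[data_rng.gauss(0, 1) for _ in range(n_features)] for _ in range(15)]

diff, lower, upper = simultaneous_band(group_a, group_b)
# Features whose band excludes zero are candidate markers.
flagged = [j for j in range(n_features) if lower[j] > 0 or upper[j] < 0]
print(flagged)
```

Because a single critical value is shared by all features, the band width reflects the whole set of comparisons at once, and the band for each flagged feature also conveys the magnitude of the estimated difference rather than only its significance.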
Availability: The R package 'seie.zip' (license: GNU GPL) is publicly available at http://research2.dfci.harvard.edu/dfci/MS_spike-in_data/