Motivation: Protein expression profiling using surface-enhanced laser desorption and ionization technology has well-recognized potential for discovering biomarkers that can be applied in clinical diagnosis, prognosis and therapy prediction. Pre-processing of the raw data, however, remains problematic.
Methods: We focus on the peak detection step, where the standard method has poor specificity. Currently, scientists must laboriously inspect individual spectra by eye to verify that the spectral peaks identified by the standard method are real. Motivated by this multi-spectral process, we investigate an analytical approach, called RS for 'regions of significance', that reduces the data to a single spectrum of F-statistics capturing significant variability between spectra. To account for multiple testing, we use a false discovery rate criterion to identify potentially interesting proteins.
Results: We show that RS has better operating characteristics than several existing methods and demonstrate routine applications on a number of large datasets.
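
To make the idea in Methods concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the F-statistic at each m/z position comes from a one-way ANOVA across sample groups and that the false discovery rate is controlled with the Benjamini-Hochberg step-up procedure. The abstract specifies neither choice. The function names (`f_spectrum`, `benjamini_hochberg`, `significant_regions`) and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def f_spectrum(intensities, groups):
    """Reduce many spectra to one spectrum of F-statistics.

    intensities: array of shape (n_spectra, n_mz), one row per spectrum.
    groups: length n_spectra array of group labels (assumed design).
    Returns per-position F-statistics and p-values.
    """
    groups = np.asarray(groups)
    samples = [intensities[groups == g] for g in np.unique(groups)]
    f_stat, p_val = f_oneway(*samples, axis=0)
    return f_stat, p_val


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests rejected at FDR level q (assumed criterion)."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest rank passing its threshold
        reject[order[: k + 1]] = True
    return reject


def significant_regions(reject):
    """Contiguous runs of rejected positions as half-open (start, end) pairs."""
    padded = np.concatenate(([False], reject, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


# Simulated example: 20 spectra, 200 m/z positions, two groups of 10.
rng = np.random.default_rng(0)
intensities = rng.normal(0.0, 1.0, size=(20, 200))
groups = np.repeat([0, 1], 10)
intensities[groups == 1, 80:90] += 5.0  # a peak present only in group 1

f_stat, p_val = f_spectrum(intensities, groups)
reject = benjamini_hochberg(p_val, q=0.05)
print(significant_regions(reject))
```

Reporting contiguous regions rather than isolated positions reflects the "regions of significance" name. How the original method delimits regions is not stated in the abstract, so the run-based grouping here is only one plausible reading.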