Database searching of MS/MS spectra is the main approach to peptide and protein identification in proteomics. Because most database search engines use only a small portion of the original MS/MS signals for peptide detection, improving the quality of those signals is a primary concern for raising the peptide/protein identification rate. A fundamental issue is that some MS signals act as noise, whether or not they are informative in themselves, and have to be filtered out prior to database searching. Herein, an integrative preprocessing algorithm, termed pClean, was designed. It incorporates three modules that preprocess MS/MS spectra through the removal of isobaric-labeling-related ions, the reduction of isotopic peaks, the deconvolution of ions with higher charges, and the clearance of uninformative MS/MS signals. In contrast to currently available approaches to MS/MS data preprocessing, pClean handles MS/MS spectra acquired with high mass accuracy and supports filtering for both labeled and unlabeled peptides. Data sets of various scales, acquired on high-resolution mass spectrometers, were used to assess the quality of the peptides identified after pClean treatment and to compare the improvement achieved by pClean with that of other software programs. On the basis of the peptides identified and the Mascot ion scores, pClean was shown to be effective in removing mass spectral noise and reducing random matching. Compared with other software programs, pClean appeared to perform better in preprocessing, both in enhancing confidence scores and in increasing the number of peptides identified. pClean is available at https://github.com/AimeeD90/pClean_release.
Keywords: MS/MS; bioinformatics; database search; pClean; preprocessing; proteomics.
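
To make two of the preprocessing steps named in the abstract concrete, the following is a minimal Python sketch of isotopic peak reduction and charge deconvolution. It is an illustration under stated assumptions, not the pClean implementation: the function names, the fixed tolerance, and the greedy spacing-only rule are hypothetical, and a practical de-isotoping routine would typically also check that intensities follow a plausible isotope pattern. The only physical inputs are the proton mass and the 13C-12C mass difference.

```python
# Hypothetical sketch; names, tolerance, and logic are assumptions,
# not taken from the pClean source code.
PROTON_MASS = 1.007276       # Da, approximate mass of a proton
ISOTOPE_SPACING = 1.003355   # Da, approximate 13C - 12C mass difference


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert the m/z of an ion with the given charge to its 1+ equivalent."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = (mz - PROTON_MASS) * charge  # strip all charge-carrying protons
    return neutral_mass + PROTON_MASS           # add a single proton back


def drop_isotopic_peaks(peaks, charge, tol=0.01):
    """Keep only peaks that have no lower-m/z neighbor one isotope spacing away.

    peaks:  iterable of (mz, intensity) tuples
    charge: assumed charge state shared by the peaks
    tol:    absolute m/z tolerance in Th (an arbitrary illustrative value)
    """
    spacing = ISOTOPE_SPACING / charge
    kept, seen_mz = [], []
    for mz, intensity in sorted(peaks):
        # A peak counts as an isotope if any earlier peak lies one spacing below it.
        is_isotope = any(abs(mz - prev - spacing) <= tol for prev in seen_mz)
        if not is_isotope:
            kept.append((mz, intensity))
        # Dropped peaks stay in seen_mz so that the next isotope in the
        # same cluster is still recognized.
        seen_mz.append(mz)
    return kept


if __name__ == "__main__":
    spectrum = [(500.0, 100.0), (500.5017, 50.0), (501.0034, 20.0), (600.0, 10.0)]
    monoisotopic = drop_isotopic_peaks(spectrum, charge=2)
    for mz, intensity in monoisotopic:
        print(round(to_singly_charged(mz, 2), 4), intensity)
```

In this sketch the isotopic peaks are removed first, so that only the remaining (presumed monoisotopic) peaks are converted to their singly charged equivalents; this ordering is a design choice of the example, not a statement about the order of operations in pClean.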