MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines

J Proteome Res. 2011 Jul 1;10(7):2949-58. doi: 10.1021/pr2002116. Epub 2011 Apr 29.

Abstract

Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers from limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high-scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improved sensitivity in differential expression analyses.
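To illustrate how a per-PSM probability score can be turned into a false discovery rate estimate, the following is a minimal sketch in Python. It is not MSblender's implementation: it assumes each PSM already has a probability of being correct, and it uses the common convention that the estimated FDR of an accepted set is the mean of (1 - probability) over that set. The function names and the example probabilities are hypothetical.

```python
def estimated_fdr(probabilities, threshold):
    """Estimate the FDR among PSMs whose probability is >= threshold.

    Assumption: each value is the probability that the PSM is correct,
    so (1 - p) is its expected contribution to the false identifications.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


def accepted_at_fdr(probabilities, max_fdr):
    """Return how many top-ranked PSMs can be accepted at max_fdr."""
    ranked = sorted(probabilities, reverse=True)
    best = 0
    error_sum = 0.0
    for count, p in enumerate(ranked, start=1):
        error_sum += 1.0 - p
        # With PSMs ranked by decreasing probability, the running
        # estimate never decreases, so the last passing count is the answer.
        if error_sum / count <= max_fdr:
            best = count
    return best


# Hypothetical probabilities for five PSMs (illustrative values only).
example_probabilities = [0.99, 0.95, 0.90, 0.60, 0.20]
n_accepted = accepted_at_fdr(example_probabilities, max_fdr=0.05)
```

Under this convention, comparing search strategies "at the same false discovery rate" amounts to comparing the number of PSMs each one lets through for the same `max_fdr`.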

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Databases, Protein
  • Escherichia coli
  • Escherichia coli Proteins / analysis
  • Escherichia coli Proteins / chemistry
  • Humans
  • Models, Statistical*
  • Peptides / analysis*
  • Peptides / chemistry
  • Probability*
  • Proteomics / methods*
  • Proteomics / statistics & numerical data
  • Research Design / statistics & numerical data
  • Saccharomyces cerevisiae
  • Saccharomyces cerevisiae Proteins / analysis
  • Saccharomyces cerevisiae Proteins / chemistry
  • Search Engine / methods*
  • Search Engine / statistics & numerical data
  • Software*
  • Tandem Mass Spectrometry / methods

Substances

  • Escherichia coli Proteins
  • Peptides
  • Saccharomyces cerevisiae Proteins