Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing

Rapid Commun Mass Spectrom. 2010 Mar;24(6):807-14. doi: 10.1002/rcm.4448.

Abstract

Database searching is the technique of choice for shotgun proteomics, and to date much research effort has been spent on improving its effectiveness. However, database searching faces a serious efficiency challenge, given the large numbers of mass spectra and the ever-faster growth of peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications. In this study, we conducted systematic research on speeding up database search engines for protein identification and illustrated the key points with the specific design of the pFind 2.1 search engine as a running example. First, by constructing peptide indexes, pFind achieves a two- to threefold speedup over searching without peptide indexes. Second, by constructing indexes for observed precursor and fragment ions, pFind achieves a further twofold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem.
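
The general idea of a peptide index can be illustrated with a minimal sketch. This is not pFind's implementation, which the abstract does not describe in detail; it only assumes that digested peptides are deduplicated and sorted by mass once, so that candidates for each precursor mass are found by binary search rather than by re-digesting the protein database for every spectrum. All function names, the simplified trypsin rule (cleave after K or R, no missed cleavages, no modifications), and the example sequences are hypothetical.

```python
import bisect

# Monoisotopic residue masses in Da for a few amino acids only;
# a real table would cover all 20 residues plus modifications.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056  # mass of H2O added to the residue sum


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_peptides(protein):
    """Simplified digestion: cleave after every K or R (assumption)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def build_peptide_index(proteins):
    """Digest once, drop duplicate peptides, and sort by mass."""
    unique = {p for prot in proteins for p in tryptic_peptides(prot)}
    entries = sorted((peptide_mass(p), p) for p in unique)
    masses = [m for m, _ in entries]
    peptides = [p for _, p in entries]
    return masses, peptides


def candidates(masses, peptides, precursor_mass, tol):
    """Peptides whose mass lies within precursor_mass +/- tol (Da)."""
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return peptides[lo:hi]


if __name__ == "__main__":
    masses, peptides = build_peptide_index(["GASKPVLR", "GASKAAK"])
    print(candidates(masses, peptides, 361.2, 0.01))
```

In this sketch the peptide GASK occurs in both example proteins but is stored and scored only once, and each lookup costs a binary search plus the size of the mass window. The same sorted-lookup pattern could in principle be applied in the other direction, to observed precursor or fragment ion masses, though how pFind organizes those indexes is not stated here.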

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Blood Proteins / chemistry
  • Computer Simulation
  • Data Mining / methods*
  • Database Management Systems
  • Databases, Protein*
  • Fungal Proteins / chemistry
  • Humans
  • Peptide Fragments / chemistry*
  • Proteins / chemistry*
  • Proteomics / methods
  • Tandem Mass Spectrometry / methods*

Substances

  • Blood Proteins
  • Fungal Proteins
  • Peptide Fragments
  • Proteins