Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing

You Li; Hao Chi; Le-Heng Wang; Hai-Peng Wang; Yan Fu; Zuo-Fei Yuan; Su-Jun Li; Yan-Sheng Liu; Rui-Xiang Sun; Rong Zeng; Si-Min He

doi:10.1002/rcm.4448

Rapid Commun Mass Spectrom. 2010 Mar;24(6):807-14. doi: 10.1002/rcm.4448.

Authors

You Li¹, Hao Chi, Le-Heng Wang, Hai-Peng Wang, Yan Fu, Zuo-Fei Yuan, Su-Jun Li, Yan-Sheng Liu, Rui-Xiang Sun, Rong Zeng, Si-Min He

Affiliation

¹ Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

PMID: 20187083

Abstract

Database searching is the technique of choice for shotgun proteomics, and much research effort to date has gone into improving its effectiveness. Its efficiency, however, is seriously challenged by the large numbers of mass spectra and by the ever-faster growth of peptide databases resulting from genome translations, enzymatic digestions, and post-translational modifications. In this study, we systematically investigated how to speed up database search engines for protein identification, illustrating the key points with the design of the pFind 2.1 search engine as a running example. First, by constructing peptide indexes, pFind achieves a two- to three-fold speedup over searching without peptide indexes. Second, by constructing indexes for observed precursor and fragment ions, pFind achieves a further two-fold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem.
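
The general idea behind a peptide index can be illustrated with a minimal Python sketch. This is not pFind's implementation, which the abstract does not describe in detail. It assumes tryptic digestion with no missed cleavages, unmodified peptides, and a simple mass-sorted list queried by binary search. The function names (tryptic_digest, build_peptide_index, candidates) and the tolerance value are hypothetical and chosen for illustration only.

```python
import bisect

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (assumption: no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return peptides


def build_peptide_index(proteins):
    """Digest once, drop duplicate peptides, and sort by mass."""
    unique = {pep for prot in proteins for pep in tryptic_digest(prot)}
    entries = sorted((peptide_mass(pep), pep) for pep in unique)
    masses = [mass for mass, _ in entries]
    peptides = [pep for _, pep in entries]
    return masses, peptides


def candidates(index, precursor_mass, tol):
    """Peptides whose mass lies within +/- tol Da of the precursor mass."""
    masses, peptides = index
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return peptides[lo:hi]


if __name__ == "__main__":
    index = build_peptide_index(["MKAGR", "MKWK"])
    print(candidates(index, peptide_mass("AGR"), tol=0.01))
```

Because digestion and mass calculation happen once when the index is built, each spectrum costs only a binary search rather than a fresh pass over the protein database. A comparable sorted lookup could plausibly be applied to observed precursor masses as well, though that is an assumption and not a description of how pFind indexes spectra.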

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Blood Proteins / chemistry
Computer Simulation
Data Mining / methods*
Database Management Systems
Databases, Protein*
Fungal Proteins / chemistry
Humans
Peptide Fragments / chemistry*
Proteins / chemistry*
Proteomics / methods
Tandem Mass Spectrometry / methods*

Substances

Blood Proteins
Fungal Proteins
Peptide Fragments
Proteins