Choosing an Optimal Database for Protein Identification from Tandem Mass Spectrometry Data

Methods Mol Biol. 2017:1549:17-29. doi: 10.1007/978-1-4939-6740-7_3.

Abstract

Database searching is the preferred method for identifying proteins from the digital spectra of mass-to-charge ratios (m/z) that mass spectrometers record for protein samples. The search database is one of the major factors influencing which proteins are discovered in a sample, and thus which biological conclusions are drawn. In most cases, the choice of search database is arbitrary. Here we describe common search databases used in proteomic studies and their impact on the final list of identified proteins. We also elaborate on factors, such as the composition and size of the search database, that can influence the protein identification process. In conclusion, we suggest that the choice of database depends on the type of inferences to be derived from the proteomics data. However, making the additional effort to build a compact and concise database for a targeted question should generally be rewarded with more confident protein identifications.

Keywords: Database size; Peptide identification; Proteogenomics; Shotgun proteomics; neXtProt.

MeSH terms

  • Computational Biology / methods
  • Databases, Protein*
  • Genomics / methods
  • Molecular Sequence Annotation
  • Peptides
  • Proteins* / genetics
  • Proteins* / metabolism
  • Proteomics / methods*
  • Search Engine
  • Sensitivity and Specificity
  • Tandem Mass Spectrometry* / methods
  • Web Browser
  • Workflow

Substances

  • Peptides
  • Proteins