Improving large-scale proteomics by clustering of mass spectrometry data

Proteomics. 2004 Apr;4(4):950-60. doi: 10.1002/pmic.200300652.

Abstract

Tandem mass spectrometry (MS/MS), coupled with liquid chromatography (LC), is a powerful tool for the analysis and comparison of complex protein and peptide mixtures. However, the extremely large amounts of data that result from the process are very complex and difficult to analyze. We show how the clustering of similar spectra from multiple LC-MS/MS runs can help in data management and improve the analysis of complex peptide mixtures. The major effect of spectrum clustering is the reduction of the huge amounts of data to a manageable size. As a result, analysis time is shorter and more data can be stored for further analysis. Furthermore, spectrum quality improvement allows the identification of more peptides with greater confidence, the comparison of complex peptide mixtures is facilitated, and the entire proteomics project is presented in concise form. Pep-Miner is an advanced software tool that implements these clustering-based applications. It proved useful in several comparative proteomics projects involving lung cancer cells and various other cell types. In one of these projects, Pep-Miner reduced 517 000 spectra to 20 900 clusters and identified 2518 peptides derived from 830 proteins. Clustering and identification lasted less than two hours on an IBM Thinkpad T23 computer (laptop). Pep-Miner's unique properties make it a very useful tool for large-scale shotgun proteomics projects.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Chromatography, Liquid
  • Cluster Analysis
  • Computational Biology
  • Models, Statistical*
  • Peptides / analysis
  • Proteins / analysis
  • Proteome*
  • Software*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*

Substances

  • Peptides
  • Proteins
  • Proteome