Centralized data analysis of a large interlaboratory proteomics project: a feasibility study

Proteomics. 2005 Aug;5(13):3491-6. doi: 10.1002/pmic.200401336.

Abstract

The human Plasma Proteome Project (PPP) is a large-scale collaboration between many laboratories. One of the most demanding tasks in the PPP involved the analysis of very large amounts of raw MS/MS data produced by the participants. The main approach for managing this task was letting the participants analyze their own data and submit the results to the central PPP repository as lists of identified proteins and peptides. To complement this distributed approach, we also performed centralized analysis of the raw MS/MS data provided by the participants. Due to the data redundancy inherent in such a project, centralized analysis has the potential to reduce the computational effort by reducing redundancy before the analysis. Centralized analysis can also unify the process and take advantage of data sharing among laboratories to improve protein identification and validation. The process we employed included removing low-quality spectra, clustering spectra by mutual similarity, and applying uniform peptide and protein identification procedures. To demonstrate the process, we analyzed 5.28 million MS/MS spectra derived by eight laboratories from tryptic peptides of serum and plasma proteins.
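
The first two steps named above (removing low-quality spectra, then clustering spectra by mutual similarity) can be illustrated with a minimal sketch. The abstract does not specify the quality criterion, the similarity measure, or the clustering algorithm, so everything below is an assumption made for illustration: a minimum peak count as the quality filter, cosine similarity on 1 Da binned intensities, a precursor m/z tolerance, and a greedy single-pass clustering. All function names, parameters, and thresholds are hypothetical and are not taken from the study.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse dict)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_usable(spectrum, min_peaks=5):
    """Assumed quality filter: keep spectra with enough fragment peaks."""
    return len(spectrum["peaks"]) >= min_peaks


def cluster_spectra(spectra, min_similarity=0.8, precursor_tol=2.0):
    """Greedy clustering: each usable spectrum joins the first cluster
    whose representative has a close precursor m/z and a similar
    fragment pattern; otherwise it starts a new cluster."""
    clusters = []
    for s in spectra:
        if not is_usable(s):
            continue  # low-quality spectrum is dropped before clustering
        vec = bin_spectrum(s["peaks"])
        for c in clusters:
            close = abs(c["precursor_mz"] - s["precursor_mz"]) <= precursor_tol
            if close and cosine(c["vector"], vec) >= min_similarity:
                c["members"].append(s["id"])
                break
        else:
            # The first member serves as the cluster representative.
            clusters.append({
                "precursor_mz": s["precursor_mz"],
                "vector": vec,
                "members": [s["id"]],
            })
    return clusters
```

In a sketch like this, only one representative per cluster would need to be passed to the peptide identification step, which is where the reduction in computational effort described above would come from. The greedy pass compares each spectrum against existing cluster representatives only, so it is a simplification of any true mutual-similarity clustering.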

MeSH terms

  • Blood Proteins / chemistry*
  • Cluster Analysis
  • Computational Biology / methods
  • Databases, Protein
  • Feasibility Studies
  • Humans
  • Mass Spectrometry / methods*
  • Peptides / chemistry
  • Pilot Projects
  • Proteins / chemistry
  • Proteome
  • Proteomics / methods*
  • Statistics as Topic / methods*
  • Trypsin / pharmacology

Substances

  • Blood Proteins
  • Peptides
  • Proteins
  • Proteome
  • Trypsin