Peak intensity prediction in MALDI-TOF mass spectrometry: a machine learning study to support quantitative proteomics

Wiebke Timm; Alexandra Scherbart; Sebastian Böcker; Oliver Kohlbacher; Tim W Nattkemper

doi:10.1186/1471-2105-9-443

BMC Bioinformatics. 2008 Oct 20;9:443. doi: 10.1186/1471-2105-9-443.

Authors

Wiebke Timm¹, Alexandra Scherbart, Sebastian Böcker, Oliver Kohlbacher, Tim W Nattkemper

Affiliation

¹ Applied Neuroinformatics Group, Bielefeld University, Germany.

Abstract

Background: Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. A central problem in the mass spectrometric analysis of peptides and proteins, however, is that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g. labeling techniques), reliable prediction of peak intensities from peptide sequences could provide a peptide-specific correction factor. Such a prediction would therefore be a valuable tool for label-free absolute quantification.

Results: In this work we present machine learning techniques for peak intensity prediction in MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem, reaching a Pearson correlation of 0.68 in a ten-fold cross-validation.
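The evaluation scheme described above (sequence-derived features, a regression model, and a Pearson correlation from ten-fold cross-validation) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the features are plain amino-acid frequencies, a k-nearest-neighbour regressor stands in for support vector regression and local linear maps, the peptides and intensities are synthetic, and the held-out predictions are pooled before the correlation is computed. The names `encode`, `knn_predict` and `cross_validated_pearson` are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(peptide):
    """Hypothetical string-based features: relative frequency of each residue."""
    n = len(peptide)
    return [peptide.count(a) / n for a in AMINO_ACIDS]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def knn_predict(train_X, train_y, x, k=5):
    """Stand-in regressor (NOT the paper's method): mean of the k nearest targets."""
    nearest = sorted((math.dist(x, t), y) for t, y in zip(train_X, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_validated_pearson(X, y, folds=10, k=5, seed=0):
    """Predict every sample while it is held out, then correlate with the targets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(X)
    for f in range(folds):
        test = idx[f::folds]          # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return pearson(preds, y)


# Synthetic data only: random peptides and a made-up intensity rule plus noise.
rng = random.Random(42)
peptides = [
    "".join(rng.choice(AMINO_ACIDS) for _ in range(rng.randint(6, 20)))
    for _ in range(200)
]
intensities = [
    (p.count("R") + p.count("K")) / len(p) + rng.gauss(0, 0.02) for p in peptides
]

X = [encode(p) for p in peptides]
r = cross_validated_pearson(X, intensities)
print(f"pooled ten-fold Pearson r = {r:.2f}")
```

Because the data and the model are placeholders, the printed correlation says nothing about the reported value of 0.68. The sketch only shows how a single correlation figure can be obtained from predictions made on held-out folds.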

Conclusion: The techniques presented here are a useful first step beyond the binary prediction of proteotypic peptides towards a more quantitative prediction of peak intensities. These predictions are in turn expected to benefit mass spectrometry-based quantitative proteomics.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Artificial Intelligence*
Linear Models
Neural Networks, Computer
Peptides / analysis*
Peptides / chemistry
Proteins / analysis*
Proteins / chemistry
Proteomics / methods*
Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*

Substances

Peptides
Proteins