A comprehensive analysis about the influence of low-level preprocessing techniques on mass spectrometry data for sample classification

Int J Data Min Bioinform. 2014;10(4):455-73. doi: 10.1504/ijdmb.2014.064897.

Abstract

Matrix-Assisted Laser Desorption Ionisation Time-of-Flight (MALDI-TOF) is a high-throughput mass spectrometry technology that produces data requiring extensive preprocessing before subsequent analyses. Several low-level preprocessing techniques have been developed for the different tasks involved, including baseline correction, smoothing, normalisation, peak detection and peak alignment. In this work, we present a systematic comparison of software packages that support this mandatory preprocessing of MALDI-TOF data. To ensure the validity of our study, we test multiple configurations of each preprocessing technique; the resulting data are then used to train a set of classifiers whose performance (kappa and accuracy) provides the basis for the final comparison. The experimental results show that the choice of preprocessing technique has a measurable impact on classification: MassSpecWavelet provides the best performance, and Support Vector Machines (SVM) are among the most accurate classifiers.
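The abstract names accuracy and kappa as the performance measures but does not define them. The following is a minimal sketch of how the two could be computed from true and predicted class labels, assuming that "kappa" refers to Cohen's kappa (the abstract does not say which variant was used). The function names and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    (the accuracy) and p_e is the agreement expected if true and predicted
    labels were independent with the same marginal frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of P(true = c) * P(pred = c).
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Only one class appears in both lists; kappa is undefined here.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for ten spectra (illustration only, not study data).
y_true = ["healthy"] * 5 + ["disease"] * 5
y_pred = ["healthy"] * 4 + ["disease"] * 4 + ["healthy"] * 2

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", cohen_kappa(y_true, y_pred))
```

In this toy example 7 of the 10 labels agree, so the accuracy is 0.7, while the chance agreement is 0.5, giving a kappa of about 0.4. Reporting kappa alongside accuracy is useful because accuracy alone can look high when one class dominates the data set.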

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Analysis of Variance
  • Biomarkers / analysis
  • Computational Biology / methods
  • Data Interpretation, Statistical
  • Humans
  • Programming Languages
  • Reproducibility of Results
  • Signal Processing, Computer-Assisted*
  • Software
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*
  • Support Vector Machine

Substances

  • Biomarkers