Generalizable mass spectrometry mining used to identify disease state biomarkers from blood serum

Proteomics. 2003 Sep;3(9):1710-5. doi: 10.1002/pmic.200300516.

Abstract

We bring a "spectrum" of classical data mining and statistical analysis methods to bear on discrimination of two groups of spectra from 24 diseased and 17 normal patients. Our primary goal is to accurately estimate the generalizability of this small dataset. After an aggressive preprocessing step that reduces consideration to only 55 peaks, we conduct over 35 out-of-sample cross-validation simulations of logistic regression, binary decision trees, and linear discriminant analysis. Misclassification rates grow worse as the size of the holdout sample increases, with many exceeding 30 percent. The ability to generalize is clearly tempered by the statistical, instrumentation, and biophysical characteristics of the study.
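The out-of-sample procedure described above can be illustrated with a minimal sketch. It is not the authors' code or data. It assumes synthetic Gaussian peak intensities in place of the real spectra, a simple nearest-class-mean classifier as a stand-in for the logistic regression, decision tree, and linear discriminant models named in the abstract, and repeated random holdout splits as the cross-validation scheme. All function names, the `shift` effect size, and the number of repeats are hypothetical choices made for illustration; only the group sizes (24 and 17) and the peak count (55) come from the abstract.

```python
import random


def make_synthetic_spectra(n_diseased=24, n_normal=17, n_peaks=55,
                           shift=0.5, seed=0):
    """Build synthetic stand-in data (an assumption, not the study's spectra).

    Every peak intensity is drawn from a unit-variance Gaussian; the first
    five peaks of the diseased group have their mean shifted by `shift`.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_diseased), (0, n_normal)):
        for _ in range(n):
            row = [rng.gauss(shift if (label == 1 and j < 5) else 0.0, 1.0)
                   for j in range(n_peaks)]
            X.append(row)
            y.append(label)
    return X, y


def class_means(X, y):
    """Return the per-peak mean vector of each class."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict(means, x):
    """Assign x to the class whose mean vector is closest (squared distance)."""
    def sqdist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda label: sqdist(means[label]))


def holdout_error(X, y, holdout_size, n_repeats=200, seed=1):
    """Average misclassification rate over repeated random holdout splits."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    errors = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx = indices[:holdout_size]
        train_idx = indices[holdout_size:]
        train_y = [y[i] for i in train_idx]
        if len(set(train_y)) < 2:
            continue  # skip splits whose training set lost a whole class
        means = class_means([X[i] for i in train_idx], train_y)
        wrong = sum(predict(means, X[i]) != y[i] for i in test_idx)
        errors.append(wrong / holdout_size)
    if not errors:
        raise ValueError("no usable splits for this holdout size")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    # Compare the estimated error as more samples are held out of training.
    for size in (1, 5, 10, 20):
        print(size, round(holdout_error(X, y, size), 3))
```

Each holdout size shrinks the training set by the same amount, so with only 41 samples a larger holdout leaves fewer spectra to estimate the class means from. The sketch lets that effect be examined on synthetic data, but its numbers say nothing about the rates reported in the study.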

Publication types

  • Evaluation Study

MeSH terms

  • Biomarkers / blood
  • Biomarkers / chemistry
  • Blood Proteins / analysis*
  • Blood Proteins / chemistry
  • Computational Biology / methods
  • Databases, Protein
  • Humans
  • Lung Neoplasms / blood
  • Lung Neoplasms / diagnosis*
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / statistics & numerical data*

Substances

  • Biomarkers
  • Blood Proteins