Improve accuracy and sensibility in glycan structure prediction by matching glycan isotope abundance

Anal Chim Acta. 2012 Sep 19:743:80-9. doi: 10.1016/j.aca.2012.07.009. Epub 2012 Jul 16.

Abstract

Mass spectrometry (MS) is a powerful technique for determining glycan structures and can provide both qualitative and quantitative information. Recent developments in computational methods offer an opportunity to use glycan structure databases and de novo algorithms to extract valuable information from MS or MS/MS data. However, detecting low-intensity peaks buried in noisy data sets remains a challenge, and an algorithm for accurate prediction and annotation of glycan structures from MS data is highly desirable. The present study describes a novel algorithm for glycan structure prediction by matching glycan isotope abundance (mGIA), which takes isotope masses, abundances, and spacing into account. We constructed a comprehensive database containing 808 glycan compositions and their corresponding isotope abundances. Unlike most previously reported methods, we took into account not only the m/z values of the peaks but also the logarithmic Euclidean distance between the calculated and detected isotope vectors. To improve the accuracy of automatic glycan structure annotation, we propose evaluating candidates against a linear classifier obtained by training the mGIA algorithm, in association with a Support Vector Machine (SVM), on datasets from three different human tissue samples from the Consortium for Functional Glycomics (CFG). In addition, an effective data preprocessing procedure, comprising baseline subtraction, smoothing, peak centroiding, and composition matching, was incorporated to extract correct isotope profiles from MS data. The algorithm was validated by analyzing the mouse kidney MS data from CFG, which resulted in the identification of 6 more glycan compositions than the previous annotation and a significant improvement in the detection of weaker peaks compared with the previously reported algorithm.
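The abstract does not define the logarithmic Euclidean distance precisely, so the following is only an illustrative sketch of one plausible reading: both isotope vectors are normalised to unit sum, and the Euclidean distance is taken between their element-wise base-10 logarithms. The function name, the normalisation step, the log base, the `eps` guard against zero intensities, and the example abundance values are all assumptions made for illustration, not details taken from the paper.

```python
import math


def log_euclidean_distance(calculated, detected, eps=1e-12):
    """Distance between two isotope abundance vectors in log space.

    Assumed definition (not from the paper): normalise each vector to
    unit sum, take log10 element-wise, then compute the Euclidean distance.
    """
    if len(calculated) != len(detected):
        raise ValueError("isotope vectors must have the same length")

    def normalise(vector):
        total = sum(vector)
        if total <= 0:
            raise ValueError("isotope vector must have positive total abundance")
        return [value / total for value in vector]

    calc = normalise(calculated)
    det = normalise(detected)
    # eps keeps log10 defined when an isotope peak is missing (zero intensity).
    return math.sqrt(
        sum(
            (math.log10(c + eps) - math.log10(d + eps)) ** 2
            for c, d in zip(calc, det)
        )
    )


# Hypothetical relative abundances for the first four isotope peaks.
theoretical = [0.40, 0.33, 0.17, 0.10]
# Detected intensities with the same shape as the theoretical envelope.
observed_matching = [800.0, 660.0, 340.0, 200.0]
# Detected intensities whose second isotope peak is strongly suppressed.
observed_distorted = [800.0, 100.0, 340.0, 200.0]

distance_matching = log_euclidean_distance(theoretical, observed_matching)
distance_distorted = log_euclidean_distance(theoretical, observed_distorted)
```

Under this reading, a detected isotope envelope that has the same shape as the calculated one yields a distance near zero regardless of absolute intensity, while a distorted envelope yields a larger distance. Working in log space gives low-abundance isotope peaks more weight than a plain Euclidean distance would, which may be one reason such a measure helps with weaker peaks.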

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Isotopes / chemistry
  • Kidney / chemistry
  • Mice
  • Pattern Recognition, Automated / methods*
  • Polysaccharides / chemistry*
  • Predictive Value of Tests
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization

Substances

  • Isotopes
  • Polysaccharides