Fully Automated Unconstrained Analysis of High-Resolution Mass Spectrometry Data with Machine Learning

J Am Chem Soc. 2022 Aug 17;144(32):14590-14606. doi: 10.1021/jacs.2c03631. Epub 2022 Aug 8.

Abstract

Mass spectrometry (MS) is a convenient, highly sensitive, and reliable method for the analysis of complex mixtures, which is vital for materials science, for life-science fields such as metabolomics and proteomics, and for mechanistic research in chemistry. Although it is one of the most powerful methods for detecting individual compounds, complete signal assignment in complex mixtures remains a great challenge. An unconstrained formula-generating algorithm that covers the entire spectrum and reveals its components is a "dream tool" for researchers. We present a framework for efficient MS data interpretation and describe a novel approach to detailed analysis based on deisotoping performed by gradient-boosted decision trees and on a neural network that generates molecular formulas from the fine isotopic structure, approaching the long-standing inverse spectral problem. The methods were successfully tested on three examples: fragment ion analysis in protein sequencing for proteomics, analysis of natural samples for the life sciences, and the study of a cross-coupling catalytic system for chemistry.
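
To make the inverse spectral problem concrete, the sketch below shows the conventional constrained approach that an unconstrained generator is meant to move beyond: a brute-force search for C/H/N/O compositions whose monoisotopic mass falls within a ppm tolerance of a measured neutral mass. This is not the method of the paper (which uses gradient-boosted trees and a neural network); the function names, the element set, the element limits, and the 5 ppm default are all illustrative assumptions.

```python
# Illustrative baseline only: constrained brute-force formula search.
# NOT the paper's ML method; element set and limits are assumptions.

# Monoisotopic masses (u) of the most abundant isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def formula_mass(counts):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in counts.items())


def candidate_formulas(target_mass, ppm=5.0, max_counts=None):
    """Return (composition, error_ppm) pairs within `ppm` of target_mass.

    The search loops over C, N and O and solves for the nearest
    integer H count, so only one H value is tried per (C, N, O).
    """
    limits = max_counts or {"C": 20, "H": 40, "N": 5, "O": 10}
    tol = target_mass * ppm * 1e-6  # absolute tolerance in u
    hits = []
    for c in range(limits["C"] + 1):
        for n in range(limits["N"] + 1):
            for o in range(limits["O"] + 1):
                rest = target_mass - (
                    c * MONO["C"] + n * MONO["N"] + o * MONO["O"]
                )
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(rest / MONO["H"])
                if not 0 <= h <= limits["H"]:
                    continue
                counts = {"C": c, "H": h, "N": n, "O": o}
                err = formula_mass(counts) - target_mass
                if abs(err) <= tol:
                    hits.append((counts, err / target_mass * 1e6))
    # Best match (smallest absolute error) first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    glucose = {"C": 6, "H": 12, "N": 0, "O": 6}
    for counts, err_ppm in candidate_formulas(formula_mass(glucose)):
        print(counts, f"{err_ppm:+.2f} ppm")
```

With the mass of C6H12O6 as the target, that composition appears among the candidates with zero error. The number of candidates grows quickly with mass and with the number of allowed elements, which is why accurate mass alone is often insufficient and the fine isotopic structure carries the additional information needed to discriminate between formulas.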

MeSH terms

  • Algorithms
  • Complex Mixtures
  • Machine Learning
  • Mass Spectrometry / methods
  • Metabolomics* / methods
  • Proteomics*

Substances

  • Complex Mixtures