Statistics and Machine Learning in Mass Spectrometry-Based Metabolomics Analysis

Methods Mol Biol. 2023:2629:247-269. doi: 10.1007/978-1-0716-2986-4_12.

Abstract

In this chapter, we review cutting-edge statistical and machine learning methods for missing value imputation, normalization, and downstream analyses in mass spectrometry metabolomics studies, illustrated with example datasets. Missing peak recovery includes simple imputation by zero or by the limit of detection, regression-based or distribution-based imputation, and prediction by random forest. Batch effects can be removed by data-driven, internal standard-based, or quality control sample-based normalization. We also summarize different types of statistical analysis relating metabolomics to clinical outcomes, such as inference on metabolic biomarkers, clustering of metabolomic profiles, metabolite module building, and integrative analysis with transcriptome data.
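
As a rough illustration of the simplest options named above, the following sketch imputes missing peaks with a limit-of-detection proxy and then applies a data-driven normalization. It is not taken from the chapter: the function names, the half-minimum rule used as a stand-in for the limit of detection, the choice of median scaling, and the toy intensity matrix are all assumptions made for illustration. Rows are samples and columns are metabolites, and every metabolite is assumed to have at least one observed value.

```python
import numpy as np

def impute_half_min(X):
    """Fill NaNs in each metabolite column with half of that column's
    smallest observed value (an assumed proxy for the limit of detection)."""
    X = np.asarray(X, dtype=float).copy()
    fill = np.nanmin(X, axis=0) / 2.0      # one fill value per metabolite
    rows, cols = np.where(np.isnan(X))     # positions of missing peaks
    X[rows, cols] = fill[cols]
    return X

def median_normalize(X):
    """Scale each sample (row) so that all samples share the same median
    intensity; a simple data-driven normalization."""
    X = np.asarray(X, dtype=float)
    sample_median = np.median(X, axis=1, keepdims=True)
    return X / sample_median * np.median(sample_median)

# Hypothetical intensities: 3 samples x 3 metabolites, NaN = missing peak.
intensities = np.array([
    [100.0, 20.0, np.nan],
    [200.0, np.nan, 8.0],
    [150.0, 30.0, 4.0],
])

imputed = impute_half_min(intensities)
normalized = median_normalize(imputed)
print(imputed)
print(np.median(normalized, axis=1))
```

Half-minimum filling assumes that peaks are missing because they fall below the detection limit; when values are missing for other reasons, the regression-based, distribution-based, or random forest approaches mentioned above may be more appropriate.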

Keywords: Imputation; Integrative analysis; Mass spectrometry; Metabolomics; Normalization; Statistical and machine learning.

MeSH terms

  • Cluster Analysis
  • Mass Spectrometry / methods
  • Metabolomics* / methods
  • Quality Control