Stable biomarker discovery in multi-omics data via canonical correlation analysis

PLoS One. 2024 Sep 9;19(9):e0309921. doi: 10.1371/journal.pone.0309921. eCollection 2024.

Abstract

Multi-omics analysis offers a promising avenue to a better understanding of complex biological phenomena. In particular, untangling the pathophysiology of multifactorial health conditions such as inflammatory bowel disease (IBD) could benefit from simultaneous consideration of several omics levels. However, taking full advantage of multi-omics data requires the adoption of suitable new tools. Multi-view learning, a machine learning technique that natively joins together heterogeneous data, is a natural source for such methods. Here we present a new approach to variable selection in unsupervised multi-view learning by applying stability selection to canonical correlation analysis (CCA). We apply our method, StabilityCCA, to simulated and real multi-omics data, and demonstrate its ability to find relevant variables and improve the stability of variable selection. In a case study on an IBD microbiome data set, we link together metagenomics and metabolomics, revealing a connection between their joint structure and the disease, and identifying potential biomarkers. Our results showcase the usefulness of multi-view learning in multi-omics analysis and demonstrate StabilityCCA as a powerful tool for biomarker discovery.
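
The general idea of combining stability selection with a sparse CCA can be illustrated with a minimal sketch. This is not the authors' StabilityCCA implementation: the abstract does not specify the sparse CCA solver, the subsampling scheme, or the selection threshold, so everything below is an assumption made for illustration. In particular, the sparse CCA step is replaced by a simple truncated power iteration on the cross-covariance matrix (treating within-view covariances as identity), and the function names `keep_top`, `sparse_cca_first_pair`, and `stability_cca`, the half-sample subsampling, and the 0.6 threshold are all hypothetical choices. The data are simulated, not taken from the paper.

```python
import numpy as np

def keep_top(a, k):
    """Zero all but the k largest-magnitude entries of a, then scale to unit norm."""
    out = np.zeros_like(a)
    idx = np.argsort(np.abs(a))[-k:]
    out[idx] = a[idx]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

def sparse_cca_first_pair(X, Y, k_x, k_y, n_iter=30):
    """First pair of sparse canonical weight vectors (assumed simplification:
    truncated power iteration on the cross-covariance of standardized views)."""
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    Y = (Y - Y.mean(axis=0)) / (Y.std(axis=0) + 1e-12)
    C = X.T @ Y / X.shape[0]
    # Initialise v with the leading right singular vector of C.
    _, _, vt = np.linalg.svd(C, full_matrices=False)
    v = vt[0]
    for _ in range(n_iter):
        u = keep_top(C @ v, k_x)
        v = keep_top(C.T @ u, k_y)
    return u, v

def stability_cca(X, Y, k_x, k_y, n_subsamples=50, threshold=0.6, seed=0):
    """Refit the sparse CCA on random half-samples and keep variables whose
    selection frequency reaches the threshold (both values are assumptions)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    freq_x = np.zeros(X.shape[1])
    freq_y = np.zeros(Y.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        u, v = sparse_cca_first_pair(X[idx], Y[idx], k_x, k_y)
        freq_x += (u != 0)
        freq_y += (v != 0)
    freq_x /= n_subsamples
    freq_y /= n_subsamples
    sel_x = np.flatnonzero(freq_x >= threshold)
    sel_y = np.flatnonzero(freq_y >= threshold)
    return sel_x, sel_y, freq_x, freq_y

# Simulated two-view data: one shared latent factor drives the first
# three variables of X and the first two variables of Y; the rest is noise.
rng = np.random.default_rng(42)
n, p, q = 200, 30, 20
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
X[:, :3] = z[:, None] + 0.5 * rng.normal(size=(n, 3))
Y[:, :2] = z[:, None] + 0.5 * rng.normal(size=(n, 2))

# Deliberately allow more nonzeros (5 and 4) than there are true signal variables.
sel_x, sel_y, freq_x, freq_y = stability_cca(X, Y, k_x=5, k_y=4)
print("stable X variables:", sel_x)
print("stable Y variables:", sel_y)
```

In this sketch each single fit is allowed to pick a few more variables than carry signal, so the surplus slots are filled by noise variables that tend to change from one subsample to the next. Aggregating selection frequencies across subsamples is what is meant to separate consistently selected variables from such incidental ones.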

MeSH terms

  • Biomarkers* / metabolism
  • Gastrointestinal Microbiome
  • Humans
  • Inflammatory Bowel Diseases* / metabolism
  • Machine Learning
  • Metabolomics* / methods
  • Metagenomics / methods
  • Multiomics

Substances

  • Biomarkers

Grants and funding

The authors wish to acknowledge the financial support from the Academy of Finland through the grants 334790 (MAGITICS), 339421 (MASF) and 345802 (AIB), as well as the Global Programme of the Finnish Ministry of Education and Culture. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.