Stable biomarker discovery in multi-omics data via canonical correlation analysis

Taneli Pusa; Juho Rousu

doi:10.1371/journal.pone.0309921

PLoS One. 2024 Sep 9;19(9):e0309921. doi: 10.1371/journal.pone.0309921. eCollection 2024.

Authors

Taneli Pusa¹, Juho Rousu¹

Affiliation

¹ Department of Computer Science, Aalto University, Espoo, Finland.

Abstract

Multi-omics analysis offers a promising avenue toward a better understanding of complex biological phenomena. In particular, untangling the pathophysiology of multifactorial health conditions such as inflammatory bowel disease (IBD) could benefit from simultaneous consideration of several omics levels. However, taking full advantage of multi-omics data requires the adoption of suitable new tools. Multi-view learning, a machine learning technique that natively joins together heterogeneous data, is a natural source for such methods. Here we present a new approach to variable selection in unsupervised multi-view learning by applying stability selection to canonical correlation analysis (CCA). We apply our method, StabilityCCA, to simulated and real multi-omics data, and demonstrate its ability to find relevant variables and improve the stability of variable selection. In a case study on an IBD microbiome data set, we link together metagenomics and metabolomics, revealing a connection between their joint structure and the disease, and identifying potential biomarkers. Our results showcase the usefulness of multi-view learning in multi-omics analysis and demonstrate StabilityCCA as a powerful tool for biomarker discovery.
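The abstract describes wrapping a sparse CCA in stability selection: fit the model repeatedly on random subsamples and keep only variables that are selected often. The sketch below illustrates that general idea on two simulated views. It is not the StabilityCCA implementation; it makes several assumptions of its own. The sparse CCA solver is a swapped-in truncated power iteration that approximates within-view covariances by the identity, sparsity is fixed by a chosen number of variables per view (`k_x`, `k_y`) rather than a penalty, and all function names (`top_k`, `sparse_cca`, `stability_cca`) and parameter values (100 half-subsamples, frequency threshold 0.6) are hypothetical.

```python
import numpy as np


def top_k(w, k):
    """Keep the k largest-magnitude entries of w, zero the rest, rescale to unit norm."""
    out = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    out[keep] = w[keep]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def sparse_cca(X, Y, k_x, k_y, n_iter=50):
    """First pair of sparse canonical weights via truncated power iteration.

    Assumption: within-view covariances are approximated by the identity,
    so the iteration only uses the cross-covariance matrix X^T Y.
    """
    C = X.T @ Y
    v = top_k(np.linalg.svd(C, full_matrices=False)[2][0], k_y)
    for _ in range(n_iter):
        u = top_k(C @ v, k_x)
        v = top_k(C.T @ u, k_y)
    return u, v


def standardize(A):
    """Center each column and scale it to unit standard deviation."""
    A = A - A.mean(axis=0)
    sd = A.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return A / sd


def stability_cca(X, Y, k_x, k_y, n_subsamples=100, threshold=0.6, seed=0):
    """Return variables whose selection frequency over half-subsamples >= threshold."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    freq_x = np.zeros(X.shape[1])
    freq_y = np.zeros(Y.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # random half of the samples
        u, v = sparse_cca(standardize(X[idx]), standardize(Y[idx]), k_x, k_y)
        freq_x += u != 0  # count how often each X variable gets a nonzero weight
        freq_y += v != 0
    freq_x /= n_subsamples
    freq_y /= n_subsamples
    return np.flatnonzero(freq_x >= threshold), np.flatnonzero(freq_y >= threshold)


# Simulated two-view data: a shared latent factor z drives the first
# 5 variables of X and the first 4 variables of Y; everything else is noise.
rng = np.random.default_rng(42)
n = 200
z = rng.normal(size=n)
X = 0.5 * rng.normal(size=(n, 50))
Y = 0.5 * rng.normal(size=(n, 40))
X[:, :5] += z[:, None]
Y[:, :4] += z[:, None]

sel_x, sel_y = stability_cca(X, Y, k_x=5, k_y=4)
print("stable X variables:", sel_x)
print("stable Y variables:", sel_y)
```

On this toy data the stable sets should recover the variables tied to the shared factor. Subsamples of half the data follow the usual stability selection recipe; on real omics data the sparsity level and frequency threshold would need to be chosen with care.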

Copyright: © 2024 Pusa, Rousu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Biomarkers* / metabolism
Gastrointestinal Microbiome
Humans
Inflammatory Bowel Diseases* / metabolism
Machine Learning
Metabolomics* / methods
Metagenomics / methods
Multiomics

Substances

Biomarkers

Grants and funding

The authors wish to acknowledge financial support from the Academy of Finland through grants 334790 (MAGITICS), 339421 (MASF) and 345802 (AIB), as well as from the Global Programme of the Finnish Ministry of Education and Culture. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.