COPS: A novel platform for multi-omic disease subtype discovery via robust multi-objective evaluation of clustering algorithms

PLoS Comput Biol. 2024 Aug 5;20(8):e1012275. doi: 10.1371/journal.pcbi.1012275. eCollection 2024 Aug.

Abstract

Recent research on multi-view clustering algorithms for complex disease subtyping often overlooks aspects like clustering stability and critical assessment of prognostic relevance. Furthermore, current frameworks do not allow for a comparison between data-driven and pathway-driven clustering, highlighting a significant gap in the methodology. We present the COPS R-package, tailored for robust evaluation of single and multi-omics clustering results. COPS features advanced methods, including similarity networks, kernel-based approaches, dimensionality reduction, and pathway knowledge integration. Some of these methods are not accessible through R, and some correspond to new approaches proposed with COPS. Our framework was rigorously applied to multi-omics data across seven cancer types, including breast, prostate, and lung, utilizing mRNA, CNV, miRNA, and DNA methylation data. Unlike previous studies, our approach contrasts data- and knowledge-driven multi-view clustering methods and incorporates cross-fold validation for robustness. Clustering outcomes were assessed using the Adjusted Rand Index (ARI), survival analysis via Cox regression models including relevant covariates, and the stability of the results. While survival analysis and gold-standard agreement are standard metrics, they vary considerably across methods and datasets. Therefore, it is essential to assess multi-view clustering methods using multiple criteria, from cluster stability to prognostic relevance, and to provide ways of comparing these metrics simultaneously to select the optimal approach for disease subtype discovery in novel datasets. Emphasizing multi-objective evaluation, we applied the Pareto efficiency concept to gauge the equilibrium of evaluation metrics in each cancer case-study. Affinity Network Fusion, Integrative Non-negative Matrix Factorization, and Multiple Kernel K-Means with linear or Pathway Induced Kernels were the most stable and effective in discerning groups with significantly different survival outcomes in several case studies.
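
To illustrate the Pareto efficiency idea mentioned in the abstract, the following is a minimal Python sketch, not the COPS implementation (COPS is an R package and its internals are not described here). It assumes each clustering method has been scored on several metrics where higher is better, for example stability, ARI, and the negative log10 p-value from a survival test. The function name `pareto_front`, the method labels, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a is at least as good as b on every metric
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of methods that no other method dominates."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other_name, other_vec in scores.items()
            if other_name != name
        )
    ]


# Hypothetical scores: (stability, ARI, -log10 survival p-value).
# These are made-up numbers, not results from the paper.
scores = {
    "method_A": (0.90, 0.40, 2.0),
    "method_B": (0.80, 0.50, 1.5),
    "method_C": (0.70, 0.35, 1.0),
    "method_D": (0.85, 0.45, 3.0),
}

front = pareto_front(scores)
print(front)
```

In this toy example `method_C` is worse than `method_A` on every metric, so it drops out, while the remaining three methods each win on at least one metric and stay on the front. Choosing among the Pareto-efficient methods still requires a judgment about which metrics matter most for the dataset at hand.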

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Computational Biology* / methods
  • DNA Copy Number Variations / genetics
  • DNA Methylation / genetics
  • Female
  • Gene Expression Profiling / methods
  • Genomics / methods
  • Humans
  • Male
  • MicroRNAs / genetics
  • Multiomics
  • Neoplasms* / classification
  • Neoplasms* / genetics
  • Prognosis
  • Software
  • Survival Analysis

Substances

  • MicroRNAs

Grants and funding

This study was supported by the BIOMAP-IMI project (V.F.). BIOMAP has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No. 821511. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. This study is also supported by the Academy of Finland (336275, 332510, and 358037, V.F.), the Jane and Aatos Erkko Foundation (210026, V.F.), Sigrid Jusélius Foundation (V.F.), and the Finnish Cultural Foundation (00230994, T.J.R.). V.F. receives a salary from the Academy of Finland, while T.J.R. is supported by the BIOMAP-IMI project and the Finnish Cultural Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.