A pathway-based computational framework for identification of a new modal of multi-omics biomarkers and its application in esophageal cancer

Comput Methods Programs Biomed. 2024 Apr:247:108077. doi: 10.1016/j.cmpb.2024.108077. Epub 2024 Feb 12.

Abstract

Background: The pathway-based strategy has recently been proposed for identifying biomarkers, offering higher biological interpretability and cross-data robustness than the conventional gene-based strategy. However, its utility in clinical applications has been limited by high computational complexity and ill-defined performance.

Objective: The current study presents a machine learning-based computational framework that uses multi-omics data to identify a new modal of biomarkers, called pathway-derived core biomarkers, which combine the advantages of gene-based and pathway-based biomarkers.

Methods: Machine-learning methods and a gene-pathway network were integrated to select the pathway-derived core biomarkers. Multiple machine-learning algorithms were used to construct and validate diagnostic models of the biomarkers based on more than 1400 multi-omics clinical samples of esophageal squamous cell carcinoma (ESCC).
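The abstract does not spell out how the gene-pathway network is combined with the machine-learning selection, so the following is only a minimal sketch of one plausible idea: keep candidate genes that participate in several pathways. The pathway names, gene identifiers, the `core_genes` helper, and the `min_pathways` threshold are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy gene-pathway membership (illustrative only, not from the paper).
PATHWAY_GENES = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
    "pathway_C": {"g5", "g6"},
}


def core_genes(pathway_genes, candidate_genes, min_pathways=2):
    """Return candidate genes found in at least `min_pathways` pathways.

    `candidate_genes` stands in for genes pre-selected by some
    machine-learning step; the threshold is an assumed heuristic.
    """
    membership = defaultdict(set)
    for pathway, genes in pathway_genes.items():
        # Only consider genes that are both in the pathway and candidates.
        for gene in genes & candidate_genes:
            membership[gene].add(pathway)
    return {
        gene: sorted(pathways)
        for gene, pathways in membership.items()
        if len(pathways) >= min_pathways
    }


candidates = {"g2", "g3", "g5", "g7"}
selected = core_genes(PATHWAY_GENES, candidates)
print(selected)
```

In this toy example only `g2` and `g3` are retained, because each belongs to two pathways, while `g5` belongs to one and `g7` to none.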

Results: The classifier models based on the new modal biomarkers achieved superior performance in the training datasets, with an average AUC/accuracy of 0.98/0.95 and 0.89/0.81 for mRNAs and miRNAs, respectively, higher than the currently known classifier models based on the conventional gene-based and pathway-based strategies. In the testing cohorts, the AUC/accuracy was 6.1 %/7.3 % higher than that of the models based on the native gene-based biomarkers. The improved performance was further confirmed in independent validation cohorts. Specifically, the sensitivity/specificity increased by ∼3 % and the variance significantly decreased by ∼69 % compared with the native gene-based biomarkers. Importantly, the pathway-derived core biomarkers also recovered 45 % more previously reported biomarkers than the gene-based biomarkers and were more functionally relevant to ESCC etiology (involved in 14 versus 7 pathways related to ESCC or other cancers), highlighting the cross-data robustness of this new modal of biomarkers via enhanced functional relevance.
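For readers less familiar with the reported metrics, the sketch below shows how AUC, accuracy, sensitivity, and specificity are conventionally computed for a binary classifier. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: three cases (1) and three controls (0).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)  # fraction of cases correctly detected
specificity = tn / (tn + fp)  # fraction of controls correctly rejected
model_auc = auc(y_true, scores)
```

AUC is threshold-independent, whereas accuracy, sensitivity, and specificity depend on the chosen cutoff, which is one reason the two kinds of metric are often reported together.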

Conclusions: The results demonstrate that the new modal of biomarkers not only has improved predictive performance and robustness but also exhibits higher functional interpretability, supporting its potential application in cancer diagnosis.

Keywords: Esophageal carcinoma; Machine learning; Multi-omics biomarkers; Pathway.

MeSH terms

  • Biomarkers
  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism
  • Esophageal Neoplasms* / diagnosis
  • Esophageal Neoplasms* / genetics
  • Esophageal Neoplasms* / pathology
  • Esophageal Squamous Cell Carcinoma* / diagnosis
  • Esophageal Squamous Cell Carcinoma* / genetics
  • Esophageal Squamous Cell Carcinoma* / pathology
  • Humans
  • Multiomics
  • Sensitivity and Specificity

Substances

  • Biomarkers
  • Biomarkers, Tumor