A time-dependent explainable radiomic analysis from the multi-omic cohort of CPTAC-Pancreatic Ductal Adenocarcinoma

Comput Methods Programs Biomed. 2024 Dec:257:108408. doi: 10.1016/j.cmpb.2024.108408. Epub 2024 Sep 7.

Abstract

Background and objective: In Pancreatic Ductal Adenocarcinoma (PDA), multi-omic models are emerging to address the unmet clinical need for novel quantitative prognostic factors. We developed a pipeline that relies on survival machine-learning (SML) classifiers and on explainability based on patients' follow-up (FU) to stratify prognosis using the publicly available multi-omic datasets of the CPTAC-PDA project.

Materials and methods: The analyzed datasets included tumor-annotated radiologic images, clinical data, and mutational data. Feature selection was based on univariate (UV) and multivariate (MV) survival analyses with respect to Overall Survival (OS) and recurrence (REC). We considered seven multi-omic datasets and compared four SML classifiers: Cox, survival random forest, generalized boosted, and support vector machine (SVM) models. For each classifier, we assessed the concordance (C) index on the validation set. The classifiers that performed best on the validation set for OS and for REC underwent explainability analyses using SurvSHAP(t), which extends SHapley Additive exPlanations (SHAP).
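To make the evaluation metric concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data, written in plain Python. It is an illustration only and not the authors' implementation: the function name, the toy follow-up times, event flags, and risk scores are all hypothetical, and pairs with tied follow-up times are simply skipped as a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C index for right-censored survival data (illustrative sketch).

    times       -- follow-up time per patient
    events      -- 1 if the event (death or recurrence) was observed, 0 if censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: skip pairs with tied times
        # a is the patient with the shorter follow-up, b the longer one
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored: the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event received the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the CPTAC-PDA cohort.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]

c_index = concordance_index(times, events, risk_scores)
print(f"C index: {c_index:.2f}")
```

In this toy example four pairs are comparable and three of them are ordered correctly, so the sketch yields a C index of 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.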

Results: For OS, UV and MV analyses selected 18/37 and 10/37 multi-omic features, respectively. For REC, UV and MV analyses selected 10/35 and 5/35 determinants, respectively. In general, SML classifiers that included radiomics outperformed those modelled on clinical or mutational predictors alone. For OS, the Cox model encompassing radiomic, clinical, and mutational features reached a C index of 75 %, outperforming the other classifiers. For REC, the SVM model including only radiomics emerged as the best-performing, with a C index of 68 %. For OS, SurvSHAP(t) identified the first order Median Gray Level (GL) intensity, gender, tumor grade, the GL Co-occurrence Matrix (GLCM) Joint Energy, and the GLCM Informational Measure of Correlation of type 1 as the most important features. For REC, the first order Median GL intensity, the GL size zone matrix Small Area Low GL Emphasis, and the first order variance of GL intensities emerged as the most discriminative.
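As a sketch of what a time-dependent attribution means here, a Shapley value can be written for the predicted survival function at each time point instead of for a single scalar output. The notation below is an assumption introduced for illustration and does not appear in the abstract: \hat{S}(t \mid x) is the model's predicted survival function, F is the set of features, x_* is the patient being explained, and the last line is one common way to aggregate the time-dependent values into a single importance score.

```latex
% Expected predicted survival at time t when only the features in D are fixed to the patient's values
e_t(x_*, D) = \mathbb{E}\left[ \hat{S}(t \mid x) \,\middle|\, x^{D} = x_*^{D} \right]

% Time-dependent Shapley attribution of feature d for patient x_*
\phi_t(x_*, d) = \sum_{D \subseteq F \setminus \{d\}}
    \frac{|D|!\,(|F| - |D| - 1)!}{|F|!}
    \left[ e_t(x_*, D \cup \{d\}) - e_t(x_*, D) \right]

% Aggregated importance over the follow-up window [0, t_m]
\psi(x_*, d) = \int_0^{t_m} \left| \phi_t(x_*, d) \right| \, dt
```

Under these assumptions, a feature such as the first order Median GL intensity can contribute strongly at some follow-up times and weakly at others, which a single time-independent SHAP value would not show.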

Conclusions: In this work, radiomics showed the potential for improving patients' risk stratification in PDA. Furthermore, a deeper understanding of how radiomics can contribute to prognosis in PDA was achieved with a time-dependent explainability of the top multi-omic predictors.

Keywords: Explainability; Machine learning; Pancreatic ductal adenocarcinoma; Radiomics; Survival analysis.

MeSH terms

  • Aged
  • Algorithms
  • Carcinoma, Pancreatic Ductal* / diagnostic imaging
  • Carcinoma, Pancreatic Ductal* / genetics
  • Cohort Studies
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Multiomics
  • Pancreatic Neoplasms* / diagnostic imaging
  • Pancreatic Neoplasms* / genetics
  • Prognosis
  • Proportional Hazards Models
  • Radiomics
  • Support Vector Machine*
  • Survival Analysis
  • Time Factors