Purpose: Shapley additive explanations (SHAP) values represent a unified approach to interpreting predictions made by complex machine learning (ML) models, with superior consistency and accuracy compared with prior attribution methods. We describe a novel application of SHAP values to the prediction of mortality risk in prostate cancer.
Methods: Patients with nonmetastatic, node-negative prostate cancer, diagnosed between 2004 and 2015, were identified using the National Cancer Database. Model features were specified a priori: age, prostate-specific antigen (PSA), Gleason score, percent positive cores (PPC), comorbidity score, and clinical T stage. We trained a gradient-boosted tree model and applied SHAP values to model predictions. Open-source libraries in Python 3.7 were used for all analyses.
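The abstract does not specify the exact implementation; the following is a minimal sketch of the modeling step, assuming XGBoost as the gradient-boosted tree library and the open-source shap package for attribution. The synthetic data, feature encodings, and binary mortality label below are illustrative assumptions, not the registry data used in the study.

```python
# Minimal sketch (not the authors' code): gradient-boosted trees + SHAP values.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "age": rng.integers(45, 90, n),
    "psa": rng.gamma(2.0, 5.0, n),        # prostate-specific antigen, ng/mL
    "gleason": rng.integers(6, 11, n),    # Gleason score 6-10
    "ppc": rng.uniform(0, 100, n),        # percent positive cores
    "comorbidity": rng.integers(0, 3, n), # comorbidity score (illustrative coding)
    "clinical_t": rng.integers(1, 4, n),  # clinical T stage (illustrative coding)
})
y = rng.integers(0, 2, n)                 # placeholder mortality label

# Train a gradient-boosted tree classifier on the a priori feature set.
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)    # shape: (n_samples, n_features)
shap.summary_plot(shap_values, X)         # global view of per-feature attributions
```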
Results: We identified 372,808 patients meeting the inclusion criteria. When analyzing the interaction between PSA and Gleason score, we demonstrated consistency with the literature using the example of low-PSA, high-Gleason prostate cancer, recently identified as a unique entity with a poor prognosis. When analyzing the PPC-Gleason score interaction, we identified a novel finding of stronger interaction effects in patients with Gleason ≥ 8 disease compared with Gleason 6-7 disease, particularly with PPC ≥ 50%. Subsequent confirmatory linear analyses supported this finding: 5-year overall survival in Gleason ≥ 8 patients was 87.7% with PPC < 50% versus 77.2% with PPC ≥ 50% (P < .001), compared with 89.1% versus 86.0% in Gleason 7 patients (P < .001), with a significant interaction term between PPC ≥ 50% and Gleason ≥ 8 (P < .001).
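As a sketch of how the two analyses reported above could be carried out, the code below continues the illustrative model from the Methods: pairwise SHAP interaction values visualize the PPC-Gleason interaction, and a Cox proportional hazards model with an explicit PPC ≥ 50% × Gleason ≥ 8 term stands in for the confirmatory linear analysis. Variable names, follow-up times, and event indicators are assumptions, not the study data or the authors' code.

```python
# (1) Pairwise SHAP interaction values; off-diagonal entries isolate the
# interaction contribution for each feature pair.
interaction_values = explainer.shap_interaction_values(X)  # (n, n_features, n_features)
shap.dependence_plot(
    ("ppc", "gleason"),   # main feature vs. interacting feature
    interaction_values,
    X,
)

# (2) Illustrative confirmatory model with an explicit interaction term
# (placeholder follow-up and event columns; not real registry data).
from lifelines import CoxPHFitter

df = X.copy()
df["ppc_ge50"] = (df["ppc"] >= 50).astype(int)
df["gleason_ge8"] = (df["gleason"] >= 8).astype(int)
df["ppc50_x_gleason8"] = df["ppc_ge50"] * df["gleason_ge8"]
df["followup_months"] = rng.uniform(1, 120, len(df))
df["death"] = y

cph = CoxPHFitter()
cph.fit(
    df[["age", "psa", "ppc_ge50", "gleason_ge8", "ppc50_x_gleason8",
        "followup_months", "death"]],
    duration_col="followup_months",
    event_col="death",
)
cph.print_summary()  # the interaction coefficient tests the PPC-Gleason effect
```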
Conclusion: We describe a novel application of SHAP values for modeling and visualizing nonlinear interaction effects in prostate cancer. This ML-based approach is a promising technique with the potential to meaningfully improve risk stratification and staging systems.