Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning

Sang Won Park; Ye-Lin Park; Eun-Gyeong Lee; Heejung Chae; Phillip Park; Dong-Woo Choi; Yeon Ho Choi; Juyeon Hwang; Seohyun Ahn; Keunkyun Kim; Woo Jin Kim; Sun-Young Kong; So-Youn Jung; Hyun-Jin Kim

doi:10.3390/cancers16223799

Cancers (Basel). 2024 Nov 12;16(22):3799. doi: 10.3390/cancers16223799.

Authors

Sang Won Park¹,², Ye-Lin Park³, Eun-Gyeong Lee⁴, Heejung Chae³,⁵, Phillip Park³, Dong-Woo Choi³, Yeon Ho Choi³, Juyeon Hwang³, Seohyun Ahn³, Keunkyun Kim³, Woo Jin Kim¹,⁶,⁷, Sun-Young Kong⁸,⁹, So-Youn Jung⁴, Hyun-Jin Kim³

Affiliations

¹ Department of Medical Informatics, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea.
² Institute of Medical Science, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea.
³ Cancer Data Center, National Cancer Control Institute, National Cancer Center, Goyang 10408, Republic of Korea.
⁴ Department of Surgery, Center of Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea.
⁵ Department of Medical Oncology, Center for Breast Cancer, National Cancer Center, Goyang 10408, Republic of Korea.
⁶ Department of Internal Medicine, Kangwon National University Hospital, Chuncheon 24289, Republic of Korea.
⁷ Department of Internal Medicine, School of Medicine, Kangwon National University, Chuncheon 24341, Republic of Korea.
⁸ Targeted Therapy Branch, Research Institute, National Cancer Center, Goyang 10408, Republic of Korea.
⁹ Department of Laboratory Medicine, Hospital, National Cancer Center, Goyang 10408, Republic of Korea.

Abstract

Background/Objectives: Breast cancer is the most common cancer in women worldwide, and reducing its mortality requires strategic effort. This study aimed to develop a predictive classification model for breast cancer mortality using real-world data that include a range of clinical features.

Methods: A total of 11,286 patients with breast cancer from the National Cancer Center were included in this study. The mortality rate of the total sample was approximately 6.2%. Propensity score matching was used to reduce bias. Several machine learning (ML) models, including extreme gradient boosting (XGB), were applied to 31 clinical features. To enhance model interpretability, we used the SHapley Additive exPlanations (SHAP) method. The ML analyses were also repeated on a subsample that excluded patients who developed other cancers after breast cancer.

Results: Among the ML models, the XGB model exhibited the highest discriminatory power, with an area under the curve (AUC) of 0.8722 and a specificity of 0.9472. Key predictors of the mortality classification model included occurrence in other organs, age at diagnosis, N stage, T stage, curative radiation treatment, and Ki-67 (%). Even after excluding patients who developed other cancers after breast cancer, the XGB model remained the best-performing model, with an AUC of 0.8518 and a specificity of 0.9766. Additionally, the top predictors identified by SHAP were similar to those for the overall sample.

Conclusions: Our models provided excellent predictions of breast cancer mortality using real-world data from South Korea. Explainable artificial intelligence methods, such as SHAP, validated the clinical applicability and interpretability of these models.
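The two metrics reported in the abstract, area under the ROC curve and specificity, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions chosen only to show how each number is defined (label 1 = died, label 0 = survived).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float = 0.5) -> float:
    """True-negative rate: share of negative cases (label 0) whose
    score falls below the decision threshold (assumed 0.5 here)."""
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not neg:
        raise ValueError("need at least one negative label")
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Hypothetical toy data, not taken from the study.
y_true = [0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]

print(roc_auc(y_true, y_score))      # 0.875
print(specificity(y_true, y_score))  # 0.75
```

With a mortality rate of roughly 6.2%, survivors dominate the sample, so specificity can be high even when a model misses many deaths. That is one reason to read it alongside a threshold-free measure such as AUC.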

Keywords: artificial intelligence; breast cancer; explainable artificial intelligence; machine learning; mortality.

Grants and funding

NCC-2210542-3/National Cancer Center/Republic of Korea