Prediction model for ocular metastasis of breast cancer: machine learning model development and interpretation study

BMC Cancer. 2024 Nov 29;24(1):1472. doi: 10.1186/s12885-024-12928-w.

Abstract

Background: Breast cancer (BC) arises when breast epithelial cells proliferate uncontrollably and undergo malignant transformation, and it has the highest incidence of all malignant tumors in women. BC metastasizes through direct and lymphatic spread. Although ocular metastasis is relatively rare, it indicates a poorer prognosis. We used machine learning (ML) to build a model for analyzing the risk factors for ocular metastasis of BC.

Methods: Clinical data of 2225 patients with BC from 2003 to 2019 were collected and randomly split into training and test sets at a 7:3 ratio. Based on the presence or absence of ocular metastasis, patients were assigned to the ocular metastasis (OM) or non-ocular metastasis (NOM) group. Univariate and multivariate logistic regression and least absolute shrinkage and selection operator (LASSO) regression were performed. Six ML algorithms were used to build prediction models, with 10-fold cross-validation for internal validation. The area under the receiver operating characteristic (ROC) curve was used to evaluate predictive performance. We also built a web-based risk calculator from the best-performing model to facilitate clinical use. SHapley Additive exPlanations (SHAP) was used to identify risk factors and to interpret the black-box model.
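The workflow in the Methods (random 7:3 split, LASSO selection, a bagged model with 10-fold cross-validation, ROC-based evaluation) can be sketched in Python with scikit-learn as below. This is a minimal illustration under stated assumptions, not the study's code: synthetic data from `make_classification` stands in for the clinical variables, an L1-penalized logistic regression (with an arbitrarily chosen `C=0.1`) plays the LASSO role, and `BaggingClassifier` with its default decision-tree base learner stands in for the authors' BAG configuration, which the abstract does not specify. Feeding the six LASSO-ranked variables into the bagged model is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the cohort (assumption: NOT the study data).
# 2225 patients, 20 candidate variables, minority positive class (OM = 1).
X, y = make_classification(
    n_samples=2225, n_features=20, n_informative=6,
    weights=[0.9, 0.1], random_state=0,
)

# Random 7:3 split into training and test sets, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

# LASSO-style selection: an L1 penalty shrinks weak coefficients toward zero.
# Keep the six variables with the largest absolute coefficients.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(X_train, y_train)
top6 = np.argsort(-np.abs(lasso.coef_[0]))[:6]
X_train_sel, X_test_sel = X_train[:, top6], X_test[:, top6]

# Bagging: each of 50 decision trees (the default base learner) is fit on a
# bootstrap resample of the training set; their probabilities are averaged.
bag = BaggingClassifier(n_estimators=50, random_state=0)

# Internal validation: 10-fold cross-validated AUC on the training set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_auc = cross_val_score(bag, X_train_sel, y_train, cv=cv, scoring="roc_auc")

# Refit on the whole training set, then evaluate once on the held-out test set.
bag.fit(X_train_sel, y_train)
test_auc = roc_auc_score(y_test, bag.predict_proba(X_test_sel)[:, 1])
y_pred = bag.predict(X_test_sel)     # class with the higher averaged probability
acc = accuracy_score(y_test, y_pred)
sens = recall_score(y_test, y_pred)  # sensitivity = recall for OM = 1

print(f"10-fold CV AUC: {cv_auc.mean():.3f} ± {cv_auc.std():.3f}")
print(f"Test AUC: {test_auc:.3f}, accuracy: {acc:.3f}, sensitivity: {sens:.3f}")
```

Keeping the test set out of both the LASSO step and cross-validation means the test AUC is an honest estimate of performance on unseen patients; the cross-validated AUC is only used for internal comparison of candidate models.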

Results: Univariate logistic regression showed that histopathology (other types), axillary lymph node metastasis (ALNM) (> 4), Ca2+, total cholesterol (TC), low-density lipoprotein (LDL), apolipoprotein A (ApoA), carcinoembryonic antigen (CEA), carbohydrate antigen (CA) 125, CA153, CA199, alkaline phosphatase (ALP), and hemoglobin (Hb) were risk factors for BC ocular metastasis. Multivariate logistic regression showed that CA153, ApoA, and LDL were independent risk factors for BC ocular metastasis. LASSO identified ALNM, LDL, CA125, Hb, ALP, and CA199 as the top six variables for diagnosing ocular metastasis in BC. Bootstrapped aggregation (BAG) showed the best discriminative ability (area under the ROC curve [AUC] = 0.992, accuracy = 0.953, sensitivity = 0.987). On this basis, we used the BAG model to build an online calculator that helps clinicians estimate the risk of BC ocular metastasis. In addition, two typical cases were analyzed to illustrate the interpretability of the model.
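The case-level interpretation relies on Shapley values, which split one patient's predicted risk into per-variable contributions. For a model with only a few inputs these can be computed exactly by enumerating feature subsets, which makes the idea concrete. The sketch below assumes a hypothetical three-variable logistic model: the weights, intercept, patient values, and baseline are invented for illustration and are not the study's fitted model. "Absent" variables are replaced by a single baseline value, a simplification of the background-dataset approach used by the SHAP library.

```python
import math
from itertools import combinations

# Hypothetical toy risk model on three standardized inputs (invented weights).
WEIGHTS = {"CA153": 1.2, "LDL": 0.8, "ApoA": 0.5}
INTERCEPT = -2.0


def risk(x):
    """Predicted probability of OM for a dict of standardized variable values."""
    z = INTERCEPT + sum(WEIGHTS[k] * x[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x); features outside a coalition use baseline."""
    features = list(x)
    n = len(features)

    def value(coalition):
        # Coalition members take the patient's values, the rest the baseline.
        point = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(point)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


patient = {"CA153": 2.0, "LDL": 1.0, "ApoA": -1.0}  # hypothetical case
baseline = {"CA153": 0.0, "LDL": 0.0, "ApoA": 0.0}  # e.g. cohort mean after standardizing

phi = shapley_values(risk, patient, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.3f}")
# Efficiency property: contributions sum to risk(patient) - risk(baseline).
print(f"sum = {sum(phi.values()):.3f}, difference = {risk(patient) - risk(baseline):.3f}")
```

Positive values push this hypothetical patient's risk above the baseline and negative values pull it below; exact enumeration costs 2^n model calls, which is why SHAP approximates it for models with many variables.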

Conclusion: We used ML to build a risk prediction model for BC ocular metastasis, and BAG performed best. The model can predict the risk of OM in patients with BC, supporting early diagnosis and timely treatment and reducing the burden on society.

Keywords: Bootstrapped aggregation; Machine learning; Ocular metastases in breast cancer; Risk factors; Shapley additive interpretation.

MeSH terms

  • Adult
  • Aged
  • Biomarkers, Tumor / metabolism
  • Breast Neoplasms* / pathology
  • Eye Neoplasms* / pathology
  • Eye Neoplasms* / secondary
  • Female
  • Humans
  • Logistic Models
  • Lymphatic Metastasis / pathology
  • Machine Learning*
  • Middle Aged
  • Prognosis
  • ROC Curve
  • Retrospective Studies
  • Risk Factors

Substances

  • Biomarkers, Tumor