Machine learning survival models trained on clinical data to identify high risk patients with hormone responsive HER2 negative breast cancer

Sci Rep. 2023 May 26;13(1):8575. doi: 10.1038/s41598-023-35344-9.

Abstract

For patients with early-stage endocrine-positive, HER2-negative breast cancer, the benefit of adding chemotherapy to adjuvant endocrine therapy has not yet been confirmed. Several genomic tests are commercially available but are very expensive. There is therefore an urgent need to explore novel, reliable, and less expensive prognostic tools in this setting. In this paper, we present a machine learning survival model, trained on clinical and histological data commonly collected in clinical practice, to estimate Invasive Disease-Free Events. We collected clinical and cytohistological outcomes of 145 patients referred to Istituto Tumori "Giovanni Paolo II". Three machine learning survival models are compared with Cox proportional hazards regression according to time-dependent performance metrics evaluated in cross-validation. The c-index at 10 years obtained by random survival forest, gradient boosting, and component-wise gradient boosting was stable with or without feature selection, at approximately 0.68 on average, compared with 0.57 for the Cox model. Moreover, the machine learning survival models accurately discriminated low- and high-risk patients, and thus identified a large group who can be spared the addition of chemotherapy to hormone therapy. The preliminary results obtained by including only clinical determinants are encouraging. The integrated use of data already collected in clinical practice for routine diagnostic investigations, if properly analyzed, can reduce the time and costs associated with genomic tests.
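
To illustrate the kind of discrimination metric reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is not the authors' code, and the time-dependent c-index at 10 years used in the study may differ in detail. The function name `concordance_index`, the optional truncation parameter `tau`, and the toy data are assumptions introduced for illustration only; none of the numbers come from the paper.

```python
from typing import Optional, Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
    tau: Optional[float] = None,
) -> float:
    """Harrell's c-index for right-censored data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means earlier expected event)
    tau    : hypothetical truncation time; only events before tau anchor pairs
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is anchored on a patient whose event was actually observed.
        if not events[i]:
            continue
        if tau is not None and times[i] >= tau:
            continue
        for j in range(n):
            # Patient j must have been followed longer than patient i.
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # higher risk, earlier event
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration (times in years).
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 0, 1, 1, 0]
toy_risks = [0.1, 0.3, 0.7, 0.2, 0.9]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported values of approximately 0.68 and 0.57 should be read.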

MeSH terms

  • Breast Neoplasms* / drug therapy
  • Breast Neoplasms* / genetics
  • Combined Modality Therapy
  • Female
  • Hormones
  • Humans
  • Machine Learning
  • Prognosis
  • Proportional Hazards Models
  • Receptor, ErbB-2 / genetics

Substances

  • Hormones
  • Receptor, ErbB-2