Predicting doxorubicin-induced cardiotoxicity in breast cancer: leveraging machine learning with synthetic data

Med Biol Eng Comput. 2025 Jan 20. doi: 10.1007/s11517-025-03289-y. Online ahead of print.

Abstract

Doxorubicin (DOXO) is a primary treatment for breast cancer but can cause cardiotoxicity in over 25% of patients within the first year after chemotherapy. Identifying at-risk patients before DOXO initiation opens the way to alternative treatments or early protective measures. We analyzed data from 78 Brazilian breast cancer patients, 34.6% of whom developed cardiotoxicity within a year of their final DOXO dose. To address the limited sample size, we used the DAS (Data Augmentation and Smoothing) method to create 4892 synthetic samples with high statistical fidelity to the original data. By integrating routine blood biomarkers (C-reactive protein, total cholesterol, LDL-c, HDL-c, hematocrit, and hemoglobin) and two clinical measures (weighted smoking status and body mass index), our model achieved an AUROC of 0.85±0.10, a sensitivity of 0.89, and a specificity of 0.69, positioning it as a potential screening instrument. Notably, DAS outperformed the established methods Adaptive Synthetic Sampling (ADASYN), Synthetic Minority Over-Sampling Technique (SMOTE), and Synthetic Data Vault (SDV), underscoring its promise for medical synthetic data generation; the study also pioneers a cardiotoxicity prediction model specific to DOXO.
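For readers less familiar with the reported metrics, the sketch below shows the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. (For scale, 34.6% of 78 patients corresponds to 27 positive cases.) This is a minimal illustration, not the authors' evaluation code; the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = cardiotoxicity (hypothetical encoding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: fraction of true positives caught; specificity: fraction of negatives cleared.
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (Mann-Whitney form); ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives, 4 negatives, with model risk scores.
y_true: List[int] = [1, 1, 1, 0, 0, 0, 0]
scores: List[float] = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
# Hypothetical threshold of 0.5 turns scores into screening decisions.
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auroc={auc:.3f}")
```

On this toy data the sketch prints a sensitivity of about 0.667, a specificity of 0.750, and an AUROC of about 0.917. Note that sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why a screening model can report a high sensitivity alongside a more modest specificity.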

Keywords: Blood; Cardiotoxicity; Data augmentation; Doxorubicin; Machine learning; Synthetic data.