Development and validation of an interpretable machine learning model for predicting left atrial thrombus or spontaneous echo contrast in non-valvular atrial fibrillation patients

PLoS One. 2025 Jan 16;20(1):e0313562. doi: 10.1371/journal.pone.0313562. eCollection 2025.

Abstract

Purpose: Left atrial thrombus and spontaneous echo contrast (LAT/SEC) are widely recognized as major contributors to cardiogenic embolism in non-valvular atrial fibrillation (NVAF). This study aimed to construct and validate an interpretable machine learning (ML) model for predicting LAT/SEC risk in NVAF patients.

Methods: In this retrospective study, electronic medical record (EMR) data from 1,222 consecutive NVAF patients scheduled for catheter ablation at the First Hospital of Jilin University between October 1, 2022, and February 1, 2024, were analyzed. Nine ML algorithms were applied to demographic, clinical, and laboratory data to develop prediction models for LAT/SEC. Feature selection was performed using the least absolute shrinkage and selection operator (LASSO) and multivariable logistic regression. The ML classification models were evaluated and compared to identify the optimal model, and Shapley Additive exPlanations (SHAP) were used to interpret it and support personalized risk assessment. Finally, the diagnostic performance of the optimal model for predicting LAT/SEC risk was compared with that of the CHA2DS2-VASc scoring system.
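For a logistic regression model, SHAP values have a closed form in log-odds space when features are assumed independent: each predictor's contribution is its coefficient multiplied by the patient's deviation from the background (cohort) mean. The following is a minimal pure-Python sketch under that assumption, not the authors' implementation. The coefficients, intercept, cohort means, and patient record are hypothetical placeholders, and the feature names (`lad_mm`, `is_te`, and so on) are invented labels for the six predictors named in the abstract.

```python
import math

# HYPOTHETICAL coefficients (change in log-odds per unit) for the six predictors
# named in the abstract. These are placeholders, not the published model.
COEF = {
    "age": 0.04,               # per year
    "non_paroxysmal_af": 1.0,  # 1 = non-paroxysmal AF
    "diabetes": 0.6,           # 1 = diabetes
    "is_te": 0.9,              # 1 = prior ischemic stroke / thromboembolism
    "hyperuricemia": 0.5,      # 1 = hyperuricemia
    "lad_mm": 0.12,            # left atrial diameter, per mm
}
INTERCEPT = -10.0  # hypothetical


def log_odds(x, coef=COEF, intercept=INTERCEPT):
    """Linear predictor (log-odds) of the logistic model."""
    return intercept + sum(coef[k] * x[k] for k in coef)


def sigmoid(z):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_shap(x, background_mean, coef=COEF):
    """SHAP values in log-odds space for a linear model, assuming independent
    features: phi_j = beta_j * (x_j - E[x_j])."""
    return {k: coef[k] * (x[k] - background_mean[k]) for k in coef}


# Hypothetical cohort means (the "background") and one hypothetical patient.
cohort_mean = {"age": 60, "non_paroxysmal_af": 0.4, "diabetes": 0.2,
               "is_te": 0.1, "hyperuricemia": 0.2, "lad_mm": 40}
patient = {"age": 72, "non_paroxysmal_af": 1, "diabetes": 1,
           "is_te": 0, "hyperuricemia": 1, "lad_mm": 48}

# For a linear model, the expected log-odds equals the log-odds at the mean.
base = log_odds(cohort_mean)
phi = linear_shap(patient, cohort_mean)

# Rank contributions by magnitude, as a per-patient SHAP bar plot would.
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>18}: {value:+.3f} log-odds")
print(f"baseline risk: {sigmoid(base):.3f}")
print(f"patient risk:  {sigmoid(base + sum(phi.values())):.3f}")
```

The contributions sum exactly to the difference between the patient's log-odds and the baseline log-odds. This additivity is what makes per-patient SHAP explanations readable. Because the sigmoid is non-linear, contributions expressed on the probability scale cannot be obtained by applying the sigmoid to each term separately.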

Results: Of the 1,078 patients included in the analysis, the incidence of LAT/SEC was 10.02%. Six independent predictors, namely age, non-paroxysmal AF, diabetes, ischemic stroke or thromboembolism (IS/TE), hyperuricemia, and left atrial diameter (LAD), were identified as the most valuable features. The logistic regression model performed best, with an area under the receiver operating characteristic curve (AUC) of 0.850, accuracy of 0.812, sensitivity of 0.818, and specificity of 0.780 in the test set. SHAP analysis quantified the contribution of each explanatory variable to the model and its relationship with LAT/SEC occurrence. The logistic regression model significantly outperformed the CHA2DS2-VASc scoring system, with AUCs of 0.831 and 0.650, respectively (Z = 7.175, P < 0.001).
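The test-set figures reported above combine two kinds of quantity. Accuracy, sensitivity, and specificity are threshold-dependent and are read off a confusion matrix once predicted risks are dichotomized. The AUC is threshold-free. The following is a minimal pure-Python sketch that computes both on hypothetical toy labels and scores. The 0.5 cut-off is an arbitrary assumption, since the abstract does not state the threshold used. The sketch also checks a general identity: accuracy = p · sensitivity + (1 - p) · specificity, where p is the proportion of positives in the evaluated set.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = LAT/SEC present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, scores, threshold):
    """Accuracy, sensitivity and specificity after dichotomizing scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores higher than a
    random negative (ties count one half); independent of any threshold."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.30, 0.60, 0.40, 0.20, 0.10, 0.05]

m = threshold_metrics(y_true, scores, threshold=0.5)
prevalence = sum(y_true) / len(y_true)
print(m)
print("AUC:", roc_auc(y_true, scores))
```

On these toy values the sketch gives accuracy 0.75, sensitivity 2/3, specificity 0.8, and AUC 13/15 (about 0.867). With p = 3/8, the identity gives 3/8 × 2/3 + 5/8 × 0.8 = 0.75. The pairwise loop is O(n·m), which is fine for illustration; a rank-based formula scales better for large cohorts.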

Conclusions: ML proved to be a reliable tool for predicting LAT/SEC risk in NVAF patients. The logistic regression model, together with SHAP interpretation, may serve as a clinically useful tool for identifying high-risk NVAF patients, enabling targeted diagnostic evaluation and personalized treatment strategies.

Publication types

  • Validation Study

MeSH terms

  • Aged
  • Atrial Fibrillation* / diagnosis
  • Atrial Fibrillation* / diagnostic imaging
  • Echocardiography / methods
  • Female
  • Heart Atria* / diagnostic imaging
  • Heart Atria* / pathology
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Retrospective Studies
  • Risk Assessment / methods
  • Risk Factors
  • Thrombosis* / diagnostic imaging