Comparison of cardiovascular risk prediction models developed using machine learning based on data from a Sri Lankan cohort with World Health Organization risk charts for predicting cardiovascular risk among Sri Lankans: a cohort study

BMJ Open. 2025 Jan 15;15(1):e081434. doi: 10.1136/bmjopen-2023-081434.

Abstract

Introduction: Models derived from non-Sri Lankan cohorts are used for cardiovascular (CV) risk stratification of Sri Lankans.

Objective: To develop a CV risk prediction model using machine learning (ML) based on data from a Sri Lankan cohort followed up for 10 years, and to compare the predictions with WHO risk charts.

Design: Cohort study.

Setting: The model was developed using the Ragama Health Study (RHS), an ongoing, prospective, population-based cohort study of the epidemiology of non-communicable diseases among patients randomly selected from the Ragama Medical Officer of Health area, Sri Lanka. The external validation cohort included patients admitted to Colombo North Teaching Hospital (CNTH), a tertiary care hospital in Sri Lanka, from January 2019 through August 2020.

Participants: All RHS participants aged 40-64 years in 2007, without cardiovascular disease (CVD) at baseline, who had complete 10-year outcome data by 2017, were used for model development. Patients aged 40-74 years admitted to CNTH during the study period with an incident CV event (CVE) or with a disease other than an acute CVE, and with complete data for CVD risk calculation, were used for external validation of the model.

Methods: Using the follow-up data of the cohort, we developed two ML models for predicting 10-year CV risk: one using six conventional CV risk variables (age, gender, smoking status, systolic blood pressure, history of diabetes, and total cholesterol level) and one using all available variables (n=75). The ML models were derived using supervised classification algorithms. We compared the predictive performance of our ML models with WHO risk charts (2019, Southeast Asia) using the area under the receiver operating characteristic curve (AUC-ROC) and calibration plots. We validated the 6-variable model in an external hospital-based cohort.
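
As an illustration of the comparison metric named above, AUC-ROC can be read as the probability that a randomly chosen participant who had an event receives a higher predicted risk than one who did not. The following is a minimal sketch of that rank-based definition; the function name `auc_roc` and the toy scores are hypothetical and are not code or data from the study.

```python
def auc_roc(scores, labels):
    """Rank-based (Mann-Whitney) AUC-ROC; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # participants with an event
    neg = [s for s, y in zip(scores, labels) if y == 0]  # participants without an event
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted 10-year risks and observed outcomes (1 = CVE occurred).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = auc_roc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to perfect separation of events from non-events.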

Results: Among the 2596 participants in the baseline cohort, 179 incident CVEs were observed over 10 years. WHO risk charts predicted only 10 CVEs (AUC-ROC: 0.51, 95% CI 0.42 to 0.60), while the new 6-variable ML model predicted 125 CVEs (AUC-ROC: 0.72, 95% CI 0.66 to 0.78) and the 75-variable ML model predicted 124 CVEs (AUC-ROC: 0.74, 95% CI 0.68 to 0.80). Calibration results (Hosmer-Lemeshow test) for the 6-variable ML model and the WHO risk charts were χ2=12.85 (p=0.12) and χ2=15.58 (p=0.05), respectively. In the external validation cohort, the 6-variable ML model had a sensitivity of 70.3%, specificity of 94.9%, positive predictive value of 87.3%, negative predictive value of 86.6% and calibration of χ2=8.22 (p=0.41); the corresponding values for the WHO risk charts were 23.7%, 79.0%, 35.8%, 67.7% and χ2=81.94 (p<0.0001).
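
The reported Hosmer-Lemeshow p-values can be checked against their χ2 statistics. The sketch below assumes 8 degrees of freedom (the conventional 10 risk groups minus 2), which the abstract does not state; `chi2_sf_even_df` is a hypothetical helper that uses the closed-form upper tail of the chi-square distribution for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term           # add (x/2)^i / i!
        term *= half / (i + 1)  # advance to the next term of the series
    return math.exp(-half) * total


ASSUMED_DF = 8  # assumption: 10 risk groups minus 2; not stated in the abstract

# Chi-square statistics as reported in the Results.
reported_chi2 = {
    "ml_6var_derivation": 12.85,
    "who_derivation": 15.58,
    "ml_6var_external": 8.22,
    "who_external": 81.94,
}
p_values = {name: chi2_sf_even_df(x, ASSUMED_DF) for name, x in reported_chi2.items()}
```

Under that assumption the computed values round to the reported 0.12, 0.05 and 0.41, and the value for the WHO risk charts in the external cohort falls well below 0.0001.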

Conclusions: ML-based models derived from a cohort of Sri Lankans improved the overall accuracy of CV-risk prediction compared with the WHO risk charts for this cohort of Southeast Asians.

Keywords: Cardiac Epidemiology; Preventive Medicine; Primary Prevention; Risk management.

Publication types

  • Comparative Study

MeSH terms

  • Adult
  • Aged
  • Cardiovascular Diseases* / epidemiology
  • Cohort Studies
  • Female
  • Heart Disease Risk Factors
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Prospective Studies
  • Risk Assessment / methods
  • Risk Factors
  • Sri Lanka / epidemiology
  • World Health Organization