Purpose: Dyslipidemia is a significant risk factor for progression to cardiovascular disease (CVD). Although numerous risk factors have been identified and various risk scales proposed, effective models for predicting CVD onset in the hyperlipidemic population are still urgently needed and are essential for CVD prevention.
Methods: We carried out a retrospective cohort study of 23,548 patients with hyperlipidemia from the Shenzhen Health Information Big Data Platform, including 11,723 who developed CVD during a 3-year follow-up. The population was randomly divided into two independent sets: 70% for training and the remaining 30% for testing. Four distinct machine-learning algorithms were trained on the training set with the aim of developing highly accurate predictive models, and their performance was then benchmarked against conventional risk assessment scales. An ablation study was also carried out to analyze the impact of individual risk factors on model performance.
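The split-and-evaluate procedure described in the Methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's actual pipeline: the function names `split_70_30` and `auroc`, the seed, and the toy labels and scores are all assumptions introduced here. The AUROC is computed from its pairwise definition (the probability that a randomly chosen case scores higher than a randomly chosen non-case), which is quadratic in sample size and meant only for small examples.

```python
import random


def split_70_30(records, seed=0):
    """Randomly split records into 70% training and 30% test (hypothetical helper)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC from its pairwise definition; ties between a case and a non-case count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration): 1 = CVD onset, 0 = no onset.
train, test = split_70_30(range(10))
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice, a library implementation of AUROC and a stratified split would normally be preferred for a cohort of this size; the sketch only makes the two steps explicit.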
Results: The non-linear algorithm LightGBM performed best in predicting the incidence of CVD within 3 years, achieving an area under the receiver operating characteristic curve (AUROC) of 0.883. This surpassed the conventional logistic regression model, which had an AUROC of 0.725 on identical datasets. In direct comparisons, the machine-learning approaches also notably outperformed three traditional risk assessment methods within their respective applicable populations: the Framingham cardiovascular disease risk score, the 2019 ESC/EAS guidelines for the management of dyslipidemia, and the 2016 Chinese recommendations for the management of dyslipidemia in adults. Further analysis of risk factors showed that blood lipid variability and remnant cholesterol played an important role in indicating an increased risk of CVD.
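The two biomarkers highlighted in the Results can be illustrated with a short sketch. The abstract does not state how they were defined in the study, so the definitions below are assumptions: remnant cholesterol is taken as the commonly used calculated value (total cholesterol minus HDL-C minus LDL-C), and lipid variability as the coefficient of variation across repeated visits. The function names and the example values (in mmol/L) are hypothetical.

```python
from statistics import mean, stdev


def remnant_cholesterol(total_chol, hdl_c, ldl_c):
    """Calculated remnant cholesterol: TC - HDL-C - LDL-C (assumed definition)."""
    return total_chol - hdl_c - ldl_c


def lipid_variability_cv(measurements):
    """Visit-to-visit variability as the coefficient of variation (sample SD / mean).

    Assumed metric; other choices such as the plain SD are also used in practice.
    """
    if len(measurements) < 2:
        raise ValueError("need at least two measurements to estimate variability")
    return stdev(measurements) / mean(measurements)


# Hypothetical patient: one lipid panel and three LDL-C readings from repeat visits.
rc = remnant_cholesterol(total_chol=5.2, hdl_c=1.3, ldl_c=3.1)
cv = lipid_variability_cv([4.0, 5.0, 6.0])
```

Features of this kind require repeated lipid measurements per patient, which is one reason longitudinal records from a health data platform are useful for this type of model.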
Conclusions: We have shown that machine-learning techniques significantly improve cardiovascular risk prediction among hyperlipidemic patients, addressing the critical issues of heterogeneity and non-linearity in disease prediction. Furthermore, some recently suggested biomarkers, including blood lipid variability and remnant cholesterol, are also important predictors of cardiovascular events, suggesting the importance of continuous lipid monitoring and healthcare profiling through big data platforms.
Keywords: Blood lipid variability; Cardiovascular disease; Hyperlipidemia; Prediction model.
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.