Background: Low-density lipoprotein-cholesterol (LDL-C) is used as a threshold and target for treating dyslipidemia. Although the Friedewald equation is widely used to estimate LDL-C, it is known to be inaccurate at high triglyceride (TG) levels or in non-fasting states. We aimed to develop a novel method for estimating LDL-C using machine learning.
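For reference, the Friedewald equation mentioned above estimates LDL-C (in mg/dL) as total cholesterol minus HDL-C minus TG/5. The following is a minimal sketch of that calculation; the function name `friedewald_ldl_c` and its parameter names are illustrative and do not come from the study.

```python
def friedewald_ldl_c(total_chol: float, hdl_c: float, tg: float) -> float:
    """Estimate LDL-C in mg/dL with the Friedewald equation.

    LDL-C = TC - HDL-C - TG / 5, with all inputs in mg/dL.
    The TG / 5 term approximates VLDL cholesterol, which is why the
    estimate degrades when TG is high or the sample is non-fasting.
    """
    return total_chol - hdl_c - tg / 5.0


# Illustrative values only (not study data).
example = friedewald_ldl_c(total_chol=200.0, hdl_c=50.0, tg=150.0)
print(example)
```

Because the TG/5 term is a fixed-ratio approximation, any sample whose true TG-to-VLDL-cholesterol ratio departs from 5 shifts the estimate, which is the weakness the abstract refers to.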
Methods: Using a large, single-center electronic health record database, we derived a machine learning algorithm to estimate LDL-C from standard lipid profiles. From 1,029,572 cases with both standard lipid profiles (total cholesterol, high-density lipoprotein-cholesterol, and TG) and direct LDL-C measurements, 823,657 tests were used to derive LDL-C estimation models. Patient characteristics such as sex, age, height, weight, and other laboratory values were additionally used to create separate data sets and algorithms.
Results: Machine learning models based on gradient boosting (LDL-CX) and a neural network (LDL-CN) correlated better with directly measured LDL-C than conventional methods did (r = 0.9662, 0.9668, 0.9563, and 0.9585 for LDL-CX, LDL-CN, Friedewald [LDL-CF], and Martin [LDL-CM] equations, respectively). The overall biases of LDL-CX (-0.27 mg/dL, 95% CI -0.30 to -0.23) and LDL-CN (-0.01 mg/dL, 95% CI -0.04 to 0.03) were significantly smaller than those of LDL-CF (-3.80 mg/dL, 95% CI -3.80 to -3.60) and LDL-CM (-2.00 mg/dL, 95% CI -2.00 to -1.94), especially at high TG levels.
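The two agreement metrics reported above, the Pearson correlation coefficient and the overall bias, can be computed as in the following minimal sketch. It assumes that bias means the mean of estimated minus directly measured LDL-C; the abstract does not state the sign convention. The helper names and the sample values are hypothetical and are not study data.

```python
import math


def mean_bias(estimated, measured):
    """Mean of (estimated - measured); the sign convention is an assumption."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    return sum(diffs) / len(diffs)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up values in mg/dL, for illustration only.
measured_ldl = [100.0, 130.0, 160.0, 90.0]
estimated_ldl = [98.0, 127.0, 158.0, 89.0]

bias = mean_bias(estimated_ldl, measured_ldl)
r = pearson_r(estimated_ldl, measured_ldl)
print(bias, r)
```

Under this convention, a negative bias means the estimator reads lower than the direct measurement on average.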
Conclusions: Machine learning algorithms were superior to the conventional Friedewald equation and the more contemporary Martin equation in estimating LDL-C. With external validation and modification, machine learning could be incorporated into electronic health records to replace conventional LDL-C estimation.
Keywords: Cost-effectiveness; Hypercholesterolemia; Low-density lipoprotein cholesterol; Machine-learning; Triglycerides.
Copyright © 2022 Elsevier B.V. All rights reserved.