Coronary artery disease (CAD) is a leading cause of mortality worldwide. Proactive assessment of disease risk is therefore important, and it can draw on novel biomarkers such as cytokines, which indicate inflammation, in addition to traditional risk predictors. Atherosclerosis, the primary cause of CAD, is an inflammatory disease involving cytokines, and identifying which cytokines are specifically altered can advance diagnosis and personalized treatment. Emerging research shows that cytokines are transported on high-density lipoproteins (HDL), so it is important to explore the roles of HDL-associated cytokines in vascular inflammation. Machine learning (ML) algorithms are advancing precision medicine research and can help translate scientific findings into clinical practice. In this study we implemented logistic regression and its regularized variants, using age and multidimensional cytokine biomarkers as predictors, with the objective of identifying individuals "At Risk" for CAD. The models were fitted with k-fold cross-validation and hyperparameter tuning. Of the numerous algorithms investigated, the three best performers, ranked by area under the receiver operating characteristic curve (AUROC), were logistic regression, least absolute shrinkage and selection operator (LASSO) regression with feature selection, and ridge regression with feature selection. Logistic regression achieved an AUROC of 0.85 (95% confidence interval [CI]: 0.804, 0.897), LASSO regression achieved a higher AUROC of 0.875 (95% CI: 0.832, 0.917), and ridge regression with feature selection achieved the highest AUROC of 0.878 (95% CI: 0.837, 0.92). A two-sample independent t test indicated that the three techniques differed from one another with statistical significance.
For the best-performing classifier, ridge regression with feature selection, the most prominent biomarkers, in order of importance, were: Age, IL-7, RANTES, IFN-gamma, IL-3, GM-CSF, IL-15, IP-10, G-CSF, and IL-12. The identification and quantification of cytokines transported by HDL provide novel mechanistic insights that can inform risk assessment and therapeutic intervention in CAD.
Keywords: CAD; Cytokines; LASSO Regression; Logistic Regression; Ridge Regression.
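As a rough illustration of the workflow summarized in the abstract (a regularized logistic model, k-fold cross-validation, hyperparameter tuning, and AUROC scoring), the following is a minimal sketch using only the Python standard library. It is not the study's implementation: the abstract does not state the software, the number of folds, or the tuning grid, so the helper names (`auroc`, `fit_ridge_logistic`, `cv_auroc`, `make_synthetic`), the synthetic data, the choice of k = 5, and the penalty grid are all assumptions made for illustration. Only the ridge (L2) penalty is shown; a LASSO (L1) variant would need a different optimizer.

```python
import math
import random


def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, lam, lr=0.1, epochs=200):
    """Batch gradient descent on mean log-loss + lam * ||w||^2 (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj / n + 2.0 * lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def cv_auroc(X, y, lam, k=5, seed=0):
    """Mean held-out AUROC over k shuffled folds (k = 5 is an assumption)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train], lam)
        scores.append(auroc([y[i] for i in fold], predict(model, [X[i] for i in fold])))
    return sum(scores) / k


def make_synthetic(n=200, d=4, seed=1):
    """Hypothetical stand-in data: balanced labels, signal in the first feature only."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    # Hyperparameter tuning: keep the penalty with the best cross-validated AUROC.
    results = {lam: cv_auroc(X, y, lam) for lam in (0.0, 0.1, 1.0)}
    best_lam = max(results, key=results.get)
    print("best penalty:", best_lam, "CV AUROC:", round(results[best_lam], 3))
```

In a real analysis the feature matrix would hold age and the measured cytokine levels, typically standardized before fitting, and the fitted coefficients could be ranked by magnitude to obtain an importance ordering like the one reported above.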