Cigarette smoking is a preventable epidemic and a leading cause of death. It increases the risk of coronary heart disease, stroke, lung cancer, chronic obstructive lung disease, and other conditions many times over. Smoking tobacco harms not only the smoker but also those exposed to second-hand smoke. Smoking induces endothelial dysfunction via inflammatory cytokines, which can be quantified precisely. Cytokines can therefore be leveraged as powerful predictive biomarkers for identifying disease risk. Current advances in biomarker research are providing substantive evidence of the roles of cytokines in disease, which is driving precision-based diagnosis and translational therapeutic interventions. Innovative machine learning (ML) algorithms are transforming the field of medical research. This research implements a Neural Network (NN) algorithm to classify smokers versus non-smokers using 63 cytokines as predictor features. NN is a powerful tool for this differentiation, and techniques such as cross validation and hyperparameter tuning further improve the efficacy of the algorithm. The study identified the 10 most impactful predictor features that contributed to the classification and then used these to characterize smokers versus non-smokers. Specifically, the study constructed and investigated two classifiers: the first implemented NN using the entire set of 63 cytokines, and the second used only the 10 most informative cytokines. The first classifier, evaluated by the area under the receiver operating characteristic curve (AUROC), performed very well, with an AUROC of 0.949 and a 95% confidence interval (CI) of (0.923, 0.974). The second classifier, which used the 10 most impactful cytokines, performed better still, with an AUROC of 0.995 and a 95% CI of (0.991, 1.000).
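As an illustration of the evaluation metric, the AUROC can be read as the probability that a randomly chosen smoker receives a higher classifier score than a randomly chosen non-smoker. The following is a minimal sketch of that pairwise formulation, not the study's actual evaluation code; the function name `auroc` and the toy labels and scores are hypothetical, and confidence interval estimation is not covered.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 for the positive class (here, smoker), 0 for the negative class.
    scores: classifier outputs, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")

    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
```

In this toy case, eight of the nine smoker/non-smoker pairs are ranked correctly. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger data.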
The 10 most impactful cytokines for differentiating smokers from non-smokers, listed in order of importance, are: I-TAC, IL-22, IL-2R, IL-3, HGF, IL-18, G-CSF/CSF-3, MIF, SDF-1alpha, and MMP-1. To gain a deeper understanding of the effect of smoking on cytokine levels, a two-sample independent t test was performed to assess the statistical significance of the difference in each of the 63 cytokine levels between smokers and non-smokers. Machine learning using biomarkers such as cytokines will enhance the ability to predict the onset of a disease and its outcome, and lead to novel treatment strategies.
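The per-cytokine comparison described above can be sketched as follows. This is a minimal illustration that assumes the equal-variance (pooled) form of the two-sample t test, which the text does not specify; the function name `pooled_t_statistic` and the sample values are hypothetical. It returns only the t statistic and degrees of freedom; obtaining a p-value would additionally require the t distribution from a statistics library.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a two-sample t test with pooled variance."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical levels of one cytokine (arbitrary units), not taken from the study.
smokers = [5.0, 6.0, 7.0]
non_smokers = [1.0, 2.0, 3.0]
t_stat, dof = pooled_t_statistic(smokers, non_smokers)
```

In a setting like the one described, such a function would be applied once per cytokine, and with 63 separate tests a multiple-comparison correction would usually be worth considering.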
Keywords: AUROC; Classification; NN; Plasma cytokines; Variable Importance.