A clinical incident typically manifests through several molecular events; it is therefore logical that successful diagnosis, prognosis, or stratification of a clinical landmark requires multiple biomarkers. In this report, we present a machine learning pipeline, the "Biomarker discovery process at binomial decision point" (2BDP), which takes an integrative approach to systematically curating independent variables (e.g., multiple molecular markers) that explain a binary output variable (e.g., a clinical landmark). In logical sequence, 2BDP comprises feature selection, unsupervised model development, and cross-validation. In the present work, the efficiency of 2BDP was demonstrated by finding three biomarker panels that independently explained three stages of Alzheimer's disease (AD), marked as Braak stages I, II and III, respectively. We designed three assortments from the entire cohort based on these Braak stages; each assortment was then split into two populations at Braak stage I, II or III. 2BDP systematically integrated random forest and logistic regression model fitting to find biomarker panels with the minimum number of features that explained these three assortments, i.e., significantly differentiated the two populations segregated at Braak stage I, II or III, respectively. The efficacy of each panel was then measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) plot. The AUC-ROC was calculated by two cross-validation methods. The final set of gene markers was a mix of novel and a priori established AD signatures. These markers were weighted by unique coefficients and linearly combined in groups of 2 to 10 to explain Braak stage I, II or III with AUC ≥ 0.8.
The small sample size and the lack of separately recruited Training and Test sets were limitations of the present undertaking; nevertheless, 2BDP demonstrated its capability to curate a panel with an optimal number of biomarkers that describes the outcome variable with high efficacy.
Keywords: Algorithm pipeline; Alzheimer's Disease; Artificial intelligence; Biomarker discovery; Braak Stages; Diagnostic biomarker; Logistic Regression Model; Machine learning.
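
To make the scoring step concrete, the following is a minimal sketch, not the 2BDP implementation itself. It shows how a cohort might be split into two populations at a Braak stage, how a small panel of markers can be weighted by coefficients and linearly combined in a logistic model, and how the resulting AUC-ROC can be computed. The function names, the "at or above the cutoff" split rule, the toy expression values, and the coefficients are all illustrative assumptions; none of them come from the study.

```python
import math

def split_at_stage(braak_stages, cutoff):
    """Binary label per subject: 1 if Braak stage >= cutoff, else 0 (assumed rule)."""
    return [1 if s >= cutoff else 0 for s in braak_stages]

def panel_probability(marker_values, coefficients, intercept):
    """Logistic model: markers weighted by coefficients and combined linearly."""
    z = intercept + sum(c * x for c, x in zip(coefficients, marker_values))
    return 1.0 / (1.0 + math.exp(-z))

def auc_roc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: six subjects, two markers each (not real measurements).
stages = [0, 0, 1, 2, 3, 3]
expression = [[0.1, 1.2], [0.4, 0.9], [0.3, 1.0],
              [1.1, 0.2], [1.5, 0.4], [0.2, 1.1]]
coefs, intercept = [2.0, -1.5], 0.0  # hypothetical coefficients, not fitted values

labels = split_at_stage(stages, cutoff=2)
scores = [panel_probability(x, coefs, intercept) for x in expression]
auc = auc_roc(labels, scores)
print(f"AUC-ROC of the toy two-marker panel: {auc:.3f}")
```

In practice the coefficients would be fitted on training folds and the AUC evaluated on held-out folds, so that the reported value reflects cross-validated rather than in-sample performance.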