Finding a constrained number of predictor phenotypes for multiple outcome prediction

BMJ Health Care Inform. 2025 Jan 16;32(1):e101227. doi: 10.1136/bmjhci-2024-101227.

Abstract

Background: Prognostic models aid medical decision-making. Various prognostic models are available via websites such as MDCalc, but these models typically predict a single outcome, for example, stroke risk. Each model requires its own set of predictors, for example, age, lab results and comorbidities. There is no clinical tool available to predict multiple outcomes from a single list of common medical predictors.

Objective: Identify a constrained set of outcome-agnostic predictors.

Methods: We proposed a novel technique that aggregates the standardised mean difference across hundreds of outcomes to learn a constrained set of predictors that appear to be predictive of many outcomes. Model performance was evaluated using the constrained set of predictors across eight prediction tasks. We compared these models against existing models, models using only age/sex predictors and models without any predictor constraints.
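
As an illustration only, the aggregation idea in the Methods can be sketched as follows. The abstract does not state how the standardised mean differences are aggregated or how the cut-off is chosen, so the mean absolute SMD, the pooled-variance formula and the top_k cut-off used here are assumptions, and the function names are hypothetical rather than the authors' implementation.

```python
import numpy as np


def standardised_mean_difference(x, y):
    """SMD of predictor x between people with (y == 1) and without (y == 0) an outcome.

    Uses the common pooled form sqrt((var1 + var0) / 2); this choice is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=bool)
    x1, x0 = x[y], x[~y]
    if x1.size == 0 or x0.size == 0:
        return 0.0  # outcome group is empty, so no contrast can be computed
    pooled_sd = np.sqrt((x1.var() + x0.var()) / 2.0)
    if pooled_sd == 0:
        return 0.0  # degenerate case: predictor is constant within both groups
    return (x1.mean() - x0.mean()) / pooled_sd


def rank_predictors(X, Y, top_k):
    """Rank predictors by mean absolute SMD across all outcomes (assumed aggregation).

    X: array of shape (n_people, n_predictors)
    Y: binary array of shape (n_people, n_outcomes)
    Returns the indices of the top_k predictors and the score of every predictor.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    n_predictors, n_outcomes = X.shape[1], Y.shape[1]
    smd = np.zeros((n_predictors, n_outcomes))
    for j in range(n_predictors):
        for k in range(n_outcomes):
            smd[j, k] = standardised_mean_difference(X[:, j], Y[:, k])
    score = np.abs(smd).mean(axis=1)
    selected = np.argsort(-score)[:top_k]
    return selected, score


# Toy data: 4 people, 2 candidate predictors, 2 outcomes.
X_toy = np.array([[1, 1], [2, 2], [3, 1], [4, 2]])
Y_toy = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
selected, score = rank_predictors(X_toy, Y_toy, top_k=1)
```

A predictor that differs between people with and without an outcome across many outcomes receives a high score, which is the sense in which it is outcome-agnostic; the selected indices would then be the only inputs offered to each downstream model.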

Results: We identified 67 predictors in our constrained set, plus age/sex. Our predictors included illnesses in the following categories: cardiovascular, kidney/liver, mental health, gastrointestinal, infectious and oncologic. Models developed using the constrained set of predictors achieved discrimination comparable to that of models using hundreds or thousands of predictors for five of the eight prediction tasks and slightly lower discrimination for the remaining three. The constrained predictor models performed as well as or better than all existing clinical models.

Conclusions: It is possible to develop models for hundreds or thousands of outcomes that use the same small set of predictors. This makes it feasible to implement many prediction models via a single website form. Our set of predictors can also be used for future models and prognostic model research.

Keywords: Data Science; Electronic Health Records; Evidence-Based Medicine; Medical Informatics; Preventive Medicine.

MeSH terms

  • Female
  • Humans
  • Male
  • Phenotype*
  • Prognosis