Benchmarking clinical risk prediction algorithms with ensemble machine learning: An illustration of the superlearner algorithm for the non-invasive diagnosis of liver fibrosis in non-alcoholic fatty liver disease

medRxiv [Preprint]. 2023 Aug 4:2023.08.02.23293569. doi: 10.1101/2023.08.02.23293569.

Abstract

Background and aims: Ensemble machine learning (ML) methods can combine many individual models into a single 'super' model using an optimal weighted combination. Here we demonstrate how an underutilized ensemble model, the superlearner, can be used as a benchmark for model performance in clinical risk prediction. We illustrate this by implementing a superlearner to predict liver fibrosis in patients with non-alcoholic fatty liver disease (NAFLD).
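The weighted-combination idea can be illustrated with a minimal sketch. It assumes that each base model has already produced out-of-fold (cross-validated) predicted probabilities, and it looks for non-negative weights summing to one that minimize the squared error of the combined prediction. The abstract does not state which loss or optimizer the authors used; the exponentiated-gradient update below is a substitute chosen only to keep the example dependency-free, and the function names and toy data are hypothetical.

```python
import math


def fit_convex_weights(oof_preds, y, steps=2000, lr=0.5):
    """Estimate convex ensemble weights from out-of-fold predictions.

    oof_preds: list of K lists, each holding one base model's
               cross-validated predicted probabilities.
    y:         list of 0/1 outcomes.
    Assumption: squared-error loss with an exponentiated-gradient update;
    this is an illustrative choice, not the method reported in the paper.
    """
    k, n = len(oof_preds), len(y)
    w = [1.0 / k] * k  # start from equal weights
    for _ in range(steps):
        # Current ensemble prediction for every observation
        ens = [sum(w[j] * oof_preds[j][i] for j in range(k)) for i in range(n)]
        # Gradient of the mean squared error with respect to each weight
        grad = [
            (2.0 / n) * sum((ens[i] - y[i]) * oof_preds[j][i] for i in range(n))
            for j in range(k)
        ]
        # Multiplicative update keeps weights positive; renormalizing
        # keeps them summing to one
        w = [w[j] * math.exp(-lr * grad[j]) for j in range(k)]
        total = sum(w)
        w = [x / total for x in w]
    return w


def ensemble_predict(weights, preds):
    """Combine base-model predictions using the fitted weights."""
    n = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n)]


# Hypothetical toy data: model_a is informative, model_b is not
y = [0, 1, 0, 1, 1, 0, 0, 1]
model_a = [0.1, 0.9, 0.2, 0.8, 0.7, 0.3, 0.2, 0.9]
model_b = [0.5] * 8
weights = fit_convex_weights([model_a, model_b], y)
combined = ensemble_predict(weights, [model_a, model_b])
```

On this toy data almost all of the weight moves to the informative model. In a real analysis the out-of-fold predictions would come from a proper cross-validation loop over each base learner.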

Methods: We trained a superlearner on 23 demographic and clinical variables to predict stage 2 or higher liver fibrosis. The superlearner was trained on data from the Non-alcoholic Steatohepatitis Clinical Research Network (NASH-CRN) observational study (n=648), and validated using data from participants in a randomized trial for NASH ('FLINT' trial, n=270) and data from examinees with NAFLD who participated in the National Health and Nutrition Examination Survey (NHANES, n=1244). We compared the performance of the superlearner with that of existing models, including FIB-4, NFS, Forns, APRI, BARD and SAFE.
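For orientation, two of the comparator scores named above have simple closed-form definitions. The sketch below computes FIB-4 and APRI from their commonly published formulas. The abstract does not give the exact implementation used in the study, so the function names, argument units, and patient values here are illustrative assumptions.

```python
import math


def fib4(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


def apri(ast_u_per_l, ast_upper_limit_normal, platelets_1e9_per_l):
    """APRI: (AST / upper limit of normal AST) x 100 / platelets."""
    return (ast_u_per_l / ast_upper_limit_normal) * 100.0 / platelets_1e9_per_l


# Hypothetical patient: age 60, AST 40 U/L, ALT 25 U/L, platelets 200 x 10^9/L
example_fib4 = fib4(60, 40, 25, 200)
# Hypothetical upper limit of normal for AST of 40 U/L
example_apri = apri(40, 40, 200)
```

Scores like these use only a handful of routine laboratory values, in contrast to the 23 input variables available to the superlearner.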

Results: In the FLINT and NHANES validation sets, the superlearner (derived from 12 base models) discriminated well between patients with and without significant fibrosis, with AUCs of 0.79 (95% CI: 0.73-0.84) and 0.74 (95% CI: 0.68-0.79), respectively. Among the existing scores considered, the SAFE score performed similarly to the superlearner, and both outperformed the FIB-4, APRI, Forns, and BARD scores in the validation datasets. A superlearner derived from 12 base models performed as well as one derived from 90 base models.
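An AUC with a 95% confidence interval, as reported above, can be estimated as in the following sketch. It computes the AUC as the proportion of positive-negative pairs ranked correctly and uses a percentile bootstrap for the interval. The abstract does not say how the intervals were obtained (DeLong's method is another common choice), so the bootstrap is an assumption, and the scores and labels are made-up toy values.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Hypothetical toy predictions and outcomes
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3]
toy_labels = [0, 0, 1, 1, 0, 1, 1, 0]
point_estimate = auc(toy_scores, toy_labels)
ci_low, ci_high = bootstrap_auc_ci(toy_scores, toy_labels)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch. A rank-based formula would be preferable for cohorts the size of the NHANES validation set.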

Conclusions: The superlearner, which can be regarded as the "best-in-class" ML prediction, outperformed most existing models commonly used in practice for detecting fibrotic NASH. The superlearner can be used to benchmark the performance of conventional clinical risk prediction models.

Publication types

  • Preprint