A Machine Learning Model to Predict De Novo Hepatocellular Carcinoma Beyond Year 5 of Antiviral Therapy in Patients With Chronic Hepatitis B

Liver Int. 2024 Dec 18. doi: 10.1111/liv.16139. Online ahead of print.

Abstract

Background and aims: This study aimed to develop and validate a machine learning (ML) model that predicts hepatocellular carcinoma (HCC) in chronic hepatitis B (CHB) patients after the first 5 years of entecavir (ETV) or tenofovir (TFV) therapy.

Methods: CHB patients who were treated with ETV/TFV for > 5 years and were not diagnosed with HCC during the first 5 years of therapy were selected from two hospitals. For model development, we used 36 variables, including baseline characteristics (age, sex, cirrhosis, and type of antiviral agent) and laboratory values (at baseline, at 5 years, and the changes over those 5 years). Five ML algorithms were applied to the training dataset and internally validated using a test dataset. External validation was performed in a separate cohort.
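The three groups of laboratory variables described above (baseline, 5-year, and change) can be illustrated with a minimal sketch. It assumes that "change" means the 5-year value minus the baseline value, which the abstract does not state explicitly. The function name `build_features` and the laboratory names `platelets` and `alt` are hypothetical and are not taken from the study.

```python
from typing import Dict


def build_features(baseline: Dict[str, float],
                   year5: Dict[str, float]) -> Dict[str, float]:
    """Return baseline, 5-year, and change features for each lab value.

    Assumption: change = value at year 5 minus value at baseline.
    """
    features: Dict[str, float] = {}
    for name, base_value in baseline.items():
        y5_value = year5[name]
        features[f"{name}_baseline"] = base_value
        features[f"{name}_year5"] = y5_value
        features[f"{name}_change"] = y5_value - base_value
    return features


# Hypothetical patient: the lab names and values are illustrative only.
example = build_features(
    baseline={"platelets": 150.0, "alt": 80.0},
    year5={"platelets": 170.0, "alt": 25.0},
)
```

Each laboratory value contributes three columns under this scheme, so the fixed characteristics (age, sex, cirrhosis, antiviral agent) plus the laboratory triplets would make up the full variable set.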

Results: In years 5-15, a total of 279/5908 (4.7%) and 25/562 (4.5%) patients developed HCC in the derivation and external validation cohorts, respectively. In the training dataset (n = 4726), logistic regression outperformed the other individual ML algorithms, with the highest area under the receiver operating characteristic curve (AUC) of 0.803 and a balanced accuracy of 0.735. An ensemble model combining logistic regression and random forest performed best overall (AUC, 0.811; balanced accuracy, 0.754). Results in the test dataset (n = 1182) confirmed the good performance of the ensemble model (AUC, 0.784; balanced accuracy, 0.712), as did external validation (AUC, 0.862; balanced accuracy, 0.771). A web-based calculator was developed (http://ai-wm.khu.ac.kr/HCC/).
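To make the two reported metrics concrete, the following sketch computes AUC and balanced accuracy for a simple two-model ensemble on toy data. The abstract does not say how logistic regression and random forest were combined, so averaging the two predicted probabilities (soft voting) is an assumption here, as is the 0.5 decision threshold. The function names and all numbers in the toy arrays are hypothetical.

```python
from typing import List, Sequence


def soft_vote(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Average two models' predicted probabilities (assumed combination rule)."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity at the given threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy data: 1 = developed HCC, 0 = did not. Probabilities are made up.
y = [1, 1, 0, 0, 0, 0]
p_lr = [0.8, 0.4, 0.3, 0.6, 0.1, 0.2]   # hypothetical logistic regression output
p_rf = [0.6, 0.7, 0.2, 0.3, 0.2, 0.1]   # hypothetical random forest output
p_ens = soft_vote(p_lr, p_rf)

auc_lr = auc(y, p_lr)
auc_ens = auc(y, p_ens)
bacc_lr = balanced_accuracy(y, p_lr)
bacc_ens = balanced_accuracy(y, p_ens)
```

Balanced accuracy is a natural companion to AUC in this setting because fewer than 5% of patients developed HCC: a model that labels every patient as low risk would reach about 95% plain accuracy but only 0.5 balanced accuracy.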

Conclusions: The proposed ML model predicted HCC risk beyond year 5 of ETV/TFV therapy with good discrimination in both internal and external validation (AUC, 0.784 and 0.862, respectively) and could therefore facilitate individualised HCC surveillance based on risk stratification.

Keywords: chronic hepatitis B; hepatocellular carcinoma; machine learning; prediction; validation.