A Machine Learning Model to Predict De Novo Hepatocellular Carcinoma Beyond Year 5 of Antiviral Therapy in Patients With Chronic Hepatitis B

Yeonjung Ha; Seungseok Lee; Jihye Lim; Kwanjoo Lee; Young Eun Chon; Joo Ho Lee; Kwan Sik Lee; Kang Mo Kim; Ju Hyun Shim; Danbi Lee; Dong Keon Yon; Jinseok Lee; Han Chu Lee

doi:10.1111/liv.16139

Liver Int. 2024 Dec 18. doi: 10.1111/liv.16139. Online ahead of print.

Authors

Yeonjung Ha¹, Seungseok Lee², Jihye Lim³, Kwanjoo Lee¹, Young Eun Chon¹, Joo Ho Lee¹, Kwan Sik Lee¹, Kang Mo Kim⁴, Ju Hyun Shim⁴, Danbi Lee⁴, Dong Keon Yon⁵, Jinseok Lee², Han Chu Lee⁴

Affiliations

¹ Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea.
² Department of Biomedical Engineering, College of Electronics and Informatics, Kyung Hee University, Yongin-si, Gyeonggi-do, South Korea.
³ Division of Gastroenterology and Hepatology, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, South Korea.
⁴ Asan Liver Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.
⁵ Center for Digital Health, Medical Research Institute, Kyung Hee University Medical Center, Kyung Hee University, Seoul, South Korea.

PMID: 39692285
DOI: 10.1111/liv.16139

Abstract

Background and aims: This study aims to develop and validate a machine learning (ML) model predicting hepatocellular carcinoma (HCC) in chronic hepatitis B (CHB) patients after the first 5 years of entecavir (ETV) or tenofovir (TFV) therapy.

Methods: Patients with CHB who had received ETV or TFV for > 5 years and had not been diagnosed with HCC during the first 5 years of therapy were selected from two hospitals. Model development used 36 variables, comprising baseline characteristics (age, sex, cirrhosis, and type of antiviral agent) and laboratory values (at baseline, at 5 years, and the changes between these two time points). Five ML algorithms were trained on the training dataset and internally validated on a test dataset. External validation was then performed.
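The data preparation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. The laboratory name `alt`, the key suffixes `_baseline`, `_5y` and `_change`, and both function names are hypothetical. The 20% test fraction and the stratification by outcome are assumptions: the abstract reports only the resulting dataset sizes, which are consistent with roughly an 80/20 split.

```python
import random


def add_change_features(record, labs):
    """Return a copy of `record` with a '<lab>_change' entry per lab.

    The change is defined here as the 5-year value minus the baseline value
    (an assumed definition; the abstract does not state the formula).
    """
    out = dict(record)
    for lab in labs:
        out[f"{lab}_change"] = record[f"{lab}_5y"] - record[f"{lab}_baseline"]
    return out


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices into train/test parts, keeping the event rate similar.

    Each outcome class is shuffled and split separately, so a rare outcome
    such as HCC is represented in both parts.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy patient record with a hypothetical lab called "alt".
patient = add_change_features({"alt_baseline": 80.0, "alt_5y": 25.0}, ["alt"])

# Outcome vector with the derivation cohort's reported counts (279 of 5908).
labels = [1] * 279 + [0] * (5908 - 279)
train_idx, test_idx = stratified_split(labels)
print(patient["alt_change"], len(train_idx), len(test_idx))
```

With the reported counts, this assumed split yields 4726 training and 1182 test rows, matching the dataset sizes given in the Results.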

Results: During years 5-15 of therapy, 279/5908 (4.7%) and 25/562 (4.5%) patients developed HCC in the derivation and external validation cohorts, respectively. In the training dataset (n = 4726), logistic regression showed the highest area under the receiver operating characteristic curve (AUC) of 0.803 and a balanced accuracy of 0.735, outperforming the other ML algorithms. An ensemble model combining logistic regression and random forest performed best (AUC, 0.811; balanced accuracy, 0.754). Results in the test dataset (n = 1182) confirmed the good performance of the ensemble model (AUC, 0.784; balanced accuracy, 0.712), as did external validation (AUC, 0.862; balanced accuracy, 0.771). A web-based calculator was developed (http://ai-wm.khu.ac.kr/HCC/).
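The two reported metrics can be made concrete with a short, dependency-free Python sketch. It is illustrative only. The abstract does not state how the logistic regression and random forest outputs were combined, so the equal-weight probability averaging in `soft_vote` is an assumption, and the function names, toy probabilities and the 0.45 cut-off are hypothetical. Balanced accuracy is the mean of sensitivity and specificity, which is informative here because with an event rate below 5% a model that predicts "no HCC" for everyone would have high plain accuracy but a balanced accuracy of only 0.5.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = HCC)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in sample size,
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_vote(prob_a, prob_b, weight_a=0.5):
    """Weighted average of two models' predicted probabilities (assumed scheme)."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(prob_a, prob_b)]


# Toy example: six patients, two of whom develop HCC.
y = [1, 1, 0, 0, 0, 0]
p_logreg = [0.9, 0.4, 0.5, 0.2, 0.1, 0.3]   # made-up model outputs
p_forest = [0.7, 0.6, 0.3, 0.4, 0.2, 0.1]   # made-up model outputs
p_ensemble = soft_vote(p_logreg, p_forest)

threshold = 0.45  # hypothetical cut-off for turning probabilities into labels
pred_logreg = [int(p >= threshold) for p in p_logreg]

print(roc_auc(y, p_logreg), roc_auc(y, p_ensemble))
print(balanced_accuracy(y, pred_logreg))
```

In this toy data the averaged scores rank every HCC case above every non-case, while logistic regression alone misranks one pair. This only shows how averaging can help and says nothing about the size of the effect in the study.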

Conclusions: The proposed ML model predicted HCC risk beyond year 5 of ETV/TFV therapy with good discrimination in both internal and external validation and could therefore facilitate individualised HCC surveillance based on risk stratification.

Keywords: chronic hepatitis B; hepatocellular carcinoma; machine learning; prediction; validation.