Background: The presence of significant liver fibrosis is a key determinant of long-term prognosis in non-alcoholic fatty liver disease (NAFLD). We aimed to develop a novel machine learning algorithm (MLA) to predict fibrosis severity in NAFLD and to compare it with the most widely used non-invasive fibrosis biomarkers.
Methods: We studied a cohort of 553 adults with biopsy-proven NAFLD, who were randomly divided into a training cohort (n = 278), used to develop both a logistic regression model (LRM) and the MLA, and a validation cohort (n = 275). Significant fibrosis was defined as fibrosis stage F ≥ 2. The MLA and the LRM were derived from variables selected using a least absolute shrinkage and selection operator (LASSO) logistic regression algorithm.
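To illustrate how LASSO logistic regression performs variable selection, the following is a minimal sketch that fits an L1-penalised logistic model by proximal gradient descent. The solver, the penalty strength `lam`, the function names and the synthetic two-feature data are all assumptions made for illustration. They are not the software, tuning or data used in this study.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything inside the band becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lasso_logistic(rows, labels, lam, step=0.1, n_iter=500):
    """Fit L1-penalised logistic regression by proximal gradient descent.

    rows: list of feature lists (assumed standardised); labels: list of 0/1.
    Returns (intercept, coefficients). The intercept is not penalised.
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for x, y in zip(rows, labels):
            err = sigmoid(intercept + sum(w * v for w, v in zip(coefs, x))) - y
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * x[j] / n
        intercept -= step * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal map).
        coefs = [soft_threshold(w - step * g, step * lam)
                 for w, g in zip(coefs, grad_w)]
    return intercept, coefs


# Synthetic data (hypothetical): feature 0 drives the label, feature 1 is noise.
rng = random.Random(0)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if x[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for x in rows]

intercept, coefs = lasso_logistic(rows, labels, lam=0.05)
selected = [j for j, w in enumerate(coefs) if w != 0.0]
print("coefficients:", coefs)
print("selected feature indices:", selected)
```

Variables whose coefficients are shrunk to exactly zero are dropped, and the remainder are carried forward as model inputs. In practice the penalty strength would typically be chosen by cross-validation within the training cohort.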
Results: In the training cohort, the variables selected by the LASSO algorithm were body mass index, pro-collagen type III, collagen type IV, aspartate aminotransferase and albumin-to-globulin ratio. The MLA had the highest area under the receiver operating characteristic curve (AUROC 0.902, 95% CI 0.869-0.904) for identifying fibrosis F ≥ 2. The AUROC of the LRM was 0.764 (95% CI 0.710-0.816), which was significantly better than that of the AST-to-platelet ratio (AUROC 0.684, 95% CI 0.605-0.762), the FIB-4 score (AUROC 0.594, 95% CI 0.503-0.685) and the NAFLD Fibrosis Score (AUROC 0.557, 95% CI 0.470-0.644). In the validation cohort, the MLA also had the highest AUROC (0.893, 95% CI 0.864-0.901). The diagnostic accuracy of the MLA exceeded that of the LRM in all subgroups considered.
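For readers less familiar with the comparators, the sketch below computes two of the non-invasive scores using their commonly published formulas (FIB-4 and the AST-to-platelet ratio index, APRI) and shows a rank-based AUROC calculation. The function names, the upper limit of normal for AST and the toy values are hypothetical and are not taken from the study cohort.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelets [10^9/L] x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_upper_limit_u_l, platelets_10e9_l):
    """APRI = (AST / upper limit of normal AST) x 100 / platelets [10^9/L]."""
    return (ast_u_l / ast_upper_limit_u_l) * 100.0 / platelets_10e9_l


def auroc(scores, labels):
    """Probability that a random positive case outscores a random negative one.

    Equivalent to the Mann-Whitney U statistic scaled to [0, 1];
    tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy patients (hypothetical): (age, AST, ALT, platelets, has F >= 2).
patients = [
    (45, 30, 36, 250, 0),
    (52, 28, 25, 230, 0),
    (60, 70, 49, 150, 1),
    (58, 55, 64, 180, 1),
]
fib4_scores = [fib4(a, ast, alt, plt) for a, ast, alt, plt, _ in patients]
# Assumes an AST upper limit of normal of 40 U/L for illustration.
apri_scores = [apri(ast, 40, plt) for _, ast, _, plt, _ in patients]
truth = [y for *_, y in patients]

print("FIB-4 AUROC:", auroc(fib4_scores, truth))
print("APRI AUROC:", auroc(apri_scores, truth))
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of F ≥ 2 from F < 2. That is the scale on which the values reported above should be read.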
Conclusions: Our newly developed MLA has excellent diagnostic performance for predicting fibrosis F ≥ 2 in patients with biopsy-confirmed NAFLD.
Keywords: NAFLD; diagnosis; fibrosis; liver biopsy; machine learning algorithm.
© 2021 Japanese Society of Hepato-Biliary-Pancreatic Surgery.