Comparison of deep learning schemes in grading non-alcoholic fatty liver disease using B-mode ultrasound hepatorenal window images with liver biopsy as the gold standard

Phys Med. 2024 Dec 2:129:104862. doi: 10.1016/j.ejmp.2024.104862. Online ahead of print.

Abstract

Background/introduction: To evaluate the performance of pre-trained deep learning schemes (DLS) in grading hepatic steatosis (HS) in patients with non-alcoholic fatty liver disease (NAFLD), using as input B-mode ultrasound (US) images containing right kidney (RK) cortex and liver parenchyma (LP) areas indicated by an expert radiologist.

Methods: A total of 112 consecutively enrolled, biopsy-validated NAFLD patients underwent a regular abdominal B-mode US examination. For each patient, a radiologist obtained a B-mode US image containing RK cortex and LP and marked a point between the RK and LP, around which a window was automatically cropped. The cropped image dataset was augmented using up-sampling, and the augmented and non-augmented datasets were sorted by HS grade. Each dataset was split into training (70%) and testing (30%) sets and fed separately as input to five pre-trained DLS: InceptionV3, MobileNetV2, ResNet50, DenseNet201, and NASNetMobile. A receiver operating characteristic (ROC) analysis of hepatorenal index (HRI) measurements made by the radiologist on the same cropped images was used for comparison with the performance of the DLS.
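The cropping and splitting steps described in the Methods can be sketched as follows. This is a minimal illustration and not the authors' code: the window size, the border handling, the per-grade (stratified) split, and the function names `crop_window` and `stratified_split` are all assumptions, since the abstract gives only the 70%/30% proportions and states that a window was cropped around the marked point.

```python
import random
from collections import defaultdict


def crop_window(image, center_row, center_col, size):
    """Crop a size x size window around a marked point (hypothetical helper).

    `image` is a 2D list of pixel rows. When the point lies too close to a
    border, the window is shifted inward so the output is always size x size.
    """
    n_rows, n_cols = len(image), len(image[0])
    if size > n_rows or size > n_cols:
        raise ValueError("window is larger than the image")
    half = size // 2
    top = min(max(center_row - half, 0), n_rows - size)
    left = min(max(center_col - half, 0), n_cols - size)
    return [row[left:left + size] for row in image[top:top + size]]


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train/test, keeping the ratio within each grade.

    Stratifying by grade is an assumption; the abstract only gives 70%/30%.
    """
    rng = random.Random(seed)
    by_grade = defaultdict(list)
    for idx, grade in enumerate(labels):
        by_grade[grade].append(idx)
    train_idx, test_idx = [], []
    for grade in sorted(by_grade):
        indices = by_grade[grade]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)
```

Splitting within each grade keeps the S0 to S3 proportions similar in the training and testing sets, which matters when some grades have few patients.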

Results: On the test data with augmentation, the DLS reached 89.15%-93.75% accuracy for S0-S1 vs. S2-S3 and 79.69%-91.21% accuracy for S0 vs. S1 vs. S2 vs. S3. Without augmentation, accuracy was 80.45%-82.73% for S0-S1 vs. S2-S3 and 59.54%-63.64% for S0 vs. S1 vs. S2 vs. S3. The performance of the radiologist's HRI measurements in the ROC analysis was 82%, 91.56%, and 96.19% for thresholds of S ≥ S1, S ≥ S2, and S = S3, respectively.
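One way to run a ROC analysis of HRI at a grade threshold is sketched below. The abstract does not say which ROC-derived metric the reported percentages represent, so the use of the area under the curve (AUC) here is an assumption, and `auc_for_threshold` is a hypothetical helper rather than the authors' method.

```python
def auc_for_threshold(hri_values, grades, min_grade):
    """AUC of HRI for separating grades >= min_grade from lower grades.

    Uses the rank (Mann-Whitney) formulation: the AUC equals the probability
    that a randomly chosen positive case has a higher HRI than a randomly
    chosen negative case, with ties counted as one half.
    """
    positives = [h for h, g in zip(hri_values, grades) if g >= min_grade]
    negatives = [h for h, g in zip(hri_values, grades) if g < min_grade]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Calling the function with `min_grade` set to 1, 2, and 3 corresponds to the thresholds S ≥ S1, S ≥ S2, and S = S3, with grades encoded as the integers 0 to 3.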

Conclusion: All networks achieved high performance in HS assessment. DenseNet201 with augmented data appears to be the most efficient supplementary tool for NAFLD diagnosis and grading.

Keywords: B-mode ultrasound; Chronic liver disease; Hepatic steatosis; Pre-trained deep learning schemes.