Accurate segmentation of ankle and foot bones from CT scans is essential for morphological analysis. Ankle and foot bone segmentation is challenging due to blurred bone boundaries, narrow inter-bone gaps, gaps in the cortical shell, and uneven spongy bone textures. This study aims to develop a deep learning framework that harnesses the advantages of 3D deep learning while tackling the hurdles of accurately segmenting ankle and foot bones from clinical CT scans. A few-shot framework, AFSegNet, is proposed with computational cost in mind; it comprises three 3D deep-learning networks that follow the principle of progressing from simple to complex tasks and network structures. Specifically, a shallow network first over-segments the foreground; its output, together with the foreground ground truth, supervises a subsequent network that detects the over-segmented regions, which are overwhelmingly inter-bone gaps. The foreground and inter-bone gap probability maps are then fed into a network with multi-scale attention and feature-fusion modules, trained with a loss function that combines region-, boundary-, and topology-based terms, to obtain the fine-level bone segmentation. AFSegNet is applied to a 16-class segmentation task using 123 in-house CT scans and requires only a GPU with 24 GB of memory, since the three sub-networks can be trained successively and individually. AFSegNet achieves a Dice of 0.953 and an average surface distance of 0.207. An ablation study and comparisons with two state-of-the-art networks indicate the effectiveness of the progressively distilled features, the attention and feature-fusion modules, and the hybrid loss function, with the mean surface distance error reduced by up to 50%.
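To illustrate the three-stage cascade described above, the following is a minimal PyTorch sketch, not the authors' implementation: `UNet3D` is a hypothetical placeholder sub-network, and the channel widths, attention and feature-fusion modules, and loss terms are assumptions omitted here for brevity. It only shows how the foreground and inter-bone gap probability maps could be passed forward as extra input channels to the fine-level network.

```python
# Minimal sketch of the cascaded pipeline described in the abstract.
# `UNet3D` is a hypothetical stand-in; the real sub-networks, attention,
# feature fusion, and hybrid loss are not reproduced here.
import torch
import torch.nn as nn


class UNet3D(nn.Module):
    """Tiny 3D conv stack used only as a placeholder sub-network."""
    def __init__(self, in_ch, out_ch, width=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(width, out_ch, 1),
        )

    def forward(self, x):
        return self.net(x)


class CascadeSketch(nn.Module):
    """Stage 1: coarse (over-segmented) foreground.
    Stage 2: inter-bone gap map from CT + foreground probability.
    Stage 3: fine multi-class bone labels from CT + both probability maps."""
    def __init__(self, num_classes=16):
        super().__init__()
        self.stage1 = UNet3D(1, 1)                # shallow foreground net
        self.stage2 = UNet3D(2, 1)                # gap-detection net
        self.stage3 = UNet3D(3, num_classes + 1)  # fine segmentation net

    def forward(self, ct):
        fg = torch.sigmoid(self.stage1(ct))
        gap = torch.sigmoid(self.stage2(torch.cat([ct, fg], dim=1)))
        logits = self.stage3(torch.cat([ct, fg, gap], dim=1))
        return fg, gap, logits


if __name__ == "__main__":
    model = CascadeSketch()
    ct_patch = torch.randn(1, 1, 32, 32, 32)  # toy CT patch (B, C, D, H, W)
    fg, gap, logits = model(ct_patch)
    print(fg.shape, gap.shape, logits.shape)
```

Because each stage can be trained on its own before the next is attached, a sketch like this also reflects why the reported memory footprint stays within a single 24 GB GPU.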
Keywords: 3D deep learning; Bone segmentation; Feature fusion; Hierarchical feature distillation; Multi-level attention.
Copyright © 2024 Elsevier Ltd. All rights reserved.