Background: Due to the high attenuation of metals, severe artifacts occur in cone beam computed tomography (CBCT). Metal segmentation in CBCT projections usually serves as a prerequisite for metal artifact reduction (MAR) algorithms.
Purpose: Truncation caused by the limited detector size leads to incomplete metal masks when threshold-based segmentation is applied in the CBCT volume. Therefore, this work pursues segmenting metal directly in the CBCT projections.
Methods: Since the generation of high-quality clinical training data is a constant challenge, this study proposes to generate simulated digital radiographs (data I) from real CT data combined with self-designed computer-aided design (CAD) implants. In addition to the simulated projections generated from 3D volumes, 2D x-ray images combined with projections of the implants serve as a complementary data set (data II) to improve network performance. In this work, SwinConvUNet, which uses shifted-window (Swin) vision transformers (ViTs) with patch merging as its encoder, is proposed for metal segmentation.
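To illustrate the first simulation pathway in spirit, the following is a minimal sketch that inserts a binary implant mask into a CT attenuation volume and forms a projection together with its metal label. It assumes a simplified parallel-beam line integral along one volume axis (the actual work uses cone-beam geometry), and the function name, attenuation value, and array shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def simulate_projection_with_implant(ct_volume, implant_mask, metal_mu=1.0):
    """Insert a binary CAD implant mask into a CT attenuation volume and
    compute a simplified parallel-beam projection plus the metal label.

    ct_volume    : 3D array of linear attenuation coefficients
    implant_mask : 3D boolean array marking implant voxels (same shape)
    metal_mu     : assumed attenuation value assigned to metal voxels
    """
    volume = ct_volume.copy()
    volume[implant_mask] = metal_mu            # place the implant in the volume

    # Line integrals along one axis as a stand-in for the cone-beam projector
    projection = volume.sum(axis=0)

    # Ground-truth metal mask: any ray passing through at least one metal voxel
    metal_label = implant_mask.any(axis=0).astype(np.float32)
    return projection, metal_label

# Example usage with random data standing in for a real CT volume and CAD implant
ct = np.random.uniform(0.0, 0.05, size=(64, 128, 128))
implant = np.zeros_like(ct, dtype=bool)
implant[20:30, 50:60, 60:70] = True
proj, label = simulate_projection_with_implant(ct, implant)
```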
Results: The model's performance is evaluated on accurately labeled test data sets obtained from cadaver scans as well as on unlabeled clinical projections. When trained on data I only, the convolutional neural network (CNN) encoder-based networks UNet and TransUNet achieve only limited performance on the cadaver test data, with average dice scores of 0.821 and 0.850, respectively. After training on both data I and data II, the average dice scores of the two models increase to 0.906 and 0.919, respectively. By replacing the CNN encoder with a Swin transformer, the proposed SwinConvUNet reaches an average dice score of 0.933 on the cadaver projections when trained on data I only. Furthermore, SwinConvUNet achieves the highest average dice score of 0.953 on the cadaver projections when trained on the combined data set.
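For reference, the dice scores reported above follow the standard definition 2|A∩B|/(|A|+|B|) for binary masks. The snippet below is a minimal sketch of that metric; the function name and smoothing term are illustrative and not taken from the paper.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1 = metal, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```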
Conclusions: Our experiments quantitatively demonstrate the effectiveness of combining the projections simulated via the two pathways for network training. Moreover, the proposed SwinConvUNet trained on the simulated projections performs state-of-the-art, robust metal segmentation, as demonstrated in experiments on cadaver and clinical data sets. With the accurate segmentations from the proposed model, MAR can be conducted even for highly truncated CBCT scans.
Keywords: data augmentation; metal artifact reduction; metal segmentation; swin vision transformer.
© 2023 The Authors. Medical Physics published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.