Background and objective: Recent advances in neural networks and temporal image processing have provided new results and opportunities for vision-based bronchoscopy tracking. However, progress has been hindered by the lack of common experimental data and conditions for comparing methods. We address this issue by sharing a novel synthetic dataset that allows for a fair comparison of methods. Moreover, since deep learning advances in temporal modeling have not yet been explored for bronchoscopy navigation, we investigate several neural network architectures for learning temporal information at different levels of subject personalization, providing new insights and results.
Methods: Using our publicly shared synthetic dataset for bronchoscopy navigation and tracking, we explore deep learning architectures for temporal information (recurrent neural networks and 3D convolutions), which have not been fully explored for bronchoscopy tracking. We place special focus on network efficiency by using a modern backbone (EfficientNet-B0) and ShuffleNet blocks. Finally, we study different losses for rotation tracking and different population modeling schemes (personalized vs. population) for bronchoscopy tracking.
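The abstract does not state which rotation losses were compared. As a hypothetical illustration of why the choice of metric matters for rotation tracking, the sketch below assumes orientations are represented as unit quaternions (w, x, y, z) and contrasts a naive Euclidean loss on quaternion components with the geodesic (rotation-angle) distance. The function names are illustrative and not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: a quaternion as (w, x, y, z).
Quat = Sequence[float]


def normalize(q: Quat) -> Tuple[float, ...]:
    """Scale a quaternion to unit norm so it represents a pure rotation."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0.0:
        raise ValueError("zero-norm quaternion")
    return tuple(c / n for c in q)


def l2_quat_loss(q_pred: Quat, q_true: Quat) -> float:
    """Naive Euclidean distance between normalized quaternion components.
    It ignores that q and -q encode the same rotation."""
    a, b = normalize(q_pred), normalize(q_true)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def geodesic_quat_loss(q_pred: Quat, q_true: Quat) -> float:
    """Angle in radians of the relative rotation between two orientations.
    The absolute value of the dot product handles the double cover (q ~ -q)."""
    a, b = normalize(q_pred), normalize(q_true)
    dot = abs(sum(x * y for x, y in zip(a, b)))
    dot = min(1.0, dot)  # clamp rounding error so acos stays defined
    return 2.0 * math.acos(dot)


if __name__ == "__main__":
    # 45-degree rotation about the z axis, and its sign-flipped twin.
    q = (math.cos(math.pi / 8), 0.0, 0.0, math.sin(math.pi / 8))
    q_neg = tuple(-c for c in q)
    print(l2_quat_loss(q, q_neg))        # about 2.0, despite identical rotations
    print(geodesic_quat_loss(q, q_neg))  # about 0.0
```

The naive loss penalizes a prediction that encodes exactly the ground-truth rotation, whereas the geodesic loss measures the actual angular error; this is one concrete sense in which an "adequate metric" can change training behavior.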
Results: Temporal information architectures improve performance in both position and angle estimation. Additionally, the population scheme analysis illustrates the benefits of a personalized model, while the loss analysis shows that using an adequate metric improves results. Finally, compared with a state-of-the-art model, our approach obtains better results in both performance (12.2% and 18.7% improvement for position and rotation, respectively) and memory consumption (around 67.6% reduction).
Conclusions: The proposed advances in temporal information architectures, loss configuration, and population scheme definition improve on the current state of the art in bronchoscopy analysis. Moreover, the publication of the first synthetic dataset supports further bronchoscopy research by providing common ground for comparing methods.
Keywords: Architecture optimization; Datasets; Deep learning; Pose estimation; Standardized evaluation framework; Video bronchoscopy guiding.
Copyright © 2022 Elsevier B.V. All rights reserved.