LRTD: long-range temporal dependency based active learning for surgical workflow recognition

Int J Comput Assist Radiol Surg. 2020 Sep;15(9):1573-1584. doi: 10.1007/s11548-020-02198-9. Epub 2020 Jun 25.

Abstract

Purpose: Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing deep learning approaches have achieved remarkable performance on the analysis of surgical videos, but they rely heavily on large-scale labelled datasets. Unfortunately, annotations are rarely available in abundance, because producing them requires the domain knowledge of surgeons. Even for experts, annotating a sufficient amount of data is tedious and time-consuming.

Methods: In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network, which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within a clip. By ranking these scores among the clips in the unlabelled data pool, we select the clips with weak dependencies for annotation, since these are the most informative ones for network training.
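The selection step described above can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the abstract does not give the score formula, so the pairwise dependency here is assumed to be a row-wise softmax over dot products of per-frame features (in the spirit of a non-local block), and the intra-clip score is assumed to be the mean off-diagonal weight. The function names `dependency_matrix`, `intra_clip_dependency_score`, and `select_clips` are hypothetical.

```python
import numpy as np


def dependency_matrix(features):
    """Pairwise frame dependency for one clip (assumed form).

    features: array of shape (T, D), one feature vector per frame.
    Returns a (T, T) matrix whose rows are softmax-normalised
    scaled dot products between frames.
    """
    logits = features @ features.T / np.sqrt(features.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def intra_clip_dependency_score(features):
    """Assumed score: mean dependency weight between distinct frames."""
    dep = dependency_matrix(features)
    t = dep.shape[0]
    off_diagonal_sum = dep.sum() - np.trace(dep)
    return off_diagonal_sum / (t * (t - 1))


def select_clips(pool, budget):
    """Rank unlabelled clips and pick those with the weakest dependency.

    pool: list of (T, D) feature arrays, one per unlabelled clip.
    budget: number of clips to send for annotation.
    Returns (selected indices, all scores).
    """
    scores = np.array([intra_clip_dependency_score(clip) for clip in pool])
    selected = np.argsort(scores)[:budget].tolist()  # lowest scores first
    return selected, scores
```

In this sketch a clip whose frames are nearly identical receives a high score and is left unlabelled, while a clip whose frames attend mostly to themselves receives a low score and is selected first.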

Results: We validate our approach on a large surgical video dataset (Cholec80) on the surgical workflow recognition task. With our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods that only consider neighbor-frame information. Using at most 50% of the samples, our approach can exceed the performance of full-data training.

Conclusion: By modeling the intra-clip dependency, our LRTD-based strategy shows a stronger capability to select informative video clips for annotation than other active learning methods, as evaluated on a popular public surgical dataset. The results also show the promising potential of our framework for reducing annotation workload in clinical practice.

Keywords: Active learning; Intra-clip dependency; Long-range temporal dependency; Surgical workflow recognition.

MeSH terms

  • Algorithms
  • Computer Simulation
  • Humans
  • Learning
  • Models, Statistical
  • Neural Networks, Computer
  • Pattern Recognition, Automated*
  • Problem-Based Learning*
  • Reproducibility of Results
  • Robotic Surgical Procedures*
  • Surgeons
  • Surgery, Computer-Assisted / instrumentation
  • Surgery, Computer-Assisted / methods*
  • Video Recording
  • Workflow*