Endoscopic endonasal surgery is a medical procedure that uses an endoscopic video camera to view and manipulate a surgical site accessed through the nose. Although these surgeries are video recorded, the recordings are seldom reviewed or even saved in patient files because of the size and length of the video files. Editing a recording down to a manageable size may require viewing 3 h or more of surgical video and manually splicing together the desired segments. We propose a novel multi-stage video summarization procedure that uses deep semantic features, tool detections, and temporal correspondences between video frames to create a representative summary. Summarization by our method reduced overall video length by 98.2% while preserving 84% of key medical scenes. Furthermore, the resulting summaries contained only 1% of scenes with irrelevant detail, such as endoscope lens cleaning, blurry frames, or frames external to the patient. This outperformed leading commercial and open source summarization tools not designed for surgery, which preserved only 57% and 46% of key medical scenes, respectively, in summaries of similar length, and whose summaries included 36% and 59% of scenes containing irrelevant detail, respectively. On average, experts agreed (Likert scale score of 4) that the overall quality of the video was adequate to share with peers in its current state.
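The abstract reports three summary-level measures: reduction in video length, the share of key medical scenes preserved, and the share of summary scenes containing irrelevant detail. The sketch below shows one plausible way to compute them, assuming the full recording has already been segmented into scenes annotated as key or irrelevant. The `Scene` type, its flags, and `summary_metrics` are hypothetical names, and the exact metric definitions used in the study may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Scene:
    """One annotated scene of the full recording (hypothetical annotation scheme)."""
    start_s: float
    end_s: float
    is_key: bool = False         # annotated as a key medical scene
    is_irrelevant: bool = False  # e.g. lens cleaning, blur, frames outside the patient

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def summary_metrics(scenes: List[Scene], selected: Sequence[int],
                    original_length_s: float) -> Dict[str, float]:
    """Score a summary given as indices into `scenes` (assumed metric definitions)."""
    chosen = [scenes[i] for i in sorted(set(selected))]
    kept_s = sum(s.duration_s for s in chosen)
    n_key = sum(s.is_key for s in scenes)
    n_key_kept = sum(s.is_key for s in chosen)
    n_irrelevant = sum(s.is_irrelevant for s in chosen)
    return {
        # Fraction of the original running time removed by summarization.
        "length_reduction": 1.0 - kept_s / original_length_s,
        # Fraction of all annotated key scenes that appear in the summary.
        "key_scene_recall": n_key_kept / n_key if n_key else 0.0,
        # Fraction of summary scenes flagged as irrelevant detail.
        "irrelevant_fraction": n_irrelevant / len(chosen) if chosen else 0.0,
    }


if __name__ == "__main__":
    # Toy 3 h (10,800 s) recording with four annotated scenes (illustrative data only).
    scenes = [
        Scene(0, 60, is_key=True),
        Scene(60, 120, is_key=True),
        Scene(120, 150, is_irrelevant=True),
        Scene(150, 10800),
    ]
    print(summary_metrics(scenes, selected=[0, 2], original_length_s=10800))
```

For scale, a 98.2% reduction applied to a 3 h recording would leave about 3.2 min of video (0.018 × 180 min ≈ 3.24 min).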
Keywords: Deep learning; Endonasal surgery; Hidden Markov model; Object detection; Summarization.
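The keywords list a hidden Markov model, and the abstract describes using temporal correspondences between video frames. As an illustration only, the sketch below smooths noisy per-frame relevance scores with a two-state HMM decoded by the Viterbi algorithm. The state meanings, the `stay_prob` value, the use of classifier probabilities as emission likelihoods, and the function name `viterbi_smooth` are all assumptions, not the authors' model.

```python
import math
from typing import List, Sequence


def viterbi_smooth(p_relevant: Sequence[float], stay_prob: float = 0.95) -> List[int]:
    """Most likely 0/1 state path (0 = irrelevant, 1 = relevant); requires 0 < stay_prob < 1."""
    if not p_relevant:
        return []
    eps = 1e-9  # guards log(0) for probabilities of exactly 0 or 1
    # Symmetric transition matrix in log space: staying is cheap, switching is expensive.
    log_trans = [[math.log(stay_prob), math.log(1.0 - stay_prob)],
                 [math.log(1.0 - stay_prob), math.log(stay_prob)]]

    def log_emit(p: float, state: int) -> float:
        # Assumption: the classifier's probability stands in for the emission likelihood.
        return math.log(max(p if state == 1 else 1.0 - p, eps))

    # Uniform prior over the initial state.
    score = [math.log(0.5) + log_emit(p_relevant[0], s) for s in (0, 1)]
    backptrs: List[List[int]] = []
    for p in p_relevant[1:]:
        ptr, new_score = [], []
        for s in (0, 1):
            # Best predecessor state for landing in state s at this frame.
            prev = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit(p, s))
        backptrs.append(ptr)
        score = new_score

    # Trace back from the best final state to recover the full path.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # A single low-scoring frame (e.g. momentary blur) inside a relevant run.
    print(viterbi_smooth([0.9, 0.9, 0.2, 0.9, 0.9]))
```

A high `stay_prob` penalizes state switches, so an isolated low score inside a relevant run is absorbed, while a sustained run of low scores (such as an extended lens-cleaning interval) still flips the state to irrelevant.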
Copyright © 2023 Elsevier Ltd. All rights reserved.