Code-free machine learning for object detection in surgical video: a benchmarking, feasibility, and cost study

Vyom Unadkat; Dhiraj J Pangal; Guillaume Kugener; Arman Roshannai; Justin Chan; Yichao Zhu; Nicholas Markarian; Gabriel Zada; Daniel A Donoho

doi:10.3171/2022.1.FOCUS21652

Neurosurg Focus. 2022 Apr;52(4):E11. doi: 10.3171/2022.1.FOCUS21652.

Authors

Vyom Unadkat¹ ², Dhiraj J Pangal², Guillaume Kugener², Arman Roshannai², Justin Chan², Yichao Zhu², Nicholas Markarian², Gabriel Zada², Daniel A Donoho³

Affiliations

¹ Department of Computer Science, USC Viterbi School of Engineering, Los Angeles, California.
² Department of Neurosurgery, Keck School of Medicine of USC, Los Angeles, California.
³ Division of Neurosurgery, Center for Neurosciences, Children's National Hospital, Washington, DC.

PMID: 35364576
DOI: 10.3171/2022.1.FOCUS21652

Abstract

Objective: While the use of machine learning (ML) for data analysis typically requires significant technical expertise, novel platforms (termed AutoML) can deploy ML methods without requiring the user to have any coding experience. The potential for these methods to be applied to neurosurgical video and surgical data science is unknown.

Methods: AutoML, a code-free ML (CFML) system, was used to identify surgical instruments contained within each frame of endoscopic, endonasal intraoperative video obtained from a previously validated internal carotid injury training exercise performed on a high-fidelity cadaver model. Instrument-detection performance using CFML was compared with that of two state-of-the-art ML models built in Python on the same intraoperative video data set.

Results: The CFML system successfully ingested surgical video without the use of any code. A total of 31,443 images were used to develop this model; 27,223 images were uploaded for training, 2292 images for validation, and 1928 images for testing. The mean average precision on the test set across all instruments was 0.708. The CFML model outperformed two standard object detection networks, RetinaNet and YOLOv3, which had mean average precisions of 0.669 and 0.527, respectively, in analyzing the same data set. Significant advantages of the CFML system included ease of use, relatively low cost, displays of true/false positives and negatives in a user-friendly interface, and the ability to deploy models for further analysis with ease. Significant drawbacks of the CFML model included an inability to view the structure of the trained model, an inability to update the trained ML model with new examples, and an inability to perform robust downstream analysis of model performance and error modes.
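The counts and scores reported in the Results can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study: the helper names `split_fractions` and `map_gap` are hypothetical, and the only inputs are the figures stated in the abstract. It confirms that the three splits sum to the reported total and expresses the mean average precision (mAP) differences as absolute gaps.

```python
# Image counts reported in the Results paragraph.
splits = {"training": 27_223, "validation": 2_292, "testing": 1_928}

# Mean average precision (mAP) values reported for the same data set.
map_scores = {"CFML": 0.708, "RetinaNet": 0.669, "YOLOv3": 0.527}


def split_fractions(counts):
    """Return each split's share of the total image count (hypothetical helper)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def map_gap(scores, reference):
    """Return the absolute mAP difference between `reference` and every other model."""
    ref = scores[reference]
    return {
        name: round(ref - score, 3)
        for name, score in scores.items()
        if name != reference
    }


total_images = sum(splits.values())
fractions = split_fractions(splits)
gaps = map_gap(map_scores, "CFML")

print(f"total images: {total_images}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of images")
for name, gap in gaps.items():
    print(f"CFML mAP exceeds {name} by {gap:.3f}")
```

On these figures, roughly 87% of images went to training, with the remainder divided between validation and testing. The abstract does not state how the split was drawn (for example, by frame or by trial), so the sketch makes no assumption about that.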

Conclusions: This first report describes the baseline performance of CFML in an object detection task using a publicly available surgical video data set as a test bed. Compared with standard, code-based object detection networks, CFML achieved a higher mean average precision. This finding is encouraging for surgeon-scientists seeking to perform object detection tasks to answer clinical questions, perform quality improvement, and develop novel research ideas. The limited interpretability and customization of CFML models remain ongoing challenges. With the further development of code-free platforms, CFML will become increasingly important across biomedical research. Using CFML, surgeons without significant coding experience can perform exploratory ML analyses rapidly and efficiently.

Keywords: AutoML; artificial intelligence; big data; education; surgical video.

MeSH terms

Algorithms
Benchmarking*
Feasibility Studies
Humans
Machine Learning
Surgeons*