Comparative Analysis of Large Language Models and Spine Surgeons in Surgical Decision-Making and Radiological Assessment for Spine Pathologies

World Neurosurg. 2024 Dec 23:194:123531. doi: 10.1016/j.wneu.2024.11.114. Online ahead of print.

Abstract

Objective: This study aimed to investigate the accuracy of large language models (LLMs), specifically ChatGPT and Claude, in surgical decision-making and radiological assessment for spine pathologies compared to experienced spine surgeons.

Methods: The study employed a comparative analysis between the LLMs and a panel of attending spine surgeons. Five written clinical scenarios encompassing various spine pathologies were presented to the LLMs and surgeons, who provided recommended surgical treatment plans. Additionally, magnetic resonance imaging images depicting spine pathologies were analyzed by the LLMs and surgeons to assess their radiological interpretation abilities. Spino-pelvic parameters were estimated from a scoliosis radiograph by the LLMs.

Results: Qualitative content analysis revealed limitations in the LLMs' consideration of patient-specific factors and the breadth of treatment options. Both ChatGPT and Claude provided detailed descriptions of magnetic resonance imaging findings but differed from the surgeons in terms of specific levels and severity of pathologies. The LLMs acknowledged the limitations of accurately measuring spino-pelvic parameters without specialized tools. The accuracy of surgical decision-making for the LLMs (20%) was lower than that of the attending surgeons (100%). Statistical analysis showed no significant differences in accuracy between the groups.

Conclusions: The study highlights the potential of LLMs in assisting with radiological interpretation and surgical decision-making in spine surgery. However, the current limitations, such as the lack of consideration for patient-specific factors and inaccuracies in treatment recommendations, emphasize the need for further refinement and validation of these artificial intelligence (AI) models. Continued collaboration between AI researchers and clinical experts is crucial to address these challenges and realize the full potential of AI in spine surgery.

Keywords: Artificial intelligence; Large language models; Radiological assessment; Spine pathologies; Surgical decision-making.