Enhancing Orthopedic Knowledge Assessments: The Performance of Specialized Generative Language Model Optimization

Curr Med Sci. 2024 Oct;44(5):1001-1005. doi: 10.1007/s11596-024-2929-4. Epub 2024 Oct 5.

Abstract

Objective: This study aimed to evaluate and compare the effectiveness of knowledge base-optimized and unoptimized large language models (LLMs) in the field of orthopedics to explore optimization strategies for the application of LLMs in specific fields.

Methods: A specialized knowledge base was constructed using clinical guidelines from the American Academy of Orthopaedic Surgeons (AAOS) and authoritative orthopedic publications. A total of 30 orthopedic questions covering anatomical knowledge, disease diagnosis, fracture classification, treatment options, and surgical techniques were input into both the knowledge base-optimized and unoptimized versions of GPT-4, ChatGLM, and Spark LLM, and the generated responses were recorded. Three experienced orthopedic surgeons evaluated the overall quality, accuracy, and comprehensiveness of these responses.

Results: Compared with its unoptimized counterpart, the optimized version of GPT-4 showed improvements of 15.3% in overall quality, 12.5% in accuracy, and 12.8% in comprehensiveness. The corresponding improvements were 24.8%, 16.1%, and 19.6% for ChatGLM and 6.5%, 14.5%, and 24.7% for Spark LLM.
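
The abstract does not state the rating scale or whether the reported improvements are relative or absolute. As a minimal illustration only, the sketch below assumes each response receives a numeric score from each of the 3 raters and that improvement is the relative change in the mean score. The function names and the sample ratings are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_score(ratings):
    """Average per-question scores, where each question has one score per rater."""
    return mean(mean(per_question) for per_question in ratings)


def relative_improvement(baseline, optimized):
    """Percentage change of the optimized mean score relative to the baseline."""
    return (optimized - baseline) / baseline * 100.0


# Hypothetical ratings (NOT study data): 2 questions x 3 raters.
baseline_ratings = [[3, 4, 3], [4, 4, 3]]
optimized_ratings = [[4, 4, 4], [4, 5, 4]]

improvement = relative_improvement(
    mean_score(baseline_ratings),
    mean_score(optimized_ratings),
)
print(f"Relative improvement: {improvement:.1f}%")
```

Under an absolute-difference definition (percentage points on a fixed scale), the same ratings would give a different figure, so the definition used matters when comparing models.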

Conclusion: Knowledge base optimization significantly enhances the quality, accuracy, and comprehensiveness of the responses provided by the 3 models in the orthopedic field. It is therefore an effective method for improving the performance of LLMs in specific fields.

Keywords: artificial intelligence; generative artificial intelligence; large language models; orthopedics.

MeSH terms

  • Humans
  • Knowledge Bases*
  • Language
  • Orthopedic Procedures
  • Orthopedic Surgeons / standards
  • Orthopedics* / standards