ChatGPT provides acceptable responses to patient questions regarding common shoulder pathology

Umar Ghilzai; Benjamin Fiedler; Abdullah Ghali; Aaron Singh; Benjamin Cass; Allan Young; Adil Shahzad Ahmed

doi:10.1177/17585732241283971

Shoulder Elbow. 2024 Sep 25:17585732241283971. doi: 10.1177/17585732241283971. Online ahead of print.

Authors

Umar Ghilzai¹, Benjamin Fiedler¹, Abdullah Ghali¹, Aaron Singh², Benjamin Cass³, Allan Young³, Adil Shahzad Ahmed¹

Affiliations

¹ Baylor College of Medicine, Department of Orthopedic Surgery, Houston, TX, USA.
² UT Health San Antonio, Department of Orthopaedics, San Antonio, TX, USA.
³ Sydney Shoulder Research Institute, Sydney Shoulder Specialists, Greenwich, New South Wales, Australia.

Abstract

Background: ChatGPT is rapidly becoming a source of medical knowledge for patients. This study aims to assess the completeness and accuracy of ChatGPT's answers to patients' most frequently asked questions about shoulder pathology.

Methods: ChatGPT (version 3.5) was asked to list the five most common shoulder pathologies: biceps tendonitis, rotator cuff tears, shoulder arthritis, shoulder dislocation and adhesive capsulitis. It then generated the five most common patient questions regarding these pathologies and was prompted to answer them. Responses were evaluated by three shoulder and elbow fellowship-trained orthopedic surgeons, with a mean of 9 years of independent practice, on Likert scales for accuracy (1-6) and completeness (1-3).

Results: Responses to all questions were deemed acceptable: each was rated at least "nearly all correct," indicated by an accuracy score of 5 or greater, and "adequately complete," indicated by a completeness score of 2 or greater. The mean scores for accuracy and completeness, respectively, were 5.5 and 2.6 for rotator cuff tears, 5.8 and 2.7 for shoulder arthritis, 5.5 and 2.3 for shoulder dislocations, 5.1 and 2.4 for adhesive capsulitis, and 5.8 and 2.9 for biceps tendonitis.
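
The acceptability thresholds stated above can be illustrated with a minimal Python sketch that tabulates the reported means and checks each against the two cutoffs. This is not the authors' analysis: the abstract applies the criterion to individual questions, whose scores are not given here, so the check below runs on the per-pathology means only. The names `REPORTED_MEANS`, `is_acceptable`, and `summarize` are hypothetical and introduced purely for illustration.

```python
# Mean (accuracy, completeness) scores per pathology, as reported in the abstract.
REPORTED_MEANS = {
    "rotator cuff tears": (5.5, 2.6),
    "shoulder arthritis": (5.8, 2.7),
    "shoulder dislocation": (5.5, 2.3),
    "adhesive capsulitis": (5.1, 2.4),
    "biceps tendonitis": (5.8, 2.9),
}

ACCURACY_MIN = 5      # "nearly all correct" on the 1-6 accuracy scale
COMPLETENESS_MIN = 2  # "adequately complete" on the 1-3 completeness scale


def is_acceptable(accuracy, completeness):
    """Hypothetical helper: apply the abstract's two acceptability cutoffs."""
    return accuracy >= ACCURACY_MIN and completeness >= COMPLETENESS_MIN


def summarize(means):
    """Map each pathology to whether its mean scores meet both cutoffs."""
    return {name: is_acceptable(acc, comp) for name, (acc, comp) in means.items()}


if __name__ == "__main__":
    for name, ok in summarize(REPORTED_MEANS).items():
        acc, comp = REPORTED_MEANS[name]
        print(f"{name}: accuracy {acc}, completeness {comp}, acceptable={ok}")
```

Under these assumptions, every pathology's mean clears both cutoffs, with adhesive capsulitis (accuracy 5.1) and shoulder dislocation (completeness 2.3) sitting closest to the thresholds.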

Conclusion: ChatGPT provides both accurate and complete responses to patients' most common questions about shoulder pathology. These findings suggest that large language models might play a role as a patient resource; however, patients should always verify online information with their physician.

Level of evidence: Level V Expert Opinion.

Keywords: Artificial intelligence; ChatGPT; large language model; machine learning; shoulder.