Accuracy assessment of ChatGPT responses to frequently asked questions regarding anterior cruciate ligament surgery

Juan Bernardo Villarreal-Espinosa; Rodrigo Saad Berreta; Felicitas Allende; José Rafael Garcia; Salvador Ayala; Filippo Familiari; Jorge Chahla

doi:10.1016/j.knee.2024.08.014

Knee. 2024 Dec:51:84-92. doi: 10.1016/j.knee.2024.08.014. Epub 2024 Sep 5.

Authors

Juan Bernardo Villarreal-Espinosa¹, Rodrigo Saad Berreta¹, Felicitas Allende¹, José Rafael Garcia¹, Salvador Ayala¹, Filippo Familiari², Jorge Chahla³

Affiliations

¹ Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA.
² Magna Graecia University of Catanzaro, Catanzaro, Italy.
³ Department of Orthopedics, Rush University Medical Center, Chicago, IL, USA.

PMID: 39241674
DOI: 10.1016/j.knee.2024.08.014

Abstract

Background: The emergence of artificial intelligence (AI) has given users access to large sources of information in a chat-like manner. Therefore, we sought to evaluate the accuracy of ChatGPT-4's responses to the 10 questions most frequently asked by patients (FAQs) regarding anterior cruciate ligament (ACL) surgery.

Methods: A list of the top 10 FAQs pertaining to ACL surgery was created after conducting a search through all sports medicine fellowship institutions listed on the Arthroscopy Association of North America (AANA) and American Orthopaedic Society for Sports Medicine (AOSSM) websites. Two sports medicine fellowship-trained surgeons graded response accuracy on a Likert scale. Cohen's kappa was used to assess inter-rater agreement. Reproducibility of the responses over time was also assessed.

Results: Five of the 10 responses were graded 'completely accurate' by both fellowship-trained surgeons, and three additional responses were graded 'completely accurate' by at least one of them. Inter-rater agreement on accuracy between the two fellowship-trained attending physicians was moderate (weighted kappa = 0.57, 95% confidence interval 0.15-0.99). Additionally, 80% of the responses were reproducible over time.
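For readers unfamiliar with the statistic, the following is a minimal sketch of how a weighted Cohen's kappa between two raters can be computed. It is not the authors' analysis code. The abstract does not state the weighting scheme or the number of Likert levels, so the linear weights, the 4-point scale, the function name `weighted_kappa`, and the ratings in the usage lines are all assumptions made purely for illustration.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale.

    `categories` lists the scale levels in order. The weighting scheme
    ("linear" or "quadratic") is an assumption; the abstract does not
    say which one was used.
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2:
        raise ValueError("need at least two categories")
    if n == 0 or n != len(rater_b):
        raise ValueError("raters must grade the same non-empty set of items")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    # Map each scale level to its ordinal position.
    index = {c: i for i, c in enumerate(categories)}
    a = [index[x] for x in rater_a]
    b = [index[x] for x in rater_b]
    power = 1 if weighting == "linear" else 2

    def disagreement(i, j):
        # 0 for identical grades, 1 for grades at opposite ends of the scale.
        return (abs(i - j) / (k - 1)) ** power

    # Mean disagreement actually observed between the two raters.
    observed = sum(disagreement(i, j) for i, j in zip(a, b)) / n

    # Mean disagreement expected by chance, from each rater's marginals.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(
        disagreement(i, j) * count_a[i] * count_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical grade")

    return 1 - observed / expected


# Hypothetical grades on an assumed 4-point scale (NOT the study data).
scale = [1, 2, 3, 4]
surgeon_1 = [1, 1, 2, 1, 3, 1, 2, 1, 1, 2]
surgeon_2 = [1, 2, 2, 1, 2, 1, 1, 1, 1, 3]
print(weighted_kappa(surgeon_1, surgeon_2, scale))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. With only 10 rated items, the estimate is imprecise, which is consistent with the wide confidence interval reported above.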

Conclusion: ChatGPT can be considered an accurate additional tool for answering general patient questions regarding ACL surgery. Nonetheless, patient-surgeon interaction should not be deferred and must remain the driving force for information retrieval. Thus, the general recommendation is to address any questions in the presence of a qualified specialist.

Keywords: ACL surgery; AI; ChatGPT; Frequently asked questions (FAQs); Patient education.

MeSH terms

Anterior Cruciate Ligament Injuries / surgery
Anterior Cruciate Ligament Reconstruction*
Fellowships and Scholarships
Humans
Reproducibility of Results
Surveys and Questionnaires