Background: Increasing use of artificial intelligence (AI)-driven search and large language models by the lay and medical communities requires us to evaluate the accuracy of AI responses to common hand surgery questions. We hypothesized that a large language model would answer most hand surgery questions correctly.

Methods: Using the topics covered in Green's Operative Hand Surgery, 8th Edition, as a guide, 56 hand surgery questions were compiled and posed to ChatGPT (OpenAI, San Francisco, CA). Two attending hand surgeons then independently reviewed ChatGPT's responses for accuracy, completeness, and usefulness. Cohen's kappa analysis was performed to assess interrater agreement.

Results: Averaged across the two reviewers, 45 of the 56 responses (80%) were deemed correct, 39 (70%) were deemed useful, and 32 (57%) were deemed complete. Kappa analysis demonstrated "fair to moderate" agreement between the two raters. The reviewers disagreed on 11 responses regarding correctness, 16 regarding usefulness, and 19 regarding completeness.

Conclusions: Large language models have the potential to both positively and negatively impact patient perceptions and to guide referral patterns, depending on the accuracy, completeness, and usefulness of their responses. While most responses met these criteria, more precise responses are needed to ensure patient safety and avoid misinformation. Individual hand surgeons and surgical societies must understand these technologies and interface with the companies developing them to provide our patients with the best possible care.
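For readers unfamiliar with the agreement statistic used in the Methods, the sketch below illustrates how Cohen's kappa is computed from two raters' binary (e.g., correct/incorrect) judgments; the rating data shown are hypothetical and are not the study's data.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[label] * c2[label] for label in c1.keys() | c2.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two reviewers rate 10 responses as correct (1) or not (0).
r1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
r2 = [1, 0, 0, 1, 1, 1, 1, 1, 0, 0]
print(round(cohens_kappa(r1, r2), 2))  # 0.35
```

By the commonly used Landis and Koch benchmarks, kappa values of 0.21-0.40 indicate "fair" agreement and 0.41-0.60 "moderate" agreement, the range the study reports.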
Keywords: artificial intelligence; chatbot; hand surgery; large language models; patient safety.