Background: This study evaluated the ability of foot and ankle surgeons to differentiate between AI-generated and human-authored abstracts in the foot and ankle surgery literature.
Methods: An AI system (ChatGPT 3.0) was trained on 21 published abstracts to generate six novel case abstracts. Nine foot and ankle surgeons completed a blinded survey in which they were asked to distinguish AI-generated from human-written abstracts and to rate their confidence in each response. The survey was administered twice, at two separate time points, to evaluate intra- and inter-rater reliability.
Results: The overall accuracy in distinguishing AI-generated from human-written abstracts was 50.5% (109/216 responses), no better than chance. Neither reviewer experience nor familiarity with AI significantly affected accuracy. Inter-rater reliability was initially moderate but decreased over time, and intra-rater reliability was poor.
Conclusions: In their current form, AI-generated abstracts are nearly indistinguishable from human-written ones, posing a challenge to their reliable identification in the foot and ankle surgery literature.
Level of evidence: IV.
Keywords: AI; Artificial intelligence; Ethics; Foot and ankle surgery; Scientific writing.