AI discernment in foot and ankle surgery research: A survey investigation

Foot Ankle Surg. 2024 Oct 9:S1268-7731(24)00226-1. doi: 10.1016/j.fas.2024.10.001. Online ahead of print.

Abstract

Background: This study evaluated the ability of foot and ankle surgeons to differentiate between AI-generated and human-authored abstracts in foot and ankle surgery research.

Methods: An AI system (ChatGPT 3.0) was trained on 21 published abstracts to create six novel case abstracts. Nine foot and ankle surgeons participated in a blinded survey in which they distinguished AI-generated from human-written abstracts and rated their confidence in each response. The survey was completed at two time points to evaluate intra- and inter-observer reliability.

Results: The overall accuracy for distinguishing AI-generated from human-written abstracts was 50.5% (109/216 responses), no better than random chance. Reviewer experience and AI familiarity did not significantly affect accuracy. Inter-rater reliability was moderate initially but decreased over time, and intra-rater reliability was poor.
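
The comparison with chance can be checked with a short calculation. The sketch below is illustrative only: the abstract does not say which statistical test the authors used, so the exact two-sided binomial test against a 50% guessing rate is an assumption, as is the helper name `two_sided_binomial_p`. It also treats all 216 responses as independent, which repeated ratings by the same nine surgeons would not strictly satisfy.

```python
from math import comb

# Counts reported in the abstract: 109 correct out of 216 responses.
CORRECT = 109
TOTAL = 216


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided binomial p-value against a chance rate of 0.5.

    Hypothetical helper for illustration. Under a 0.5 chance rate, the
    probability of each outcome is proportional to comb(trials, i), so
    integer counts can be compared directly without rounding error.
    """
    observed = comb(trials, successes)
    # Sum the weight of every outcome at least as unlikely as the observed one.
    extreme = sum(
        comb(trials, i)
        for i in range(trials + 1)
        if comb(trials, i) <= observed
    )
    return extreme / 2 ** trials


accuracy = CORRECT / TOTAL
p_value = two_sided_binomial_p(CORRECT, TOTAL)

print(f"accuracy = {accuracy:.3f}")
print(f"two-sided p-value vs. chance = {p_value:.3f}")
```

Under these assumptions, a p-value close to 1 means the observed count is what pure guessing would be expected to produce, which is consistent with the reported conclusion.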

Conclusions: In their current form, AI-generated abstracts are nearly indistinguishable from human-written ones, posing challenges for consistent identification in foot and ankle surgery.

Level of evidence: IV.

Keywords: AI; Artificial intelligence; Ethics; Foot and ankle surgery; Scientific writing.