Quantifying the Scope of Artificial Intelligence-Assisted Writing in Orthopaedic Medical Literature: An Analysis of Prevalence and Validation of AI-Detection Software

J Am Acad Orthop Surg. 2025 Jan 1;33(1):42-50. doi: 10.5435/JAAOS-D-24-00084. Epub 2024 Nov 19.

Abstract

Introduction: The popularization of generative artificial intelligence (AI), including Chat Generative Pre-trained Transformer (ChatGPT), has raised concerns for the integrity of academic literature. This study asked the following questions: (1) Has the popularization of publicly available generative AI, such as ChatGPT, increased the prevalence of AI-generated orthopaedic literature? (2) Can AI detectors accurately identify ChatGPT-generated text? (3) Are there associations between article characteristics and the likelihood that it was AI generated?

Methods: PubMed was searched across six major orthopaedic journals to identify articles received for publication after January 1, 2023. Two hundred and forty articles were randomly selected and entered into three popular AI detectors. Twenty articles published by each journal before the release of ChatGPT were randomly selected as negative control articles. 36 positive control articles (6 per journal) were created by altering 25%, 50%, and 100% of text from negative control articles using ChatGPT and were then used to validate each detector. The mean percentage of text detected as written by AI per detector was compared between pre-ChatGPT and post-ChatGPT release articles using independent t -test. Multivariate regression analysis was conducted using percentage AI-generated text per journal, article type (ie, cohort, clinical trial, review), and month of submission.

Results: One AI detector consistently and accurately identified AI-generated text in positive control articles, whereas two others showed poor sensitivity and specificity. The most accurate detector showed a modest increase in the percentage AI detected for the articles received post release of ChatGPT (+1.8%, P = 0.01). Regression analysis showed no consistent associations between likelihood of AI-generated text per journal, article type, or month of submission.

Conclusions: As this study found an early, albeit modest, effect of generative AI on the orthopaedic literature, proper oversight will play a critical role in maintaining research integrity and accuracy. AI detectors may play a critical role in regulatory efforts, although they will require further development and standardization to the interpretation of their results.

MeSH terms

  • Artificial Intelligence*
  • Humans
  • Orthopedics*
  • Periodicals as Topic
  • Software*
  • Writing