Background: Free chatbots powered by large language models offer treatment recommendations for lateral ankle sprains (LAS) but lack scientific validation.
Methods: Three chatbots (Claude, Perplexity, and ChatGPT) were evaluated by comparing their responses to a questionnaire and their treatment algorithms against current clinical guidelines. Responses were graded on accuracy, conclusiveness, supplementary information, and incompleteness, and were assessed both individually and collectively, with a 60% pass threshold.
Results: In the collective analysis of the questionnaire, Perplexity scored significantly higher than Claude and ChatGPT (p < 0.001). In the individual analysis, Perplexity provided significantly more supplementary information than the other chatbots (p < 0.001). All chatbots met the pass threshold on the questionnaire. In the algorithm evaluation, ChatGPT scored significantly higher than the other chatbots (p = 0.023), while Perplexity fell below the pass threshold.
Conclusions: The chatbots' recommendations generally aligned with current guidelines but sometimes omitted crucial details. While they offer useful supplementary information, they cannot yet replace professional medical consultation or established guidelines.
Keywords: ChatGPT; Claude; Lateral ankle sprains; Perplexity; artificial intelligence (AI); chatbots; treatment recommendations.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.