Objective: To assess the ability of ChatGPT, an Artificial Intelligence (AI) tool based on Natural Language Processing (NLP), to grade postoperative complications using the Clavien-Dindo classification (CDC).
Background: The CDC standardizes grading of postoperative complications. However, consistent and precise application in dynamic clinical settings is challenging. AI offers a potential solution for efficient automated grading.
Methods: ChatGPT's accuracy was tested in defining the CDC, generating clinical examples, grading complications from existing scenarios, and interpreting complications from fictional clinical summaries.
Results: ChatGPT 4 precisely mirrored the CDC, outperforming version 3.5. In generating clinical examples, ChatGPT 4 showed 99% agreement, with minor errors related to urinary catheterization. For single complications, it achieved 97% accuracy. ChatGPT accurately extracted, graded, and analyzed complications from free-text fictional discharge summaries. It demonstrated near-perfect performance when confronted with real-world discharge summaries: comparison between the human and ChatGPT 4 grading showed a κ value of 0.92 (95% CI 0.82-1) (P<0.001).
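For readers unfamiliar with the κ statistic reported above, the following is a minimal sketch of how agreement between a human grader and a model can be quantified. It assumes unweighted Cohen's kappa; the abstract does not state which variant was used. The function name `cohen_kappa` and the two grade lists are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must grade the same non-empty set of cases.")
    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters gave the same grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each grade, the product of the two raters'
    # marginal frequencies, summed over all grades.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical grade throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical Clavien-Dindo grades for ten cases (illustrative only).
human_grades = ["I", "II", "II", "IIIa", "IIIb", "IVa", "V", "I", "II", "IIIa"]
model_grades = ["I", "II", "II", "IIIa", "IIIa", "IVa", "V", "I", "II", "IIIa"]

kappa = cohen_kappa(human_grades, model_grades)
print(f"kappa = {kappa:.2f}")
```

Because the CDC grades are ordered, a weighted kappa that penalizes distant disagreements more than adjacent ones may be the more natural choice in practice; the unweighted form is shown here only because it is the simplest to follow.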
Conclusions: ChatGPT 4 demonstrates promising proficiency and accuracy in applying the CDC. In the future, AI has the potential to become the mainstay tool to accurately capture, extract, and analyze CDC data from clinical datasets.
Copyright © 2024 Wolters Kluwer Health, Inc. All rights reserved.