Enhanced Artificial Intelligence Strategies in Renal Oncology: Iterative Optimization and Comparative Analysis of GPT 3.5 Versus 4.0

Ann Surg Oncol. 2024 Jun;31(6):3887-3893. doi: 10.1245/s10434-024-15107-0. Epub 2024 Mar 12.

Abstract

Background: The rise of artificial intelligence (AI) in medicine has revealed the potential of ChatGPT as a pivotal tool in medical diagnosis and treatment. This study assesses the efficacy of ChatGPT versions 3.5 and 4.0 in addressing renal cell carcinoma (RCC) clinical inquiries. Notably, fine-tuning and iterative optimization of the model corrected ChatGPT's limitations in this area.

Methods: In our study, 80 RCC-related clinical questions from urology experts were posed three times to both ChatGPT 3.5 and ChatGPT 4.0, seeking binary (yes/no) responses. We then statistically analyzed the answers. Finally, we fine-tuned the GPT-3.5 Turbo model using these questions, and assessed its training outcomes.

Results: We found that the average accuracy rates of answers provided by ChatGPT versions 3.5 and 4.0 were 67.08% and 77.50%, respectively. ChatGPT 4.0 outperformed ChatGPT 3.5, with a higher accuracy rate in responses (p < 0.05). By counting the number of correct responses to the 80 questions, we then found that although ChatGPT 4.0 performed better (p < 0.05), both versions were subject to instability in answering. Finally, by fine-tuning the GPT-3.5 Turbo model, we found that the correct rate of responses to these questions could be stabilized at 93.75%. Iterative optimization of the model can result in 100% response accuracy.

Conclusion: We compared ChatGPT versions 3.5 and 4.0 in addressing clinical RCC questions, identifying their limitations. By applying the GPT-3.5 Turbo fine-tuned model iterative training method, we enhanced AI strategies in renal oncology. This approach is set to enhance ChatGPT's database and clinical guidance capabilities, optimizing AI in this field.

Keywords: Artificial intelligence; Chat generative pre-trained transformer; Fine-tuned model; GPT-3.5 Turbo; Renal cell carcinoma.

Publication types

  • Comparative Study

MeSH terms

  • Artificial Intelligence*
  • Carcinoma, Renal Cell* / pathology
  • Humans
  • Kidney Neoplasms* / pathology
  • Prognosis