Background: The rise of artificial intelligence (AI) in medicine has revealed the potential of ChatGPT as a pivotal tool in medical diagnosis and treatment. This study assesses the efficacy of ChatGPT versions 3.5 and 4.0 in addressing renal cell carcinoma (RCC) clinical inquiries. Notably, fine-tuning and iterative optimization of the model corrected ChatGPT's limitations in this area.
Methods: In our study, 80 RCC-related clinical questions from urology experts were posed three times to both ChatGPT 3.5 and ChatGPT 4.0, seeking binary (yes/no) responses. We then statistically analyzed the answers. Finally, we fine-tuned the GPT-3.5 Turbo model using these questions, and assessed its training outcomes.
Results: We found that the average accuracy rates of answers provided by ChatGPT versions 3.5 and 4.0 were 67.08% and 77.50%, respectively. ChatGPT 4.0 outperformed ChatGPT 3.5, with a higher accuracy rate in responses (p < 0.05). By counting the number of correct responses to the 80 questions, we then found that although ChatGPT 4.0 performed better (p < 0.05), both versions were subject to instability in answering. Finally, by fine-tuning the GPT-3.5 Turbo model, we found that the correct rate of responses to these questions could be stabilized at 93.75%. Iterative optimization of the model can result in 100% response accuracy.
Conclusion: We compared ChatGPT versions 3.5 and 4.0 in addressing clinical RCC questions, identifying their limitations. By applying the GPT-3.5 Turbo fine-tuned model iterative training method, we enhanced AI strategies in renal oncology. This approach is set to enhance ChatGPT's database and clinical guidance capabilities, optimizing AI in this field.
Keywords: Artificial intelligence; Chat generative pre-trained transformer; Fine-tuned model; GPT-3.5 Turbo; Renal cell carcinoma.
© 2024. Society of Surgical Oncology.