Accuracy and Consistency of Gemini Responses Regarding the Management of Traumatized Permanent Teeth

Dent Traumatol. 2024 Oct 26. doi: 10.1111/edt.13004. Online ahead of print.

Abstract

Background: The aim of this cross-sectional observational analytical study was to assess the accuracy and consistency of responses provided by Google Gemini (GG), a free-access, high-performance multimodal large language model, to questions related to the European Society of Endodontology position statement on the management of traumatized permanent teeth (MTPT).

Materials and methods: Three academic endodontists developed a set of 99 yes/no questions covering all areas of the MTPT. Nine general dentists and 22 endodontic specialists evaluated these questions for clarity and comprehensibility through an iterative process. Two academic dental trauma experts classified each question into one of three levels according to the knowledge required to answer it. Each of the three academic endodontists submitted the 99 questions to GG, resulting in 297 responses, which were then assessed for accuracy and consistency. Accuracy was evaluated using the Wald binomial method, and the consistency of GG responses was assessed using Fleiss' kappa coefficient with a 95% confidence interval. A chi-squared test at the 5% significance level was used to evaluate the influence of each question's level of required knowledge on accuracy and consistency.
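The statistics named above can be illustrated with a short, standard-library Python sketch. It is not the authors' analysis code: the function names (`wald_interval`, `fleiss_kappa`, `chi_squared_3x2`) are hypothetical, the 3 × 2 table layout (three knowledge levels by correct/incorrect) is an assumption, and any inputs are made up. The Wald interval is p ± z·sqrt(p(1 − p)/n). Fleiss' kappa compares the mean observed agreement among raters (here, the three accounts) with the agreement expected by chance from the overall yes/no proportions.

```python
from math import exp, sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float, float]:
    """Proportion with a Wald (normal-approximation) confidence interval."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip to [0, 1]; the raw Wald interval can overshoot near 0 or 1.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] = number of raters (assumed: accounts) that placed
    subject i (assumed: a question) in category j (e.g. "yes" / "no").
    Every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    # Observed agreement for each subject, averaged over subjects.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the marginal category proportions.
    n_categories = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)


def chi_squared_3x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-squared test of independence for a 3 x 2 table.

    Rows: knowledge levels; columns: e.g. correct / incorrect.
    With (3 - 1) * (2 - 1) = 2 degrees of freedom, the chi-squared
    survival function is exactly exp(-x / 2), so no special functions
    are needed. Assumes no row or column total is zero.
    """
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, exp(-stat / 2)
```

For instance, `fleiss_kappa([[3, 0], [3, 0], [2, 1], [0, 3]])` treats four hypothetical questions answered yes/no by three accounts and returns 0.625. The Wald interval is simple but behaves poorly for proportions near 0 or 1, which is why the sketch clips it to [0, 1].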

Results: The responses generated by Gemini showed an overall moderate accuracy of 80.81%, with no significant differences between the responses obtained by the three academic endodontists. Consistency was high overall (95.96%), with no significant differences between GG responses across the three accounts. The level of knowledge required by a question showed no significant association with either accuracy or consistency.
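The reported percentages can be checked against the study size with a little arithmetic. The sketch below is an illustrative check, not part of the authors' analysis: the helper `matching_counts` is hypothetical, it assumes percentages were rounded to two decimals, and reading consistency as a share of the 99 questions is an assumption.

```python
from fractions import Fraction


def matching_counts(reported_percent: float, n: int) -> list[int]:
    """Whole-number counts k in 0..n whose 100*k/n rounds to reported_percent (2 d.p.)."""
    return [k for k in range(n + 1) if round(100 * k / n, 2) == reported_percent]


# Accuracy reported as 80.81%: try a per-response (n = 297) and a per-question (n = 99) reading.
print(matching_counts(80.81, 297))  # → [240]
print(matching_counts(80.81, 99))   # → [80]
# Consistency reported as 95.96%, read here as a share of the 99 questions.
print(matching_counts(95.96, 99))   # → [95]
# 240/297 reduces to 80/99, so the percentage alone cannot tell the two readings apart.
print(Fraction(240, 297) == Fraction(80, 99))  # → True
```

Because 297 = 3 × 99, a per-response accuracy of 240/297 and a per-question accuracy of 80/99 give the same 80.81%; the abstract does not state which denominator was used.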

Conclusions: These results could significantly influence the potential use of Gemini as a free-access source of information for clinicians seeking guidance on MTPT.

Keywords: Google Gemini; artificial intelligence; dental trauma; endodontic management; large language models; tooth injuries.