Assessing the quality of AI information from ChatGPT regarding oral surgery, preventive dentistry, and oral cancer: An exploration study

Saudi Dent J. 2024 Nov;36(11):1483-1489. doi: 10.1016/j.sdentj.2024.09.009. Epub 2024 Sep 12.

Abstract

Aim: To evaluate the quality of dental information produced by the ChatGPT artificial intelligence language model in the contexts of oral surgery, preventive dentistry, and oral cancer.

Methodology: This study adopted a quantitative methods approach. Experts prepared 50 questions (covering risk factors, preventive measures, diagnostic methods, and treatment options) to be presented to ChatGPT, and its responses were rated for accuracy, completeness, relevance, clarity or comprehensibility, and possible risks using a standardized scoring rubric. The evaluation process also included feedback on the strengths, weaknesses, and potential areas for improvement in ChatGPT's responses.
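The abstract does not describe any software used for the analysis; as a purely illustrative sketch, the per-domain summary statistics it reports (mean rubric scores and the percentage of responses with flagged risks) could be computed from expert ratings roughly as follows. All field names and example values below are hypothetical and not taken from the study.

```python
# Hypothetical sketch: aggregating expert rubric ratings per dental domain.
# Records and field names are illustrative only.
from statistics import mean

# Each record: one expert rating of one ChatGPT response on a 1-5 rubric scale,
# plus a flag for whether a potential risk (e.g., lack of individualized advice) was noted.
ratings = [
    {"domain": "preventive dentistry", "score": 5, "risk_flagged": False},
    {"domain": "oral surgery",         "score": 4, "risk_flagged": True},
    {"domain": "oral cancer",          "score": 3, "risk_flagged": True},
]

for domain in sorted({r["domain"] for r in ratings}):
    subset = [r for r in ratings if r["domain"] == domain]
    avg_score = mean(r["score"] for r in subset)                 # e.g., a mean such as 4.3/5
    risk_rate = 100 * sum(r["risk_flagged"] for r in subset) / len(subset)  # e.g., 53% of responses
    print(f"{domain}: mean score {avg_score:.1f}/5, risks flagged in {risk_rate:.0f}% of responses")
```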

Results: ChatGPT achieved its highest score in preventive dentistry (4.3/5) and communicated complex information coherently, but it showed lower accuracy for oral surgery and oral cancer (3.9/5 and 3.6/5, respectively), with gaps in post-operative instructions, personalized risk assessments, and specialized diagnostic methods. Potential risks, such as a lack of individualized advice, were identified in 53% of the oral cancer responses and 40% of the oral surgery responses. While showing promise in some domains, ChatGPT had important limitations in specialized areas that require nuanced expertise.

Conclusion: The findings point to the need for professional supervision when using AI-generated information, as well as ongoing evaluation as capabilities evolve, to ensure responsible implementation in the best interest of patient care.

Keywords: ChatGPT; Oral cancer; Oral surgery; Preventive dentistry.