Objective: To evaluate, for the first time, the accuracy and reproducibility of answers about endometriosis given by the free version of ChatGPT.
Methods: Detailed internet searches were performed to identify frequently asked questions (FAQs) about endometriosis. Scientific questions were prepared in accordance with the European Society of Human Reproduction and Embryology (ESHRE) endometriosis guidelines. An experienced gynecologist scored each ChatGPT answer on a scale of 1-4. Repeatability was assessed by asking each question twice; an answer was considered reproducible if both responses to the same question fell into the same score category.
Results: A total of 91.4% (n = 71) of all FAQs were answered completely, accurately, and sufficiently. ChatGPT had the highest accuracy in the symptoms and diagnosis category (94.1%, 16/17 questions) and the lowest accuracy in the treatment category (81.3%, 13/16 questions). Furthermore, of the 40 questions based on the ESHRE endometriosis guidelines, 27 (67.5%) were classified as grade 1, seven (17.5%) as grade 2, and six (15.0%) as grade 3. The reproducibility rate of FAQs was highest in the prevention, symptoms and diagnosis, and complications categories (100% for each). The reproducibility rate was lowest for questions based on the ESHRE endometriosis guidelines (70.0%).
Conclusion: ChatGPT responded accurately and satisfactorily to more than 90% of the FAQs about endometriosis, but to only 67.5% of the questions based on the ESHRE endometriosis guidelines.
Keywords: ChatGPT; artificial intelligence; endometriosis.
© 2023 The Authors. International Journal of Gynecology & Obstetrics published by John Wiley & Sons Ltd on behalf of International Federation of Gynecology and Obstetrics.