Background: This study aimed to evaluate the effectiveness of ChatGPT (Chat Generative Pretrained Transformer) in answering patients' questions about colorectal cancer (CRC) screening, with the ultimate goal of enhancing patients' awareness and adherence to national screening programs.
Methods: Fifteen questions on CRC screening were posed to ChatGPT-4. The answers were rated by 20 gastroenterology experts and 20 nonexperts in three domains (accuracy, completeness, and comprehensibility), and by 100 patients in three dichotomous domains (completeness, comprehensibility, and trustworthiness).
Results: According to the expert ratings, the mean (SD) accuracy score was 4.8 (1.1) on a scale ranging from 1 to 6. The mean (SD) scores for completeness and comprehensibility were 2.1 (0.7) and 2.8 (0.4), respectively, on scales ranging from 1 to 3. Overall, the mean (SD) accuracy (4.8 [1.1] vs. 5.6 [0.7]; P < 0.001) and completeness scores (2.1 [0.7] vs. 2.7 [0.4]; P < 0.001) were significantly lower for the experts than for the nonexperts, while comprehensibility was comparable between the two groups (2.8 [0.4] vs. 2.8 [0.3]; P = 0.55). Patients rated the answers as complete, comprehensible, and trustworthy in 97% to 100% of cases.
Conclusions: ChatGPT shows good performance in answering patients' questions on CRC screening, with the potential to enhance awareness of CRC and improve screening outcomes. Generative language models could be further improved through dedicated training aligned with scientific evidence and current guidelines.