Comparison of physician and artificial intelligence-based symptom checker diagnostic accuracy

Markus Gräf; Johannes Knitza; Jan Leipe; Martin Krusche; Martin Welcker; Sebastian Kuhn; Johanna Mucke; Axel J Hueber; Johannes Hornig; Philipp Klemm; Stefan Kleinert; Peer Aries; Nicolas Vuillerme; David Simon; Arnd Kleyer; Georg Schett; Johanna Callhoff

doi:10.1007/s00296-022-05202-4

Rheumatol Int. 2022 Dec;42(12):2167-2176. doi: 10.1007/s00296-022-05202-4. Epub 2022 Sep 10.

Authors

Markus Gräf^#¹ ², Johannes Knitza^#³ ⁴ ⁵, Jan Leipe⁶, Martin Krusche⁷, Martin Welcker⁸, Sebastian Kuhn⁹, Johanna Mucke¹⁰, Axel J Hueber¹ ¹¹, Johannes Hornig¹², Philipp Klemm¹³, Stefan Kleinert¹⁴, Peer Aries¹⁵, Nicolas Vuillerme¹⁶ ¹⁷ ¹⁸, David Simon¹ ², Arnd Kleyer¹ ², Georg Schett¹ ², Johanna Callhoff¹⁹ ²⁰

Affiliations

¹ Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.
² Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.
³ Department of Internal Medicine 3, Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.
⁴ Deutsches Zentrum Immuntherapie (DZI), Friedrich-Alexander-University Erlangen-Nürnberg and Universitätsklinikum Erlangen, Erlangen, Germany.
⁵ Université Grenoble Alpes, AGEIS, Grenoble, France.
⁶ Division of Rheumatology, Department of Medicine V, Medical Faculty Mannheim of the University, University Hospital Mannheim, Heidelberg, Germany.
⁷ Division of Rheumatology and Systemic Inflammatory Diseases, University Hospital Hamburg-Eppendorf (UKE), Hamburg, Germany.
⁸ Medizinisches Versorgungszentrum Für Rheumatologie Dr. M. Welcker GmbH, Planegg, Germany.
⁹ Department of Digital Medicine, Medical Faculty OWL, Bielefeld University, Bielefeld, Germany.
¹⁰ Policlinic and Hiller Research Unit for Rheumatology, Medical Faculty, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.
¹¹ Division of Rheumatology, Klinikum Nürnberg, Paracelsus Medical University, Nuremberg, Germany.
¹² Rheumapraxis an Der Hase, Osnabrück, Germany.
¹³ Department of Rheumatology, Immunology, Osteology and Physical Medicine, Justus Liebig University Gießen, Campus Kerckhoff, Bad Nauheim, Germany.
¹⁴ Praxisgemeinschaft Rheumatologie-Nephrologie, Erlangen, Germany.
¹⁵ Immunologikum, Hamburg, Germany.
¹⁶ Université Grenoble Alpes, AGEIS, Grenoble, France.
¹⁷ Institut Universitaire de France, Paris, France.
¹⁸ LabCom Telecom4Health, Orange Labs & Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP-UGA, Grenoble, France.
¹⁹ Epidemiology Unit, German Rheumatism Research Centre, Berlin, Germany.
²⁰ Institute for Social Medicine, Epidemiology and Health Economics, Charité Universitätsmedizin, Berlin, Germany.

^# Contributed equally.

Abstract

Symptom checkers are increasingly used to assess new symptoms and navigate the health care system. The aim of this study was to compare the accuracy of an artificial intelligence (AI)-based symptom checker (Ada) with that of physicians in determining the presence or absence of an inflammatory rheumatic disease (IRD). In this survey study, German-speaking physicians with prior rheumatology working experience were asked to determine IRD presence/absence and suggest diagnoses for 20 different real-world patient vignettes, which included only basic health and symptom-related medical history. The IRD detection rate and suggested diagnoses of participants and Ada were compared to the gold standard, the final rheumatologists' diagnosis reported in the discharge summary. A total of 132 vignettes were completed by 33 physicians (mean rheumatology working experience 8.8 (SD 7.1) years). Based on the top diagnosis, Ada's accuracy in determining IRD status was significantly higher than that of physicians (70 vs 54%, p = 0.002). Ada also listed the correct diagnosis more often than physicians, both as the top diagnosis (54 vs 32%, p < 0.001) and among the top 3 diagnoses (59 vs 42%, p < 0.001). Work experience was not related to suggesting the correct diagnosis or IRD status. When confined to basic health and symptom-related medical history, physicians' diagnostic accuracy was lower than that of an AI-based symptom checker. These results highlight the potential of using symptom checkers early in the patient journey and the importance of access to complete and sufficient patient information to establish a correct diagnosis.
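
The abstract reports accuracy both for the top diagnosis and among the top 3 diagnoses. It does not state how a suggested diagnosis was matched to the gold standard, so the following is only a minimal sketch of a top-k accuracy calculation under the assumption of exact string matching. The function name `top_k_accuracy` and all example data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int
) -> float:
    """Fraction of cases whose gold-standard diagnosis is among the first k suggestions.

    Assumption: a suggestion counts as correct only on an exact string match.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    hits = sum(g in r[:k] for r, g in zip(ranked, gold))
    return hits / len(gold)


# Hypothetical toy data for illustration only (not study data).
suggestions = [
    ["rheumatoid arthritis", "osteoarthritis", "gout"],
    ["fibromyalgia", "psoriatic arthritis", "osteoarthritis"],
    ["osteoarthritis", "gout", "rheumatoid arthritis"],
    ["gout", "fibromyalgia", "osteoarthritis"],
]
gold_standard = [
    "rheumatoid arthritis",
    "psoriatic arthritis",
    "rheumatoid arthritis",
    "axial spondyloarthritis",
]

top1 = top_k_accuracy(suggestions, gold_standard, k=1)
top3 = top_k_accuracy(suggestions, gold_standard, k=3)
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

In this toy example only the first case is correct as the top diagnosis, while three of the four cases contain the gold-standard diagnosis among the first three suggestions, which illustrates why top-3 accuracy is never lower than top-1 accuracy.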

Keywords: Artificial intelligence; Diagnosis; Diagnostic decision support system; Rheumatology; Symptom checker; Telemedicine.

MeSH terms

Artificial Intelligence*
Humans
Rheumatologists
Rheumatology*
Surveys and Questionnaires