Generative artificial intelligence versus clinicians: Who diagnoses multiple sclerosis faster and with greater accuracy?

Mult Scler Relat Disord. 2024 Oct:90:105791. doi: 10.1016/j.msard.2024.105791. Epub 2024 Aug 6.

Abstract

Background: People diagnosed with multiple sclerosis (MS) over the next ten years will predominantly belong to Generation Z (Gen Z). Recent observations within our clinic suggest that younger people with MS use online generative artificial intelligence (AI) platforms for personalized medical advice before their first visit with a specialist in neuroimmunology. Use of such platforms is expected to increase given Gen Z's technology-driven habits, desire for instant communication, and cost-consciousness. Our objective was to determine whether ChatGPT (Generative Pre-trained Transformer) could diagnose MS in individuals earlier than their clinical timeline, and to assess whether its accuracy differed by age, sex, and race/ethnicity.

Methods: People with MS between 18 and 59 years of age were studied. The clinical timeline for each person diagnosed with MS was retrospectively identified and simulated using ChatGPT-3.5 (GPT-3.5). Chats were conducted using both the actual values and derived variations of each person's age, sex, and race/ethnicity to test diagnostic accuracy. A Kaplan-Meier survival curve was estimated for time to diagnosis, clustered by subject. Differences in time to diagnosis were tested using a general Wilcoxon test. Logistic regression with a subject-specific intercept, which captures intra-subject correlation, was used to test diagnostic accuracy before and after the inclusion of MRI data.
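As a rough illustration of the time-to-diagnosis analysis described in the Methods, the sketch below computes a Kaplan-Meier estimate and its median in plain Python. The data are hypothetical, the names `kaplan_meier` and `median_time` are illustrative, and the sketch leaves out the clustering by subject and the Wilcoxon test used in the study; it is not the authors' code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a diagnosis occurred.

    times  -- follow-up duration in years for each chat or subject
    events -- 1 if the diagnosis was observed at that time, 0 if censored
    """
    diagnosed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # everyone leaving the risk set at t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = diagnosed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction still undiagnosed.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_time(curve):
    """Smallest time at which S(t) falls to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical data only (years from first symptom to diagnosis).
example_times = [0.1, 0.2, 0.2, 0.4, 0.5, 0.9]
example_events = [1, 1, 0, 1, 1, 0]
example_curve = kaplan_meier(example_times, example_events)
example_median = median_time(example_curve)
```

Here `example_median` is the first time point at which at most half of the hypothetical cohort remains undiagnosed, which is the kind of quantity reported as the median time to diagnosis.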

Results: The study cohort included 100 unique people with MS. Of those, 50 were members of Gen Z (38 female; 22 White; mean age at first symptom 20.6 years (y), standard deviation (SD)=2.2y), and 50 were non-Gen Z (34 female; 27 White; mean age at first symptom 37.0y, SD=10.4y). In addition, 529 digital simulations derived from the original cohort of 100 people (333 female; 166 White; 136 Black/African American; 107 Asian; 120 Hispanic; mean age at first symptom 31.6y, SD=12.4y) were generated, allowing 629 scripted conversations to be analyzed. The estimated median time to diagnosis was significantly longer in clinic, at 0.35y (95% CI=[0.28, 0.48]), than with ChatGPT, at 0.08y (95% CI=[0.04, 0.24]) (p<0.0001). Before the inclusion of MRI data, diagnostic accuracy did not differ by age or race/ethnicity, but males were 47% less likely than females to receive a correct diagnosis (p=0.05). After MRI data were included within GPT-3.5, the odds of an accurate diagnosis were 4.0-fold greater for Gen Z participants than for non-Gen Z participants (p=0.01), while diagnostic accuracy was 68% lower in males than in females (p=0.009) and 75% lower in White subjects than in non-White subjects (p=0.0004).
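To make the effect sizes above easier to read, the sketch below converts between a percent reduction and an odds ratio, and shows what an odds ratio does to a probability. It assumes the reported percentages are reductions in odds from the logistic regression, which the abstract does not state explicitly; the 50% baseline accuracy and all function names are hypothetical.

```python
def odds_ratio_from_percent_lower(percent_lower):
    """Odds ratio implied by 'X% lower odds' (e.g. 68 -> about 0.32)."""
    return 1.0 - percent_lower / 100.0


def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (e.g. 4.0 -> +300%)."""
    return (odds_ratio - 1.0) * 100.0


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained by scaling the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Assumed reading of the post-MRI sex effect: 68% lower odds for males.
or_male = odds_ratio_from_percent_lower(68)

# Hypothetical baseline: if females were diagnosed correctly 50% of the time,
# the same odds ratio would put males at roughly 24%, not at 50% minus 68%.
male_prob_if_female_half = apply_odds_ratio(0.5, or_male)

# A 4.0-fold increase in odds corresponds to odds that are 300% higher.
gen_z_percent_change = percent_change_in_odds(4.0)
```

The point of the example is that a percent change in odds is not the same as a percent change in the probability of a correct diagnosis, so the size of the difference in accuracy depends on the baseline rate, which the abstract does not report.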

Conclusion: Although generative AI platforms are not principally designed for use in healthcare, they enable rapid access to information, and their use by Gen Z is expected to increase. However, the responses obtained may not be generalizable to all users, and bias may exist for select groups.

Keywords: Artificial intelligence; ChatGPT; Diagnosis; Generative AI; Multiple sclerosis.

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Artificial Intelligence*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Multiple Sclerosis* / diagnosis
  • Retrospective Studies
  • Time Factors
  • Young Adult