Capability of multimodal large language models to interpret pediatric radiological images

Pediatr Radiol. 2024 Sep;54(10):1729-1737. doi: 10.1007/s00247-024-06025-0. Epub 2024 Aug 12.

Abstract

Background: There is a dearth of artificial intelligence (AI) development and research dedicated to pediatric radiology. The newest iterations of large language models (LLMs), such as ChatGPT, can process image and video input in addition to text and are thus theoretically capable of providing impressions of radiological images.

Objective: To assess the ability of multimodal LLMs to interpret pediatric radiological images.

Materials and methods: Thirty medically significant cases were collected and submitted, each with a short clinical history, to GPT-4 (OpenAI, San Francisco, CA), Gemini 1.5 Pro (Google, Mountain View, CA), and Claude 3 Opus (Anthropic, San Francisco, CA), for a total of 90 image evaluations (30 cases × 3 models). AI responses were recorded and independently assessed for accuracy by a resident and an attending physician. Confidence intervals (95%) were determined using the adjusted Wald method.
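
The adjusted Wald (Agresti-Coull) method adds z² pseudo-observations, half successes and half failures, before applying the usual Wald formula, which improves coverage for proportions near 0 or 1. Below is a minimal sketch in Python; the function name is illustrative rather than from the paper, and z = 1.96 is the standard critical value for a 95% interval. Applied to the overall accuracy in the Results (25/90), it reproduces the reported interval of 19.5-37.8%.

```python
from math import sqrt

def adjusted_wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald (Agresti-Coull) confidence interval for a binomial proportion."""
    n_adj = n + z ** 2                        # add z^2 pseudo-observations
    p_adj = (successes + z ** 2 / 2) / n_adj  # point estimate shifted toward 0.5
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half_width, p_adj + half_width

# Overall accuracy from the Results: 25 of 90 evaluations correct
lo, hi = adjusted_wald_ci(25, 90)
print(f"{lo:.1%} - {hi:.1%}")  # 19.5% - 37.8%, matching the reported interval
```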

Results: Overall, the models were correct for 27.8% (25/90) of evaluations (95% CI 19.5-37.8%), partially correct for 13.3% (12/90) (95% CI 2.7-26.4%), and incorrect for 58.9% (53/90) (95% CI 48.6-68.5%).

Conclusion: Multimodal LLMs are not yet capable of interpreting pediatric radiological images.

Keywords: Artificial intelligence; Child; Imaging; Informatics; Language; Machine learning; Pediatrics; Radiology.

MeSH terms

  • Artificial Intelligence*
  • Child
  • Female
  • Humans
  • Male
  • Pediatrics / methods
  • Radiographic Image Interpretation, Computer-Assisted / methods
  • Radiology