In-context learning enables multimodal large language models to classify cancer pathology images

Dyke Ferber; Georg Wölflein; Isabella C Wiest; Marta Ligero; Srividhya Sainath; Narmin Ghaffari Laleh; Omar S M El Nahhas; Gustav Müller-Franzes; Dirk Jäger; Daniel Truhn; Jakob Nikolas Kather

doi:10.1038/s41467-024-51465-9

In-context learning enables multimodal large language models to classify cancer pathology images

Nat Commun. 2024 Nov 21;15(1):10104. doi: 10.1038/s41467-024-51465-9.

Authors

Dyke Ferber^{1

2

3}, Georg Wölflein⁴, Isabella C Wiest^{3

5}, Marta Ligero³, Srividhya Sainath³, Narmin Ghaffari Laleh³, Omar S M El Nahhas³, Gustav Müller-Franzes⁶, Dirk Jäger^{1

2}, Daniel Truhn⁶, Jakob Nikolas Kather^{7

8

9

10}

Affiliations

¹ National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany.
² Department of Medical Oncology, Heidelberg University Hospital, Heidelberg, Germany.
³ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
⁴ School of Computer Science, University of St Andrews, St Andrews, UK.
⁵ Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
⁶ Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany.
⁷ National Center for Tumor Diseases (NCT), Heidelberg University Hospital, Heidelberg, Germany. [email protected].
⁸ Department of Medical Oncology, Heidelberg University Hospital, Heidelberg, Germany. [email protected].
⁹ Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany. [email protected].
¹⁰ Department of Medicine I, University Hospital Dresden, Dresden, Germany. [email protected].

Abstract

Medical image classification requires labeled, task-specific datasets which are used to train deep learning networks de novo, or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: Classification of tissue subtypes in colorectal cancer, colon polyp subtyping and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while only requiring a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain specific data can be applied out-of-the box to solve medical image-processing tasks in histopathology. This democratizes access of generalist AI models to medical experts without technical background especially for areas where annotated data is scarce.

MeSH terms

Algorithms
Breast Neoplasms* / diagnostic imaging
Breast Neoplasms* / pathology
Colonic Polyps / diagnostic imaging
Colonic Polyps / pathology
Colorectal Neoplasms* / diagnostic imaging
Colorectal Neoplasms* / pathology
Deep Learning*
Female
Humans
Image Processing, Computer-Assisted* / methods
Neoplasms / diagnostic imaging
Neoplasms / pathology
Neural Networks, Computer