Assessing large multimodal models for one-shot learning and interpretability in biomedical image classification

Wenpin Hou; Yilong Qu; Zhicheng Ji

doi:10.1101/2023.12.31.573796

bioRxiv [Preprint]. 2024 Oct 8:2023.12.31.573796. doi: 10.1101/2023.12.31.573796.

Authors

Wenpin Hou¹, Yilong Qu², Zhicheng Ji²

Affiliations

¹ Department of Biostatistics, The Mailman School of Public Health, Columbia University, New York City, NY, USA.
² Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA.

Abstract

Image classification plays a pivotal role in analyzing biomedical images, serving as a cornerstone for both biological research and clinical diagnostics. We demonstrate that large multimodal models (LMMs), such as GPT-4, excel at one-shot learning, generalization, interpretability, and text-driven image classification across diverse biomedical tasks. These tasks include the classification of tissues, cell types, cellular states, and disease status. LMMs stand out from traditional single-modal classification approaches, which often require large training datasets and offer limited interpretability.

Publication types

Preprint
