A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities

Nat Methods. 2024 Nov 18. doi: 10.1038/s41592-024-02499-w. Online ahead of print.

Abstract

Biomedical image analysis is fundamental for biomedical discovery. Holistic image analysis comprises interdependent subtasks such as segmentation, detection and recognition, which are tackled separately by traditional approaches. Here, we propose BiomedParse, a biomedical foundation model that can jointly conduct segmentation, detection and recognition across nine imaging modalities. This joint learning improves accuracy on the individual tasks and enables new applications such as segmenting all relevant objects in an image through a textual description. To train BiomedParse, we created a large dataset comprising over 6 million triples of image, segmentation mask and textual description by leveraging natural language labels or descriptions accompanying existing datasets. We showed that BiomedParse outperformed existing methods on image segmentation across nine imaging modalities, with larger improvements on objects with irregular shapes. We further showed that BiomedParse can simultaneously segment and label all objects in an image. In summary, BiomedParse is an all-in-one tool for biomedical image analysis across all major imaging modalities, paving the way for efficient and accurate image-based biomedical discovery.
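To make the text-prompted segmentation described above concrete, the sketch below shows how such a model might be queried with an image and a free-text object description to obtain a binary mask. This is a minimal illustration under stated assumptions: the `model` object, its `predict(image, text)` method, and the `load_biomedparse` helper are hypothetical placeholders, not the published BiomedParse API; consult the official code release for the actual interface.

```python
# Hypothetical usage sketch of text-prompted biomedical segmentation.
import numpy as np
from PIL import Image


def segment_with_prompt(model, image: np.ndarray, prompt: str) -> np.ndarray:
    """Return a binary mask for the object(s) named in `prompt`.

    Assumes `model` exposes a `predict(image, text)` method that returns a
    per-pixel probability map in [0, 1]; the map is thresholded at 0.5.
    """
    prob_map = model.predict(image, prompt)       # hypothetical model call
    return (prob_map >= 0.5).astype(np.uint8)     # binarize into a mask


if __name__ == "__main__":
    # Example inputs (file names are placeholders).
    image = np.asarray(Image.open("ct_slice.png").convert("RGB"))
    # model = load_biomedparse("checkpoint_path")  # hypothetical loader
    # mask = segment_with_prompt(model, image, "liver tumor in abdominal CT")
    # Image.fromarray(mask * 255).save("tumor_mask.png")
```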