Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

Ni, Minheng; Zhang, Yabo; Feng, Kailai; Li, Xiaoming; Guo, Yiwen; Zuo, Wangmeng

arXiv:2308.16777 (cs)

[Submitted on 31 Aug 2023 (v1), last revised 1 Sep 2023 (this version, v2)]

Abstract: Zero-shot referring image segmentation is challenging because it aims to find an instance segmentation mask from a given referring description without training on this type of paired data. Current zero-shot methods mainly rely on pre-trained discriminative models (e.g., CLIP). However, we observe that generative models (e.g., Stable Diffusion) may already capture the relationships between visual elements and text descriptions, a capability that has rarely been investigated for this task. In this work, we introduce a novel Referring Diffusional segmentor (Ref-Diff) for this task, which leverages the fine-grained multi-modal information from generative models. We demonstrate that, without a proposal generator, a generative model alone can achieve performance comparable to existing SOTA weakly-supervised models. When we combine generative and discriminative models, Ref-Diff outperforms these competing methods by a significant margin. This indicates that generative models are also beneficial for this task and can complement discriminative models for better referring segmentation. Our code is publicly available at this https URL.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.16777 [cs.CV]
	(or arXiv:2308.16777v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.16777

Submission history

From: Minheng Ni
[v1] Thu, 31 Aug 2023 14:55:30 UTC (3,811 KB)
[v2] Fri, 1 Sep 2023 05:57:47 UTC (3,811 KB)
