Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning

Xing, Xin; Xiong, Zhexiao; Stylianou, Abby; Sastry, Srikumar; Gong, Liyu; Jacobs, Nathan

arXiv:2310.15985 (cs)

[Submitted on 24 Oct 2023]

Abstract: This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class image classification, where the task is predicting a single label from many possible labels for an image. Single-Positive Multi-label Learning (SPML) specifically considers learning to predict multiple labels when there is only a single annotation per image in the training data. Multi-label learning is in many ways a more realistic task than single-label learning, as real-world data often involves instances belonging to multiple categories simultaneously; however, most common computer vision datasets predominantly contain single labels due to the inherent complexity and cost of collecting multiple high-quality annotations for each instance. We propose a novel approach called Vision-Language Pseudo-Labeling (VLPL), which uses a vision-language model to suggest strong positive and negative pseudo-labels, and outperforms the current state-of-the-art (SOTA) methods by 5.5% on Pascal VOC, 18.4% on MS-COCO, 15.2% on NUS-WIDE, and 8.4% on CUB-Birds. Our code and data are available at this https URL.
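The abstract does not say how the pseudo-labels are derived, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a vision-language model has already produced one embedding for the image and one text embedding per label, and that cosine similarity against two thresholds decides which labels become pseudo-positives or pseudo-negatives. The function names, the threshold values, and the toy 2-D embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pseudo_labels(image_emb, label_embs, observed, pos_thresh=0.8, neg_thresh=0.2):
    """Return one entry per label: 1 (positive), 0 (negative), None (unknown).

    `observed` is the index of the single annotated positive label (the SPML
    setting). The thresholds are illustrative assumptions, not values from
    the paper.
    """
    labels = []
    for idx, text_emb in enumerate(label_embs):
        if idx == observed:
            labels.append(1)  # the one human annotation is always kept
            continue
        sim = cosine(image_emb, text_emb)
        if sim >= pos_thresh:
            labels.append(1)  # confident pseudo-positive
        elif sim <= neg_thresh:
            labels.append(0)  # confident pseudo-negative
        else:
            labels.append(None)  # too uncertain to supervise
    return labels


# Toy embeddings standing in for the outputs of a vision-language model.
image_emb = [1.0, 0.0]
label_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.6, 0.8]]
result = pseudo_labels(image_emb, label_embs, observed=0)
print(result)
```

In this sketch, labels left as `None` would simply be excluded from the training loss, which is one common way to handle unknown entries in partially labeled data.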

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.15985 [cs.CV]
	(or arXiv:2310.15985v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.15985

Submission history

From: Xin Xing
[v1] Tue, 24 Oct 2023 16:36:51 UTC (782 KB)