Three ways to improve feature alignment for open vocabulary detection

Arandjelović, Relja; Andonian, Alex; Mensch, Arthur; Hénaff, Olivier J.; Alayrac, Jean-Baptiste; Zisserman, Andrew

arXiv:2303.13518 (cs)

[Submitted on 23 Mar 2023]

Abstract: The core problem in zero-shot open vocabulary detection is how to align visual and text features so that the detector performs well on unseen classes. Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining, and they struggle to prevent the language model from forgetting unseen classes.
We propose three methods to alleviate these issues. First, a simple scheme augments the text embeddings, which prevents overfitting to the small number of classes seen during training while also saving memory and computation. Second, the feature pyramid network and the detection head are modified to include trainable gated shortcuts, which encourage vision-text feature alignment and guarantee it at the start of detection training. Finally, a self-training approach leverages a larger corpus of image-text pairs, improving detection performance on classes with no human-annotated bounding boxes.
Our three methods are evaluated on the zero-shot version of the LVIS benchmark, each showing clear and significant benefits. Our final network achieves a new state of the art on the mAP-all metric and demonstrates competitive performance on mAP-rare, as well as superior transfer to COCO and Objects365.
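
The abstract states that the trainable gated shortcuts guarantee vision-text alignment at the start of detection training, but it does not give their exact form. The following is a minimal sketch of one way such a guarantee can arise, assuming a scalar gate passed through tanh and initialized to zero; the gate form, the initialization, and the names gated_shortcut and toy_layer are illustrative assumptions, not the authors' implementation.

```python
import math


def gated_shortcut(x, layer, gate):
    """Return x + tanh(gate) * layer(x), elementwise over a list of floats.

    With gate == 0.0 the output equals x exactly, so the input features
    pass through unchanged until training moves the gate away from zero.
    """
    g = math.tanh(gate)
    update = layer(x)
    return [xi + g * ui for xi, ui in zip(x, update)]


def toy_layer(x):
    # Hypothetical stand-in for a newly added, untrained detector layer.
    return [2.0 * xi - 1.0 for xi in x]


pretrained_features = [0.5, -1.0, 3.0]

# Gate at zero (assumed initialization): the block is an identity mapping.
at_init = gated_shortcut(pretrained_features, toy_layer, gate=0.0)

# Nonzero gate (as if learned): the new layer now contributes.
after_training = gated_shortcut(pretrained_features, toy_layer, gate=0.5)
```

Under these assumptions, the features seen by later stages at initialization are exactly the pretrained ones, so whatever alignment they had with the text embeddings is preserved until the gate is trained.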

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2303.13518 [cs.CV]
	(or arXiv:2303.13518v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2303.13518

Submission history

From: Relja Arandjelović
[v1] Thu, 23 Mar 2023 17:59:53 UTC (24,045 KB)
