Few-Shot Object Detection with Fully Cross-Transformer

Han, Guangxing; Ma, Jiawei; Huang, Shiyuan; Chen, Long; Chang, Shih-Fu

arXiv:2203.15021 (cs)

[Submitted on 28 Mar 2022 (v1), last revised 29 Sep 2022 (this version, v2)]

Abstract: Few-shot object detection (FSOD), which aims to detect novel objects from very few training examples, has recently attracted great research interest. Metric-learning based methods have proven effective for this task: they use a two-branch siamese network and compute the similarity between image regions and the few-shot examples for detection. In previous works, however, the interaction between the two branches is restricted to the detection head, leaving the remaining hundreds of layers to extract features for each branch separately. Inspired by recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD that incorporates cross-transformers into both the feature backbone and the detection head. We propose asymmetric-batched cross-attention to aggregate the key information from the two branches, which have different batch sizes. By introducing multi-level interactions, our model improves few-shot similarity learning between the two branches. Comprehensive experiments on both the PASCAL VOC and MSCOCO FSOD benchmarks demonstrate the effectiveness of our model.
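The abstract does not spell out how cross-attention operates when the two branches have different batch sizes, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes a query branch with batch size 1 and a support branch with batch size Bs, averages the support tokens over the batch before the query attends to them, and repeats the query tokens across the batch before each support example attends to them. The function names, the tensor shapes, and the omission of learned projections and multiple heads are all assumptions made for brevity.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    """Scaled dot-product attention. q: (B, Nq, D); k, v: (B, Nk, D)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def asymmetric_cross_attention(query_tokens, support_tokens):
    """Hypothetical sketch of cross-attention between branches of unequal batch size.

    query_tokens:   (1, Nq, D)   tokens of the single query image
    support_tokens: (Bs, Ns, D)  tokens of the Bs support examples
    No learned Q/K/V projections are applied (an assumption for brevity).
    """
    bs = support_tokens.shape[0]

    # Query branch (batch 1): keys/values are its own tokens plus the
    # support tokens averaged over the support batch.
    support_mean = support_tokens.mean(axis=0, keepdims=True)      # (1, Ns, D)
    kv_query = np.concatenate([query_tokens, support_mean], axis=1)
    query_out = attend(query_tokens, kv_query, kv_query)           # (1, Nq, D)

    # Support branch (batch Bs): keys/values are each example's own tokens
    # plus the query tokens repeated across the support batch.
    query_rep = np.repeat(query_tokens, bs, axis=0)                # (Bs, Nq, D)
    kv_support = np.concatenate([support_tokens, query_rep], axis=1)
    support_out = attend(support_tokens, kv_support, kv_support)   # (Bs, Ns, D)

    return query_out, support_out
```

Under these assumptions each branch keeps its own batch size and token count on output, so a block like this could in principle be stacked at several depths of a backbone, which is the kind of multi-level interaction the abstract describes.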

Comments:	CVPR 2022 (Oral). Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Cite as:	arXiv:2203.15021 [cs.CV]
	(or arXiv:2203.15021v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.15021

Submission history

From: Guangxing Han
[v1] Mon, 28 Mar 2022 18:28:51 UTC (9,780 KB)
[v2] Thu, 29 Sep 2022 04:50:45 UTC (9,778 KB)
