SFMViT: SlowFast Meet ViT in Chaotic World

Lin, Jiaying; Wen, Jiajun; Liu, Mengyuan; Liu, Jinfu; Yin, Baiqiao; Li, Yue

arXiv:2404.16609 (cs)

[Submitted on 25 Apr 2024 (v1), last revised 13 Aug 2024 (this version, v2)]

Abstract: Spatiotemporal action localization in chaotic scenes is a challenging task on the path toward advanced video understanding. High-quality video feature extraction and more precise detector-predicted anchors can both effectively improve model performance. To this end, we propose SFMViT, a high-performance dual-stream spatiotemporal feature extraction network with an anchor pruning strategy. The backbone of SFMViT combines ViT and SlowFast with prior knowledge of spatiotemporal action localization, exploiting ViT's global feature extraction capabilities and SlowFast's spatiotemporal sequence modeling capabilities. In addition, we introduce a confidence max-heap to prune the anchors detected in each frame, keeping only the effective anchors. These designs enable SFMViT to achieve a mAP of 26.62% on the Chaotic World dataset, far exceeding existing models. Code is available at this https URL.
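
The anchor pruning idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the details of the authors' procedure, so everything here is an assumption made for illustration: the `Anchor` type, the `(x1, y1, x2, y2)` box layout, the `prune_anchors` function, and the `keep` and `min_conf` parameters are all hypothetical. The sketch only shows the general technique of selecting the highest-confidence anchors in one frame with a max-heap, built on Python's standard `heapq` module.

```python
import heapq
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    # Hypothetical anchor record; the box layout (x1, y1, x2, y2) is assumed.
    box: Tuple[float, float, float, float]
    confidence: float


def prune_anchors(anchors: List[Anchor], keep: int, min_conf: float = 0.0) -> List[Anchor]:
    """Return up to `keep` anchors with the highest confidence, best first."""
    # heapq implements a min-heap, so the confidence is negated to emulate
    # a max-heap. The index breaks ties so anchors are never compared directly.
    heap = [
        (-anchor.confidence, index, anchor)
        for index, anchor in enumerate(anchors)
        if anchor.confidence >= min_conf
    ]
    heapq.heapify(heap)

    kept: List[Anchor] = []
    while heap and len(kept) < keep:
        _, _, anchor = heapq.heappop(heap)  # pops the most confident anchor
        kept.append(anchor)
    return kept


# Anchors detected in a single frame (made-up values for illustration).
frame_anchors = [
    Anchor((10, 10, 50, 80), 0.91),
    Anchor((12, 14, 48, 77), 0.35),
    Anchor((200, 40, 260, 150), 0.78),
    Anchor((5, 5, 20, 20), 0.08),
]

pruned = prune_anchors(frame_anchors, keep=2, min_conf=0.1)
print([a.confidence for a in pruned])  # prints [0.91, 0.78]
```

In this sketch the heap is rebuilt for every frame, which matches the per-frame pruning described in the abstract; how many anchors to keep and whether a confidence threshold is applied are choices made here, not values taken from the paper.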

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.16609 [cs.CV]
	(or arXiv:2404.16609v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.16609

Submission history

From: Jiaying Lin
[v1] Thu, 25 Apr 2024 13:49:42 UTC (42,745 KB)
[v2] Tue, 13 Aug 2024 03:13:50 UTC (21,133 KB)
