TubeR: Tube-Transformer for Action Detection

Zhao, Jiaojiao; Li, Arthur; Liu, Chunhui; Bing, Shuai; Chen, Hao; Snoek, Cees G. M.; Tighe, Joseph

arXiv:2104.00969v1 (cs)

[Submitted on 2 Apr 2021 (this version), latest version 10 May 2022 (v5)]

Abstract: In this paper, we propose TubeR: the first transformer-based network for end-to-end action detection, with an encoder and decoder optimized for modeling action tubes with variable lengths and aspect ratios. TubeR does not rely on hand-designed tube structures; it automatically links predicted action boxes over time and learns a set of tube queries related to actions. By learning action tube embeddings, TubeR predicts more precise action tubes with flexible spatial and temporal extents. Our experiments demonstrate that TubeR achieves state-of-the-art results among single-stream methods on UCF101-24 and J-HMDB. TubeR outperforms existing one-model methods on AVA and is even competitive with two-model methods. Moreover, we observe that TubeR has the potential to track actors with different actions, which will foster future research in long-range video understanding.
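An action tube, as the abstract uses the term, is a sequence of per-frame actor boxes linked over time, so it has both a spatial extent (the boxes) and a temporal extent (the frames it covers). The sketch below makes this concrete under stated assumptions: a tube is represented as a mapping from frame index to an (x1, y1, x2, y2) box, and two tubes are compared with a spatio-temporal IoU (temporal IoU of the covered frames multiplied by the mean box IoU over shared frames), a convention commonly used for video-level evaluation on datasets such as UCF101-24 and J-HMDB. This is illustrative only, not TubeR's code; the names `box_iou`, `tube_iou`, `Box` and `Tube` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: a box is (x1, y1, x2, y2); a tube maps frame index -> box.
Box = Tuple[float, float, float, float]
Tube = Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(t1: Tube, t2: Tube) -> float:
    """Assumed spatio-temporal IoU: temporal IoU of the frame sets times the
    mean per-frame box IoU over the frames both tubes cover."""
    frames1, frames2 = set(t1), set(t2)
    shared = frames1 & frames2
    if not shared:
        return 0.0  # no temporal overlap at all
    temporal_iou = len(shared) / len(frames1 | frames2)
    spatial_iou = sum(box_iou(t1[f], t2[f]) for f in shared) / len(shared)
    return temporal_iou * spatial_iou


# Usage: two 10-frame tubes with identical boxes, offset by 5 frames.
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(0, 10)}
pred = {f: (0.0, 0.0, 10.0, 10.0) for f in range(5, 15)}
print(tube_iou(gt, pred))  # 5 shared / 15 covered frames -> 0.333...
```

Keying boxes by frame index lets tubes of different lengths and start times be compared directly, which mirrors the "variable lengths" the abstract emphasizes; a tube that is spatially perfect but temporally misaligned is still penalized.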

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2104.00969 [cs.CV]
	(or arXiv:2104.00969v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2104.00969

Submission history

From: Jiaojiao Zhao
[v1] Fri, 2 Apr 2021 10:21:22 UTC (45,644 KB)
[v2] Fri, 9 Apr 2021 12:22:14 UTC (45,644 KB)
[v3] Mon, 6 Dec 2021 09:19:47 UTC (49,011 KB)
[v4] Fri, 15 Apr 2022 12:42:21 UTC (49,093 KB)
[v5] Tue, 10 May 2022 07:39:03 UTC (49,093 KB)
