Reasoning-Enhanced Object-Centric Learning for Videos

Li, Jian; Ren, Pu; Liu, Yang; Sun, Hao

Computer Science > Computer Vision and Pattern Recognition

arXiv:2403.15245 (cs)

[Submitted on 22 Mar 2024]

Abstract: Object-centric learning aims to break down complex visual scenes into more manageable object representations, enhancing the understanding and reasoning abilities of machine learning systems about the physical world. Recently, slot-based video models have demonstrated remarkable proficiency in segmenting and tracking objects, but they overlook the importance of an effective reasoning module. In the real world, reasoning and predictive abilities play a crucial role in human perception and object tracking; in particular, these abilities are closely related to human intuitive physics. Inspired by this, we designed a novel reasoning module called the Slot-based Time-Space Transformer with Memory buffer (STATM) to enhance the model's perception ability in complex scenes. The memory buffer primarily serves as storage for slot information from upstream modules, while the Slot-based Time-Space Transformer makes predictions through slot-based spatiotemporal attention computations and fusion. Our experimental results on various datasets show that STATM can significantly enhance the object-centric learning capabilities of slot-based video models.
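The abstract describes STATM only at a high level: a memory buffer that stores slots coming from upstream modules, and a transformer that predicts through slot-based spatiotemporal attention and fusion. The sketch below is a minimal, hypothetical illustration of that data flow under our own assumptions, not the authors' implementation. The names `SlotMemoryBuffer`, `predict_next_slots`, `capacity`, and `w_fuse`, the FIFO buffer policy, the per-slot temporal attention, the concatenate-and-project fusion, and the residual connection are all assumptions; learned query/key/value projections are omitted for brevity.

```python
# Hypothetical sketch of a slot memory buffer plus time-space attention.
# Not the STATM implementation from the paper; all names are illustrative.
import numpy as np
from collections import deque


def attention(q, k, v):
    """Single-head scaled dot-product attention.

    q: (..., Nq, D); k, v: (..., Nk, D) -> (..., Nq, D)
    """
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


class SlotMemoryBuffer:
    """Hypothetical FIFO buffer keeping the last `capacity` slot sets, each (K, D)."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # oldest frames are dropped automatically

    def push(self, slots):
        self.frames.append(np.asarray(slots, dtype=float))

    def stacked(self):
        # (T, K, D) with T <= capacity, oldest frame first
        return np.stack(list(self.frames), axis=0)


def predict_next_slots(history, w_fuse):
    """Hypothetical time-space step: history (T, K, D) -> predicted slots (K, D)."""
    latest = history[-1]                   # (K, D) slots at the current step
    # Temporal attention: each slot attends over its own trajectory in the buffer.
    per_slot = np.swapaxes(history, 0, 1)  # (K, T, D)
    temporal = attention(latest[:, None, :], per_slot, per_slot)[:, 0, :]  # (K, D)
    # Spatial attention: slots at the current step attend to one another.
    spatial = attention(latest, latest, latest)  # (K, D)
    # Fusion: concatenate both views, project back to D, add a residual connection.
    fused = np.concatenate([temporal, spatial], axis=-1) @ w_fuse  # (K, D)
    return latest + fused


# Usage: 4 slots of dimension 8, buffer holding the 3 most recent frames.
rng = np.random.default_rng(0)
K, D, T = 4, 8, 3
buffer = SlotMemoryBuffer(capacity=T)
for _ in range(5):  # push more frames than fit; only the last T are kept
    buffer.push(rng.normal(size=(K, D)))
history = buffer.stacked()
w_fuse = rng.normal(size=(2 * D, D)) * 0.1  # random stand-in for learned weights
pred = predict_next_slots(history, w_fuse)
print(history.shape, pred.shape)  # (3, 4, 8) (4, 8)
```

Splitting attention into a per-slot temporal pass and a per-frame spatial pass keeps each attention over T or K items rather than over all T×K slot tokens at once; whether STATM factorizes attention this way is not stated in the abstract.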

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2403.15245 [cs.CV]
	(or arXiv:2403.15245v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.15245

Submission history

From: Jian Li
[v1] Fri, 22 Mar 2024 14:41:55 UTC (6,977 KB)
