Learning to View: Decision Transformers for Active Object Detection

Ding, Wenhao; Majcherczyk, Nathalie; Deshpande, Mohit; Qi, Xuewei; Zhao, Ding; Madhivanan, Rajasimman; Sen, Arnie

arXiv:2301.09544 (cs)

[Submitted on 23 Jan 2023]

Abstract: Active perception describes a broad class of techniques that couple planning and perception systems so that the robot moves in ways that yield more information about the environment. In most robotic systems, perception is independent of motion planning. For example, traditional object detection is passive: it operates only on the images it receives. However, results can improve if planning is allowed to consume detection signals and move the robot to collect views that maximize detection quality. In this paper, we use reinforcement learning (RL) methods to control the robot so that it obtains images that maximize detection quality. Specifically, we propose using a Decision Transformer with online fine-tuning, which first optimizes the policy with a pre-collected expert dataset and then improves the learned policy by exploring better solutions in the environment. We evaluate the performance of the proposed method on an interactive dataset collected from an indoor scenario simulator. Experimental results demonstrate that our method outperforms all baselines, including the expert policy and pure offline RL methods. We also provide exhaustive analyses of the reward distribution and observation space.
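
The abstract does not spell out how the policy is conditioned or how the reward is defined. As a minimal sketch of the general Decision Transformer idea, and not of the authors' implementation, the model is usually conditioned on a return-to-go: the sum of rewards from the current step to the end of the trajectory. The function name `returns_to_go`, the `gamma` parameter, and the per-step detection-quality rewards below are hypothetical and used only for illustration.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return, for every step t, the (optionally discounted) sum of rewards from t onward.

    `rewards` is assumed to hold one scalar per step of a trajectory, e.g. a
    hypothetical detection-quality score for the view collected at that step.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the trajectory backwards so each step reuses the sum of its successors.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


if __name__ == "__main__":
    # Hypothetical rewards for a three-step trajectory.
    print(returns_to_go([1.0, 0.0, 2.0]))
```

In an offline stage, such return-to-go values would be computed from the pre-collected expert trajectories; during online fine-tuning they would be recomputed for newly explored trajectories. How the paper handles either stage in detail is not stated in the abstract.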

Comments:	Accepted to ICRA 2023
Subjects:	Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2301.09544 [cs.RO]
	(or arXiv:2301.09544v1 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2301.09544

Submission history

From: Wenhao Ding
[v1] Mon, 23 Jan 2023 17:00:48 UTC (6,165 KB)
