Grounding Classical Task Planners via Vision-Language Models

Zhang, Xiaohan; Ding, Yan; Amiri, Saeid; Yang, Hao; Kaminski, Andy; Esselink, Chad; Zhang, Shiqi

arXiv:2304.08587 (cs)

[Submitted on 17 Apr 2023 (v1), last revised 19 Jun 2023 (this version, v3)]

Abstract: Classical planning systems have made great advances in using rule-based human knowledge to compute accurate plans for service robots, but they face challenges because they strongly assume perfect perception and perfect action execution. One way to tackle these challenges is to connect the symbolic states and actions generated by classical planners to the robot's sensory observations, thus closing the perception-action loop. This research proposes a visually grounded planning framework, named TPVQA, which leverages Vision-Language Models (VLMs) to detect action failures and verify action affordances, with the aim of enabling successful plan execution. Results from quantitative experiments show that TPVQA surpasses competitive baselines from previous studies in task completion rate.
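The abstract names two checks, verifying action affordances and detecting action failures, without giving implementation detail. The sketch below is only one plausible reading of that loop, not the authors' TPVQA implementation. Every name in it is hypothetical: `Action`, `execute_with_vqa`, and the `observe`, `act`, and `ask` callbacks, where `ask(image, question)` stands in for a VLM that answers a yes/no question about an image. The retry policy (`max_retries`) is likewise an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Action:
    """Hypothetical symbolic action with yes/no questions for a VLM."""
    name: str
    precondition_questions: List[str]  # all must be "yes" before acting
    effect_questions: List[str]        # all must be "yes" after acting


def execute_with_vqa(
    plan: List[Action],
    observe: Callable[[], Any],        # returns the current image
    act: Callable[[Action], None],     # runs the action on the robot
    ask: Callable[[Any, str], bool],   # stand-in VLM: True means "yes"
    max_retries: int = 1,              # assumed policy, not from the paper
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Run a plan, checking affordances before and effects after each action."""
    log: List[Tuple[str, str]] = []
    for action in plan:
        # Affordance check: are the symbolic preconditions visually true?
        if not all(ask(observe(), q) for q in action.precondition_questions):
            log.append((action.name, "not_afforded"))
            return False, log  # caller may replan from here
        for _ in range(max_retries + 1):
            act(action)
            # Failure detection: are the expected effects visually true?
            if all(ask(observe(), q) for q in action.effect_questions):
                log.append((action.name, "ok"))
                break
            log.append((action.name, "failed"))
        else:
            return False, log  # retries exhausted
    return True, log
```

A caller would supply a classical planner's output as `plan` and wrap a real VLM in `ask`. Returning the log alongside the success flag lets the caller decide whether to replan, which this sketch deliberately leaves out.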

Comments:	ICRA Workshop on Robot Execution Failures and Failure Management Strategies, 2023
Subjects:	Robotics (cs.RO)
Cite as:	arXiv:2304.08587 [cs.RO]
	(or arXiv:2304.08587v3 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2304.08587

Submission history

From: Xiaohan Zhang
[v1] Mon, 17 Apr 2023 20:07:24 UTC (3,844 KB)
[v2] Tue, 13 Jun 2023 23:19:54 UTC (3,857 KB)
[v3] Mon, 19 Jun 2023 22:28:02 UTC (3,845 KB)
