C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

Jacq, Alexis; Orsini, Manu; Dulac-Arnold, Gabriel; Pietquin, Olivier; Geist, Matthieu; Bachem, Olivier

Computer Science > Artificial Intelligence

arXiv:2211.03521v1 (cs)

[Submitted on 7 Nov 2022 (this version), latest version 20 Feb 2023 (v2)]

Abstract: Given a particular embodiment, we propose a novel method (C3PO) that learns policies able to achieve any arbitrary position and pose. Such a policy would allow for easier control and would be reusable as a key building block for downstream tasks. The method is twofold. First, we introduce a novel exploration algorithm that optimizes for uniform coverage and is able to discover a set of achievable states, and we investigate its ability to attain both high coverage and hard-to-discover states. Second, we leverage this set of achievable states as training data for a universal goal-achievement policy, a goal-based variant of Soft Actor-Critic (SAC). We demonstrate the trained policy's performance in achieving a large number of novel states. Finally, we showcase the influence of massive unsupervised training of a goal-achievement policy with state-of-the-art pose-based control of the Hopper, Walker, Halfcheetah, Humanoid and Ant embodiments.
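The two phases described in the abstract can be illustrated with a toy Python sketch. It is not the authors' exploration algorithm or their SAC variant. The grid-cell discretization, the 2-D random-walk dynamics, the "pick an occupied cell uniformly, then a state inside it" resampling rule, and the negative-distance goal reward are all assumptions chosen for illustration, and every name (`cell_of`, `coverage_entropy`, `resample_uniform_over_cells`, `explore`, `goal_reward`, `cell_size`, `k`) is hypothetical. Phase one grows a set of achievable states while biasing expansion toward uniform coverage; phase two samples goals from that set, as a goal-conditioned learner might.

```python
import math
import random
from collections import defaultdict


def cell_of(state, cell_size):
    """Map a continuous state (tuple of floats) to a discrete grid cell (assumed hashing)."""
    return tuple(math.floor(x / cell_size) for x in state)


def coverage_entropy(states, cell_size):
    """Shannon entropy (nats) of the empirical distribution over occupied cells.

    It is maximal when states are spread evenly over the cells they occupy.
    """
    counts = defaultdict(int)
    for s in states:
        counts[cell_of(s, cell_size)] += 1
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def resample_uniform_over_cells(states, cell_size, k, rng):
    """Draw k states: first an occupied cell uniformly at random, then a state in it.

    Rarely visited cells are picked as often as crowded ones, which pushes
    further expansion toward under-explored regions (a hypothetical rule).
    """
    buckets = defaultdict(list)
    for s in states:
        buckets[cell_of(s, cell_size)].append(s)
    cells = list(buckets)
    return [rng.choice(buckets[rng.choice(cells)]) for _ in range(k)]


def explore(step_fn, init_state, n_iters, k, cell_size, rng):
    """Toy coverage-seeking loop (phase one).

    Each iteration steps every frontier state once with step_fn, stores the
    results as achievable states, then rebuilds the frontier by cell-uniform
    resampling over everything found so far.
    """
    achievable = [init_state]
    frontier = [init_state]
    for _ in range(n_iters):
        next_states = [step_fn(s, rng) for s in frontier]
        achievable.extend(next_states)
        frontier = resample_uniform_over_cells(achievable, cell_size, k, rng)
    return achievable


def goal_reward(state, goal):
    """Assumed dense goal-reaching reward: negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical dynamics: a 2-D random walk standing in for a simulated embodiment.
    walk = lambda s, r: (s[0] + r.uniform(-1, 1), s[1] + r.uniform(-1, 1))
    states = explore(walk, (0.0, 0.0), n_iters=50, k=32, cell_size=1.0, rng=rng)
    # Phase two: goals drawn from the achievable set feed a goal-conditioned learner.
    goals = rng.sample(states, 5)
    print(len(states), coverage_entropy(states, 1.0))
    print([round(goal_reward((0.0, 0.0), g), 2) for g in goals])
```

Sampling training goals only from states the explorer actually reached keeps every goal achievable by construction, which is the role the abstract gives to the discovered state set; a real embodiment would replace the random walk with environment steps and the distance reward with whatever goal criterion the policy is trained on.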

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2211.03521 [cs.AI]
	(or arXiv:2211.03521v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2211.03521

Submission history

From: Alexis Jacq
[v1] Mon, 7 Nov 2022 13:02:40 UTC (3,981 KB)
[v2] Mon, 20 Feb 2023 14:28:14 UTC (4,223 KB)