-
Genie: Generative Interactive Environments
Authors:
Jake Bruce,
Michael Dennis,
Ashley Edwards,
Jack Parker-Holder,
Yuge Shi,
Edward Hughes,
Matthew Lai,
Aditi Mavalankar,
Richie Steigerwald,
Chris Apps,
Yusuf Aytar,
Sarah Bechtle,
Feryal Behbahani,
Stephanie Chan,
Nicolas Heess,
Lucy Gonzalez,
Simon Osindero,
Sherjil Ozair,
Scott Reed,
Jingwei Zhang,
Konrad Zolna,
Jeff Clune,
Nando de Freitas,
Satinder Singh,
Tim Rocktäschel
Abstract:
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
Submitted 23 February, 2024;
originally announced February 2024.
-
Vision-Language Models as a Source of Rewards
Authors:
Kate Baumli,
Satinder Baveja,
Feryal Behbahani,
Harris Chan,
Gheorghe Comanici,
Sebastian Flennerhag,
Maxime Gazeau,
Kristian Holsheimer,
Dan Horgan,
Michael Laskin,
Clare Lyle,
Hussain Masoom,
Kay McKinney,
Volodymyr Mnih,
Alexander Neitz,
Dmitry Nikulin,
Fabio Pardo,
Jack Parker-Holder,
John Quan,
Tim Rocktäschel,
Himanshu Sahni,
Tom Schaul,
Yannick Schroecker,
Stephen Spencer,
Richie Steigerwald, et al. (2 additional authors not shown)
Abstract:
Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
Submitted 12 July, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
In-context Reinforcement Learning with Algorithm Distillation
Authors:
Michael Laskin,
Luyu Wang,
Junhyuk Oh,
Emilio Parisotto,
Stephen Spencer,
Richie Steigerwald,
DJ Strouse,
Steven Hansen,
Angelos Filos,
Ethan Brooks,
Maxime Gazeau,
Himanshu Sahni,
Satinder Singh,
Volodymyr Mnih
Abstract:
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
Submitted 25 October, 2022;
originally announced October 2022.