-
Video as the New Language for Real-World Decision Making
Authors:
Sherry Yang,
Jacob Walker,
Jack Parker-Holder,
Yilun Du,
Jake Bruce,
Andre Barreto,
Pieter Abbeel,
Dale Schuurmans
Abstract:
Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that…
▽ More
Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had significant real-world impact, whereas video generation has remained largely limited to media entertainment. Yet video data captures important information about the physical world that is difficult to express in language. To address this gap, we discuss an under-appreciated opportunity to extend video generation to solve tasks in the real world. We observe how, akin to language, video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks. Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning. We identify major impact opportunities in domains such as robotics, self-driving, and science, supported by recent work that demonstrates how such advanced capabilities in video generation are plausibly within reach. Lastly, we identify key challenges in video generation that mitigate progress. Addressing these challenges will enable video generation models to demonstrate unique value alongside language models in a wider array of AI applications.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Genie: Generative Interactive Environments
Authors:
Jake Bruce,
Michael Dennis,
Ashley Edwards,
Jack Parker-Holder,
Yuge Shi,
Edward Hughes,
Matthew Lai,
Aditi Mavalankar,
Richie Steigerwald,
Chris Apps,
Yusuf Aytar,
Sarah Bechtle,
Feryal Behbahani,
Stephanie Chan,
Nicolas Heess,
Lucy Gonzalez,
Simon Osindero,
Sherjil Ozair,
Scott Reed,
Jingwei Zhang,
Konrad Zolna,
Jeff Clune,
Nando de Freitas,
Satinder Singh,
Tim Rocktäschel
Abstract:
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotem…
▽ More
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
A Generalist Dynamics Model for Control
Authors:
Ingmar Schubert,
Jingwei Zhang,
Jake Bruce,
Sarah Bechtle,
Emilio Parisotto,
Martin Riedmiller,
Jost Tobias Springenberg,
Arunkumar Byravan,
Leonard Hasenclever,
Nicolas Heess
Abstract:
We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any fu…
▽ More
We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs exhibit strong generalization capabilities to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any further training. Here, we demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. Additional results show that TDMs also perform well in a single-environment learning setting when compared to a number of baseline models. These properties make TDMs a promising ingredient for a foundation model of control.
△ Less
Submitted 23 September, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Two-phase Dual COPOD Method for Anomaly Detection in Industrial Control System
Authors:
Emmanuel Aboah Boateng,
Jerry Bruce
Abstract:
Critical infrastructures like water treatment facilities and power plants depend on industrial control systems (ICS) for monitoring and control, making them vulnerable to cyber attacks and system malfunctions. Traditional ICS anomaly detection methods lack transparency and interpretability, which make it difficult for practitioners to understand and trust the results. This paper proposes a two-pha…
▽ More
Critical infrastructures like water treatment facilities and power plants depend on industrial control systems (ICS) for monitoring and control, making them vulnerable to cyber attacks and system malfunctions. Traditional ICS anomaly detection methods lack transparency and interpretability, which make it difficult for practitioners to understand and trust the results. This paper proposes a two-phase dual Copula-based Outlier Detection (COPOD) method that addresses these challenges. The first phase removes unwanted outliers using an empirical cumulative distribution algorithm, and the second phase develops two parallel COPOD models based on the output data of phase 1. The method is based on empirical distribution functions, parameter-free, and provides interpretability by quantifying each feature's contribution to an anomaly. The method is also computationally and memory-efficient, suitable for low- and high-dimensional datasets. Experimental results demonstrate superior performance in terms of F1-score and recall on three open-source ICS datasets, enabling real-time ICS anomaly detection.
△ Less
Submitted 30 April, 2023;
originally announced May 2023.
-
Accelerating exploration and representation learning with offline pre-training
Authors:
Bogdan Mazoure,
Jake Bruce,
Doina Precup,
Rob Fergus,
Ankit Anand
Abstract:
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned…
▽ More
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
A Generalist Agent
Authors:
Scott Reed,
Konrad Zolna,
Emilio Parisotto,
Sergio Gomez Colmenarejo,
Alexander Novikov,
Gabriel Barth-Maron,
Mai Gimenez,
Yury Sulsky,
Jackie Kay,
Jost Tobias Springenberg,
Tom Eccles,
Jake Bruce,
Ali Razavi,
Ashley Edwards,
Nicolas Heess,
Yutian Chen,
Raia Hadsell,
Oriol Vinyals,
Mahyar Bordbar,
Nando de Freitas
Abstract:
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, dec…
▽ More
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
△ Less
Submitted 11 November, 2022; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Imitation by Predicting Observations
Authors:
Andrew Jaegle,
Yury Sulsky,
Arun Ahuja,
Jake Bruce,
Rob Fergus,
Greg Wayne
Abstract:
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous con…
▽ More
Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model") is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.
△ Less
Submitted 8 July, 2021;
originally announced July 2021.
-
Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasks
Authors:
Vibhavari Dasagi,
Robert Lee,
Jake Bruce,
Jürgen Leitner
Abstract:
Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience is available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off…
▽ More
Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience is available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off-policy algorithms can in principle learn arbitrary tasks from a diverse enough fixed dataset. In this work, we evaluate popular exploration methods by generating robotics datasets for the purpose of learning to solve tasks completely offline without any further interaction in the real world. We present results on three popular continuous control tasks in simulation, as well as continuous control of a high-dimensional real robot arm. Code documenting all algorithms, experiments, and hyper-parameters is available at https://github.com/qutrobotlearning/batchlearning.
△ Less
Submitted 19 November, 2019;
originally announced November 2019.
-
Ctrl-Z: Recovering from Instability in Reinforcement Learning
Authors:
Vibhavari Dasagi,
Jake Bruce,
Thierry Peynot,
Jürgen Leitner
Abstract:
When learning behavior, training data is often generated by the learner itself; this can result in unstable training dynamics, and this problem has particularly important applications in safety-sensitive real-world control tasks such as robotics. In this work, we propose a principled and model-agnostic approach to mitigate the issue of unstable learning dynamics by maintaining a history of a reinf…
▽ More
When learning behavior, training data is often generated by the learner itself; this can result in unstable training dynamics, and this problem has particularly important applications in safety-sensitive real-world control tasks such as robotics. In this work, we propose a principled and model-agnostic approach to mitigate the issue of unstable learning dynamics by maintaining a history of a reinforcement learning agent over the course of training, and reverting to the parameters of a previous agent whenever performance significantly decreases. We develop techniques for evaluating this performance through statistical hypothesis testing of continued improvement, and evaluate them on a standard suite of challenging benchmark tasks involving continuous control of simulated robots. We show improvements over state-of-the-art reinforcement learning algorithms in performance and robustness to hyperparameters, outperforming DDPG in 5 out of 6 evaluation environments and showing no decrease in performance with TD3, which is known to be relatively stable. In this way, our approach takes an important step towards increasing data efficiency and stability in training for real-world robotic applications.
△ Less
Submitted 8 October, 2019;
originally announced October 2019.
-
Sim-to-Real Transfer of Robot Learning with Variable Length Inputs
Authors:
Vibhavari Dasagi,
Robert Lee,
Serena Mou,
Jake Bruce,
Niko Sünderhauf,
Jürgen Leitner
Abstract:
Current end-to-end deep Reinforcement Learning (RL) approaches require jointly learning perception, decision-making and low-level control from very sparse reward signals and high-dimensional inputs, with little capability of incorporating prior knowledge. This results in prohibitively long training times for use on real-world robotic tasks. Existing algorithms capable of extracting task-level repr…
▽ More
Current end-to-end deep Reinforcement Learning (RL) approaches require jointly learning perception, decision-making and low-level control from very sparse reward signals and high-dimensional inputs, with little capability of incorporating prior knowledge. This results in prohibitively long training times for use on real-world robotic tasks. Existing algorithms capable of extracting task-level representations from high-dimensional inputs, e.g. object detection, often produce outputs of varying lengths, restricting their use in RL methods due to the need for neural networks to have fixed length inputs. In this work, we propose a framework that combines deep sets encoding, which allows for variable-length abstract representations, with modular RL that utilizes these representations, decoupling high-level decision making from low-level control. We successfully demonstrate our approach on the robot manipulation task of object sorting, showing that this method can learn effective policies within mere minutes of highly simplified simulation. The learned policies can be directly deployed on a robot without further training, and generalize to variations of the task unseen during training.
△ Less
Submitted 8 October, 2019; v1 submitted 20 September, 2018;
originally announced September 2018.
-
Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal
Authors:
Jake Bruce,
Niko Sünderhauf,
Piotr Mirowski,
Raia Hadsell,
Michael Milford
Abstract:
Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, f…
▽ More
Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot, from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment consisting of more than 2km of travel, through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings to achieve a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer, allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation to enable the learned policy to generalise beyond these precomputed embeddings, and demonstrate successful deployment of the learned policy on the real robot without fine tuning, despite environmental appearance differences at test time. The dataset and code required to reproduce these results and apply the technique to other datasets and robots is made publicly available at rl-navigation.github.io/deployable.
△ Less
Submitted 11 July, 2018;
originally announced July 2018.
-
One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay
Authors:
Jake Bruce,
Niko Suenderhauf,
Piotr Mirowski,
Raia Hadsell,
Michael Milford
Abstract:
Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fi…
▽ More
Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning.
△ Less
Submitted 28 November, 2017; v1 submitted 28 November, 2017;
originally announced November 2017.
-
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Authors:
Peter Anderson,
Qi Wu,
Damien Teney,
Jake Bruce,
Mark Johnson,
Niko Sünderhauf,
Ian Reid,
Stephen Gould,
Anton van den Hengel
Abstract:
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a…
▽ More
A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
△ Less
Submitted 5 April, 2018; v1 submitted 20 November, 2017;
originally announced November 2017.
-
Look No Further: Adapting the Localization Sensory Window to the Temporal Characteristics of the Environment
Authors:
Jake Bruce,
Adam Jacobson,
Michael Milford
Abstract:
Many localization algorithms use a spatiotemporal window of sensory information in order to recognize spatial locations, and the length of this window is often a sensitive parameter that must be tuned to the specifics of the application. This letter presents a general method for environment-driven variation of the length of the spatiotemporal window based on searching for the most significant loca…
▽ More
Many localization algorithms use a spatiotemporal window of sensory information in order to recognize spatial locations, and the length of this window is often a sensitive parameter that must be tuned to the specifics of the application. This letter presents a general method for environment-driven variation of the length of the spatiotemporal window based on searching for the most significant localization hypothesis, to use as much context as is appropriate but not more. We evaluate this approach on benchmark datasets using visual and Wi-Fi sensor modalities and a variety of sensory comparison front-ends under in-order and out-of-order traversals of the environment. Our results show that the system greatly reduces the maximum distance traveled without localization compared to a fixed-length approach while achieving competitive localization accuracy, and our proposed method achieves this performance without deployment-time tuning.
△ Less
Submitted 23 July, 2017; v1 submitted 18 June, 2017;
originally announced June 2017.
-
Deep Reinforcement Learning for Tensegrity Robot Locomotion
Authors:
Marvin Zhang,
Xinyang Geng,
Jonathan Bruce,
Ken Caluwaerts,
Massimo Vespignani,
Vytas SunSpiral,
Pieter Abbeel,
Sergey Levine
Abstract:
Tensegrity robots, composed of rigid rods connected by elastic cables, have a number of unique properties that make them appealing for use as planetary exploration rovers. However, control of tensegrity robots remains a difficult problem due to their unusual structures and complex dynamics. In this work, we show how locomotion gaits can be learned automatically using a novel extension of mirror de…
▽ More
Tensegrity robots, composed of rigid rods connected by elastic cables, have a number of unique properties that make them appealing for use as planetary exploration rovers. However, control of tensegrity robots remains a difficult problem due to their unusual structures and complex dynamics. In this work, we show how locomotion gaits can be learned automatically using a novel extension of mirror descent guided policy search (MDGPS) applied to periodic locomotion movements, and we demonstrate the effectiveness of our approach on tensegrity robot locomotion. We evaluate our method with real-world and simulated experiments on the SUPERball tensegrity robot, showing that the learned policies generalize to changes in system parameters, unreliable sensor measurements, and variation in environmental conditions, including varied terrains and a range of different gravities. Our experiments demonstrate that our method not only learns fast, power-efficient feedback policies for rolling gaits, but that these policies can succeed with only the limited onboard sensing provided by SUPERball's accelerometers. We compare the learned feedback policies to learned open-loop policies and hand-engineered controllers, and demonstrate that the learned policy enables the first continuous, reliable locomotion gait for the real SUPERball robot. Our code and other supplementary materials are available from http://rll.berkeley.edu/drl_tensegrity
△ Less
Submitted 7 March, 2017; v1 submitted 28 September, 2016;
originally announced September 2016.
-
A light-weight, multi-axis compliant tensegrity joint
Authors:
Steven Lessard,
Jonathan Bruce,
Erik Jung,
Mircea Teodorescu,
Vytas SunSpiral,
Adrian Agogino
Abstract:
In this paper, we present a light-weight, multi- axis compliant tenegrity joint that is biologically inspired by the human elbow. This tensegrity elbow actuates by shortening and lengthening cable in a method inspired by muscular actuation in a person. Unlike many series elastic actuators, this joint is structurally compliant not just along each axis of rotation, but along other axes as well. Comp…
▽ More
In this paper, we present a light-weight, multi- axis compliant tenegrity joint that is biologically inspired by the human elbow. This tensegrity elbow actuates by shortening and lengthening cable in a method inspired by muscular actuation in a person. Unlike many series elastic actuators, this joint is structurally compliant not just along each axis of rotation, but along other axes as well. Compliant robotic joints are indispensable in unpredictable environments, including ones where the robot must interface with a person. The joint also addresses the need for functional redundancy and flexibility, traits which are required for many applications that investigate the use of biologically accurate robotic models.
△ Less
Submitted 26 October, 2015;
originally announced October 2015.
-
State Estimation for Tensegrity Robots
Authors:
Ken Caluwaerts,
Jonathan Bruce,
Jeffrey M. Friesen,
Vytas SunSpiral
Abstract:
Tensegrity robots are a class of compliant robots that have many desirable traits when designing mass efficient systems that must interact with uncertain environments. Various promising control approaches have been proposed for tensegrity systems in simulation. Unfortunately, state estimation methods for tensegrity robots have not yet been thoroughly studied.
In this paper, we present the design…
▽ More
Tensegrity robots are a class of compliant robots that have many desirable traits when designing mass efficient systems that must interact with uncertain environments. Various promising control approaches have been proposed for tensegrity systems in simulation. Unfortunately, state estimation methods for tensegrity robots have not yet been thoroughly studied.
In this paper, we present the design and evaluation of a state estimator for tensegrity robots. This state estimator will enable existing and future control algorithms to transfer from simulation to hardware. Our approach is based on the unscented Kalman filter (UKF) and combines inertial measurements, ultra wideband time-of-flight ranging measurements, and actuator state information.
We evaluate the effectiveness of our method on the SUPERball, a tensegrity based planetary exploration robotic prototype. In particular, we conduct tests for evaluating both the robot's success in estimating global position in relation to fixed ranging base stations during rolling maneuvers as well as local behavior due to small-amplitude deformations induced by cable actuation.
△ Less
Submitted 19 February, 2016; v1 submitted 5 October, 2015;
originally announced October 2015.