-
GATS: Gather-Attend-Scatter
Authors:
Konrad Zolna,
Serkan Cabi,
Yutian Chen,
Eric Lau,
Claudio Fantacci,
Jurgis Pasukonis,
Jost Tobias Springenberg,
Sergio Gomez Colmenarejo
Abstract:
As the AI community increasingly adopts large-scale models, it is crucial to develop general and flexible tools to integrate them. We introduce Gather-Attend-Scatter (GATS), a novel module that enables seamless combination of pretrained foundation models, both trainable and frozen, into larger multimodal networks. GATS empowers AI systems to process and generate information across multiple modalities at different rates. In contrast to traditional fine-tuning, GATS allows for the original component models to remain frozen, avoiding the risk of them losing important knowledge acquired during the pretraining phase. We demonstrate the utility and versatility of GATS with a few experiments across games, robotics, and multimodal input-output systems.
Submitted 16 January, 2024;
originally announced January 2024.
-
Vision-Language Models as Success Detectors
Authors:
Yuqing Du,
Ksenia Konyushkova,
Misha Denil,
Akhil Raju,
Jessica Landon,
Felix Hill,
Nando de Freitas,
Serkan Cabi
Abstract:
Detecting successful behaviour is crucial for training intelligent agents. As such, generalisable reward models are a prerequisite for agents that can learn to generalise their behaviour. In this work we focus on developing robust success detectors that leverage large, pretrained vision-language models (Flamingo, Alayrac et al. (2022)) and human reward annotations. Concretely, we treat success detection as a visual question answering (VQA) problem, denoted SuccessVQA. We study success detection across three vastly different domains: (i) interactive language-conditioned agents in a simulated household, (ii) real world robotic manipulation, and (iii) "in-the-wild" human egocentric videos. We investigate the generalisation properties of a Flamingo-based success detection model across unseen language and visual changes in the first two domains, and find that the proposed method is able to outperform bespoke reward models in out-of-distribution test scenarios with either variation. In the last domain of "in-the-wild" human videos, we show that success detection on unseen real videos presents an even more challenging generalisation task warranting future work. We hope our initial results encourage further work in real world success detection and reward modelling.
Submitted 13 March, 2023;
originally announced March 2023.
-
Flamingo: a Visual Language Model for Few-Shot Learning
Authors:
Jean-Baptiste Alayrac,
Jeff Donahue,
Pauline Luc,
Antoine Miech,
Iain Barr,
Yana Hasson,
Karel Lenc,
Arthur Mensch,
Katie Millican,
Malcolm Reynolds,
Roman Ring,
Eliza Rutherford,
Serkan Cabi,
Tengda Han,
Zhitao Gong,
Sina Samangooei,
Marianne Monteiro,
Jacob Menick,
Sebastian Borgeaud,
Andrew Brock,
Aida Nematzadeh,
Sahand Sharifzadeh,
Mikolaj Binkowski,
Ricardo Barreira,
Oriol Vinyals
, et al. (2 additional authors not shown)
Abstract:
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of our models, exploring and measuring their ability to rapidly adapt to a variety of image and video tasks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer; captioning tasks, which evaluate the ability to describe a scene or an event; and close-ended tasks such as multiple-choice visual question-answering. For tasks lying anywhere on this spectrum, a single Flamingo model can achieve a new state of the art with few-shot learning, simply by prompting the model with task-specific examples. On numerous benchmarks, Flamingo outperforms models fine-tuned on thousands of times more task-specific data.
Submitted 15 November, 2022; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Multimodal Few-Shot Learning with Frozen Language Models
Authors:
Maria Tsimpoukelli,
Jacob Menick,
Serkan Cabi,
S. M. Ali Eslami,
Oriol Vinyals,
Felix Hill
Abstract:
When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption. The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings. We demonstrate that it can rapidly learn words for new objects and novel visual categories, do visual question-answering with only a handful of examples, and make use of outside knowledge, by measuring a single model on a variety of established and new benchmarks.
Submitted 3 July, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Semi-supervised reward learning for offline reinforcement learning
Authors:
Ksenia Konyushkova,
Konrad Zolna,
Yusuf Aytar,
Alexander Novikov,
Scott Reed,
Serkan Cabi,
Nando de Freitas
Abstract:
In offline reinforcement learning (RL) agents are trained using a logged dataset. It appears to be the most natural route to attack real-life applications because in domains such as healthcare and robotics interactions with the environment are either expensive or unethical. Training agents usually requires reward functions, but unfortunately, rewards are seldom available in practice and their engineering is challenging and laborious. To overcome this, we investigate reward learning under the constraint of minimizing human reward annotations. We consider two types of supervision: timestep annotations and demonstrations. We propose semi-supervised learning algorithms that learn from limited annotations and incorporate unlabelled data. In our experiments with a simulated robotic arm, we greatly improve upon behavioural cloning and closely approach the performance achieved with ground truth rewards. We further investigate the relationship between the quality of the reward model and the final policies. We notice, for example, that the reward models do not need to be perfect to result in useful policies.
Submitted 12 December, 2020;
originally announced December 2020.
-
Acme: A Research Framework for Distributed Reinforcement Learning
Authors:
Matthew W. Hoffman,
Bobak Shahriari,
John Aslanides,
Gabriel Barth-Maron,
Nikola Momchev,
Danila Sinopalnikov,
Piotr Stańczyk,
Sabela Ramos,
Anton Raichuk,
Damien Vincent,
Léonard Hussenot,
Robert Dadashi,
Gabriel Dulac-Arnold,
Manu Orsini,
Alexis Jacq,
Johan Ferret,
Nino Vieillard,
Seyed Kamyar Seyed Ghasemipour,
Sertan Girgin,
Olivier Pietquin,
Feryal Behbahani,
Tamara Norman,
Abbas Abdolmaleki,
Albin Cassirer,
Fan Yang
, et al. (14 additional authors not shown)
Abstract:
Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents that are built using simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research. In this work we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation.
This work presents a second version of the paper which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme.
Submitted 20 September, 2022; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Task-Relevant Adversarial Imitation Learning
Authors:
Konrad Zolna,
Scott Reed,
Alexander Novikov,
Sergio Gomez Colmenarejo,
David Budden,
Serkan Cabi,
Misha Denil,
Nando de Freitas,
Ziyu Wang
Abstract:
We show that a critical vulnerability in adversarial imitation is the tendency of discriminator networks to learn spurious associations between visual features and expert labels. When the discriminator focuses on task-irrelevant features, it does not provide an informative reward signal, leading to poor task performance. We analyze this problem in detail and propose a solution that outperforms standard Generative Adversarial Imitation Learning (GAIL). Our proposed method, Task-Relevant Adversarial Imitation Learning (TRAIL), uses constrained discriminator optimization to learn informative rewards. In comprehensive experiments, we show that TRAIL can solve challenging robotic manipulation tasks from pixels by imitating human operators without access to any task rewards, and clearly outperforms comparable baseline imitation agents, including those trained via behaviour cloning and conventional GAIL.
Submitted 12 November, 2020; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Authors:
Serkan Cabi,
Sergio Gómez Colmenarejo,
Alexander Novikov,
Ksenia Konyushkova,
Scott Reed,
Rae Jeong,
Konrad Zolna,
Yusuf Aytar,
David Budden,
Mel Vecerik,
Oleg Sushkov,
David Barker,
Jonathan Scholz,
Misha Denil,
Nando de Freitas,
Ziyu Wang
Abstract:
We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.
Submitted 4 June, 2020; v1 submitted 26 September, 2019;
originally announced September 2019.
-
One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL
Authors:
Tom Le Paine,
Sergio Gómez Colmenarejo,
Ziyu Wang,
Scott Reed,
Yusuf Aytar,
Tobias Pfaff,
Matt W. Hoffman,
Gabriel Barth-Maron,
Serkan Cabi,
David Budden,
Nando de Freitas
Abstract:
Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
Submitted 11 October, 2018;
originally announced October 2018.
-
Learning Awareness Models
Authors:
Brandon Amos,
Laurent Dinh,
Serkan Cabi,
Thomas Rothörl,
Sergio Gómez Colmenarejo,
Alistair Muldal,
Tom Erez,
Yuval Tassa,
Nando de Freitas,
Misha Denil
Abstract:
We consider the setting of an agent with a fixed body interacting with an unknown and uncertain external world. We show that models trained to predict proprioceptive information about the agent's body come to represent objects in the external world. In spite of being trained with only internally available signals, these dynamic body models come to represent external objects through the necessity of predicting their effects on the agent's own body. That is, the model learns holistic persistent representations of objects in the world, even though the only training signals are body signals. Our dynamics model is able to successfully predict distributions over 132 sensor readings over 100 steps into the future and we demonstrate that even when the body is no longer in contact with an object, the latent variables of the dynamics model continue to represent its shape. We show that active data collection by maximizing the entropy of predictions about the body---touch sensors, proprioception and vestibular information---leads to learning of dynamic models that show superior performance when used for control. We also collect data from a real robotic hand and show that the same models can be used to answer questions about properties of objects in the real world. Videos with qualitative results of our models are available at https://goo.gl/mZuqAV.
Submitted 17 April, 2018;
originally announced April 2018.
-
Reinforcement and Imitation Learning for Diverse Visuomotor Skills
Authors:
Yuke Zhu,
Ziyu Wang,
Josh Merel,
Andrei Rusu,
Tom Erez,
Serkan Cabi,
Saran Tunyasuvunakool,
János Kramár,
Raia Hadsell,
Nando de Freitas,
Nicolas Heess
Abstract:
We propose a model-free deep reinforcement learning method that leverages a small amount of demonstration data to assist a reinforcement learning agent. We apply this approach to robotic manipulation tasks and train end-to-end visuomotor policies that map directly from RGB camera inputs to joint velocities. We demonstrate that our approach can solve a wide variety of visuomotor tasks, for which engineering a scripted controller would be laborious. In experiments, our reinforcement and imitation agent achieves significantly better performances than agents trained with reinforcement learning or imitation learning alone. We also illustrate that these policies, trained with large visual and dynamics variations, can achieve preliminary successes in zero-shot sim2real transfer. A brief visual description of this work can be viewed in https://youtu.be/EDl8SQUNjj0
Submitted 27 May, 2018; v1 submitted 26 February, 2018;
originally announced February 2018.
-
The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously
Authors:
Serkan Cabi,
Sergio Gómez Colmenarejo,
Matthew W. Hoffman,
Misha Denil,
Ziyu Wang,
Nando de Freitas
Abstract:
This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but it also learns faster than agents that target a single task at-a-time. In some cases, where the single task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.
Submitted 11 July, 2017;
originally announced July 2017.
-
Programmable Agents
Authors:
Misha Denil,
Sergio Gómez Colmenarejo,
Serkan Cabi,
David Saxton,
Nando de Freitas
Abstract:
We build deep RL agents that execute declarative programs expressed in formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.
Submitted 20 June, 2017;
originally announced June 2017.
-
Constraining Torsion with Gravity Probe B
Authors:
Yi Mao,
Max Tegmark,
Alan Guth,
Serkan Cabi
Abstract:
It is well-entrenched folklore that torsion gravity theories predict observationally negligible torsion in the solar system, since torsion (if it exists) couples only to the intrinsic spin of elementary particles, not to rotational angular momentum. We argue that this assumption has a logical loophole which can and should be tested experimentally. In the spirit of action=reaction, if a rotating mass like a planet can generate torsion, then a gyroscope should also feel torsion. Using symmetry arguments, we show that to lowest order, the torsion field around a uniformly rotating spherical mass is determined by seven dimensionless parameters. These parameters effectively generalize the PPN formalism and provide a concrete framework for further testing GR. We construct a parametrized Lagrangian that includes both standard torsion-free GR and Hayashi-Shirafuji maximal torsion gravity as special cases. We demonstrate that classic solar system tests rule out the latter and constrain two observable parameters. We show that Gravity Probe B (GPB) is an ideal experiment for further constraining torsion theories, and work out the most general torsion-induced precession of its gyroscope in terms of our torsion parameters.
Submitted 5 October, 2007; v1 submitted 29 August, 2006;
originally announced August 2006.