-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (172 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
Submitted 2 August, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Half-Hop: A graph upsampling approach for slowing down message passing
Authors:
Mehdi Azabou,
Venkataramana Ganesh,
Shantanu Thakoor,
Chi-Heng Lin,
Lakshmi Sathidevi,
Ran Liu,
Michal Valko,
Petar Veličković,
Eva L. Dyer
Abstract:
Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
Submitted 17 August, 2023;
originally announced August 2023.
-
Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
Authors:
Yash Chandak,
Shantanu Thakoor,
Zhaohan Daniel Guo,
Yunhao Tang,
Remi Munos,
Will Dabney,
Diana L Borsa
Abstract:
Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing an estimate for pseudo-counts for free. To scale this decomposition method to large-scale domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and also permits mini-batch training. Further, we draw inspiration from predictive state representations and extend our decomposition method to partially observable environments. With experiments on multi-task settings with partially observable domains, we show that the proposed method can not only learn useful representation on DM-Lab-30 environments (that have inputs involving language instructions, pixel images, and rewards, among others) but it can also be effective at hard exploration tasks in DM-Hard-8 environments.
Submitted 2 May, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis
Authors:
Mehdi Azabou,
Michael Mendelson,
Nauman Ahad,
Maks Sorokin,
Shantanu Thakoor,
Carolina Urzay,
Eva L. Dyer
Abstract:
Natural behavior consists of dynamics that are complex and unpredictable, especially when trying to predict many steps into the future. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings where behavior becomes increasingly hard to model. In this work, we develop a multi-task representation learning model for behavior that combines two novel components: (i) An action prediction objective that aims to predict the distribution of actions over future timesteps, and (ii) A multi-scale architecture that builds separate latent spaces to accommodate short- and long-term dynamics. After demonstrating the ability of the method to build representations of both local and global dynamics in realistic robots in varying environments and terrains, we apply our method to the MABe 2022 Multi-agent behavior challenge, where our model ranks 1st overall and on all global tasks, and 1st or 2nd on 7 out of 9 frame-level tasks. In all of these cases, we show that our model can build representations that capture the many different factors that drive behavior and solve a wide range of downstream tasks.
Submitted 15 March, 2023;
originally announced March 2023.
-
Understanding Self-Predictive Learning for Reinforcement Learning
Authors:
Yunhao Tang,
Zhaohan Daniel Guo,
Pierre Harvey Richemond,
Bernardo Ávila Pires,
Yash Chandak,
Rémi Munos,
Mark Rowland,
Mohammad Gheshlaghi Azar,
Charline Le Lan,
Clare Lyle,
András György,
Shantanu Thakoor,
Will Dabney,
Bilal Piot,
Daniele Calandriello,
Michal Valko
Abstract:
We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that self-predictive learning dynamics carry out spectral decomposition on the state transition matrix, effectively capturing information about the transition dynamics. Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
Submitted 6 December, 2022;
originally announced December 2022.
-
Generalised Policy Improvement with Geometric Policy Composition
Authors:
Shantanu Thakoor,
Mark Rowland,
Diana Borsa,
Will Dabney,
Rémi Munos,
André Barreto
Abstract:
We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL. The new method builds on the concept of a geometric horizon model (GHM, also known as a gamma-model), which models the discounted state-visitation distribution of a given policy. We show that we can evaluate any non-Markov policy that switches between a set of base Markov policies with fixed probability by a careful composition of the base policy GHMs, without any additional learning. We can then apply generalised policy improvement (GPI) to collections of such non-Markov policies to obtain a new Markov policy that will in general outperform its precursors. We provide a thorough theoretical analysis of this approach, develop applications to transfer and standard RL, and empirically demonstrate its effectiveness over standard GPI on a challenging deep RL continuous control task. We also provide an analysis of GHM training methods, proving a novel convergence result regarding previously proposed methods and showing how to train these models stably in deep RL settings.
Submitted 17 June, 2022;
originally announced June 2022.
-
BYOL-Explore: Exploration by Bootstrapped Prediction
Authors:
Zhaohan Daniel Guo,
Shantanu Thakoor,
Miruna Pîslar,
Bernardo Avila Pires,
Florent Altché,
Corentin Tallec,
Alaa Saade,
Daniele Calandriello,
Jean-Bastien Grill,
Yunhao Tang,
Michal Valko,
Rémi Munos,
Mohammad Gheshlaghi Azar,
Bilal Piot
Abstract:
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a world representation, the world dynamics, and an exploration policy all-together by optimizing a single prediction loss in the latent space with no additional auxiliary objective. We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark with visually-rich 3-D environments. On this benchmark, we solve the majority of the tasks purely through augmenting the extrinsic reward with BYOL-Explore's intrinsic reward, whereas prior work could only get off the ground with human demonstrations. As further evidence of the generality of BYOL-Explore, we show that it achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.
Submitted 16 June, 2022;
originally announced June 2022.
-
Learning Behavior Representations Through Multi-Timescale Bootstrapping
Authors:
Mehdi Azabou,
Michael Mendelson,
Maks Sorokin,
Shantanu Thakoor,
Nauman Ahad,
Carolina Urzay,
Eva L. Dyer
Abstract:
Natural behavior consists of dynamics that are unpredictable, can switch suddenly, and unfold over many different timescales. While some success has been found in building representations of behavior under constrained or simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings because they assume a single scale of temporal dynamics. In this work, we introduce Bootstrap Across Multiple Scales (BAMS), a multi-scale representation learning model for behavior: we combine a pooling module that aggregates features extracted over encoders with different temporal receptive fields, and design a set of latent objectives to bootstrap the representations in each respective space to encourage disentanglement across different timescales. We first apply our method on a dataset of quadrupeds navigating in different terrain types, and show that our model captures the temporal complexity of behavior. We then apply our method to the MABe 2022 Multi-agent behavior challenge, where our model ranks 3rd overall and 1st on two subtasks, and show the importance of incorporating multi-timescales when analyzing behavior.
Submitted 14 June, 2022;
originally announced June 2022.
-
A Quasi-Uniform Approach to Characterizing the Boundary of the Almost Entropic Region
Authors:
Satyajit Thakor,
Dauood Saleem
Abstract:
The convex closure of entropy vectors for quasi-uniform random vectors is the same as the closure of the entropy region. Thus, quasi-uniform random vectors constitute an important class of random vectors for characterizing the entropy region. Moreover, the one-to-one correspondence between quasi-uniform codes and quasi-uniform random vectors makes quasi-uniform random vectors of central importance for designing effective codes for communication systems. In this paper, we present a novel approach that utilizes quasi-uniform random vectors for characterizing the boundary of the almost entropic region. In particular, we use the notion of quasi-uniform random vectors to establish looseness of known inner bounds for the entropy vectors at the boundary of the almost entropic region for three random variables. For communication models such as network coding, our approach can be applied to design network codes from quasi-uniform entropy vectors.
Submitted 5 June, 2022;
originally announced June 2022.
-
Quantifying and Understanding Adversarial Examples in Discrete Input Spaces
Authors:
Volodymyr Kuleshov,
Evgenii Nikishin,
Shantanu Thakoor,
Tingfung Lau,
Stefano Ermon
Abstract:
Modern classification algorithms are susceptible to adversarial examples--perturbations to inputs that cause the algorithm to produce undesirable behavior. In this work, we seek to understand and extend adversarial examples across domains in which inputs are discrete, particularly across new domains, such as computational biology. As a step towards this goal, we formalize a notion of synonymous adversarial examples that applies in any discrete setting and describe a simple domain-agnostic algorithm to construct such examples. We apply this algorithm across multiple domains--including sentiment analysis and DNA sequence classification--and find that it consistently uncovers adversarial examples. We seek to understand their prevalence theoretically and we attribute their existence to spurious token correlations, a statistical phenomenon that is specific to discrete spaces. Our work is a step towards a domain-agnostic treatment of discrete adversarial examples analogous to that of continuous inputs.
Submitted 12 December, 2021;
originally announced December 2021.
-
Large-scale graph representation learning with very deep GNNs and self-supervision
Authors:
Ravichandra Addanki,
Peter W. Battaglia,
David Budden,
Andreea Deac,
Jonathan Godwin,
Thomas Keck,
Wai Lok Sibon Li,
Alvaro Sanchez-Gonzalez,
Jacklynn Stott,
Shantanu Thakoor,
Petar Veličković
Abstract:
Effectively and efficiently deploying graph neural networks (GNNs) at scale remains one of the most challenging aspects of graph representation learning. Many powerful solutions have only ever been validated on comparatively small datasets, often with counter-intuitive outcomes -- a barrier which has been broken by the Open Graph Benchmark Large-Scale Challenge (OGB-LSC). We entered the OGB-LSC with two large-scale GNNs: a deep transductive node classifier powered by bootstrapping, and a very deep (up to 50-layer) inductive graph regressor regularised by denoising objectives. Our models achieved an award-level (top-3) performance on both the MAG240M and PCQM4M benchmarks. In doing so, we demonstrate evidence of scalable self-supervised graph representation learning, and utility of very deep GNNs -- both very important open issues. Our code is publicly available at: https://github.com/deepmind/deepmind-research/tree/master/ogb_lsc.
Submitted 20 July, 2021;
originally announced July 2021.
-
Large-Scale Representation Learning on Graphs via Bootstrapping
Authors:
Shantanu Thakoor,
Corentin Tallec,
Mohammad Gheshlaghi Azar,
Mehdi Azabou,
Eva L. Dyer,
Rémi Munos,
Petar Veličković,
Michal Valko
Abstract:
Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often need large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations and alleviates the need for contrasting with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks, while achieving a 2-10x reduction in memory costs. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime - achieving state-of-the-art performance and improving over supervised baselines where representations are shaped only through label information. In particular, our solution centered on BGRL constituted one of the winning entries to the Open Graph Benchmark - Large Scale Challenge at KDD Cup 2021, on a graph orders of magnitudes larger than all previously available benchmarks, thus demonstrating the scalability and effectiveness of our approach.
Submitted 20 February, 2023; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Geometric Entropic Exploration
Authors:
Zhaohan Daniel Guo,
Mohammad Gheshlaghi Azar,
Alaa Saade,
Shantanu Thakoor,
Bilal Piot,
Bernardo Avila Pires,
Michal Valko,
Thomas Mesnard,
Tor Lattimore,
Rémi Munos
Abstract:
Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a well-defined policy optimization problem whose solution aims at visiting all states as uniformly as possible. This is in contrast to standard uncertainty-based approaches where exploration is transient and eventually vanishes. However, existing approaches to MSVE are theoretically justified only for discrete state-spaces as they are oblivious to the geometry of continuous domains. We address this challenge by introducing Geometric Entropy Maximisation (GEM), a new algorithm that maximises the geometry-aware Shannon entropy of state-visits in both discrete and continuous domains. Our key theoretical contribution is casting geometry-aware MSVE exploration as a tractable problem of optimising a simple and novel noise-contrastive objective function. In our experiments, we show the efficiency of GEM in solving several RL problems with sparse rewards, compared against other deep RL exploration approaches.
Submitted 7 January, 2021; v1 submitted 6 January, 2021;
originally announced January 2021.
-
Counterfactual Credit Assignment in Model-Free Reinforcement Learning
Authors:
Thomas Mesnard,
Théophane Weber,
Fabio Viola,
Shantanu Thakoor,
Alaa Saade,
Anna Harutyunyan,
Will Dabney,
Tom Stepleton,
Nicolas Heess,
Arthur Guez,
Éric Moulines,
Marcus Hutter,
Lars Buesing,
Rémi Munos
Abstract:
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to condition value functions on future events, by learning to extract relevant information from a trajectory. We formulate a family of policy gradient algorithms that use these future-conditional value functions as baselines or critics, and show that they are provably low variance. To avoid the potential bias from conditioning on future information, we constrain the hindsight information to not contain information about the agent's actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative and challenging problems.
Submitted 14 December, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
On the Partition Bound for Undirected Unicast Network Information Capacity
Authors:
Mohammad Ishtiyaq Qureshi,
Satyajit Thakor
Abstract:
One of the important unsolved problems in information theory is the conjecture that network coding has no rate benefit over routing in undirected unicast networks. Three known bounds on the symmetric rate in undirected unicast information networks are the sparsest cut, the LP bound and the partition bound. In this paper, we present three results on the partition bound. We show that the decision version problem of computing the partition bound is NP-complete. We give complete proofs of optimal routing schemes for two classes of networks that attain the partition bound. Recently, the conjecture was proved for a new class of networks and it was shown that all the network instances for which the conjecture is proved previously are elements of this class. We show the existence of a network for which the partition bound is tight, achievable by routing and is not an element of this new class of networks.
Submitted 21 May, 2020;
originally announced May 2020.
-
Undirected Unicast Network Capacity: A Partition Bound
Authors:
Satyajit Thakor,
Mohammad Ishtiyaq Qureshi
Abstract:
In this paper, we present a new technique to obtain upper bounds on undirected unicast network information capacity. Using this technique, we characterize an upper bound, called the partition bound, on the symmetric rate of information flow in undirected unicast networks and give an algorithm to compute it. Two classes of networks are presented for which the bound is tight and the capacity is achievable by routing, thus confirming the undirected unicast conjecture for these classes of networks. We also show that the bound can be loose in general and present an approach to tighten it.
Submitted 21 May, 2020;
originally announced May 2020.
-
On Characterization of Entropic Vectors at the Boundary of Almost Entropic Cones
Authors:
Hitika Tiwari,
Satyajit Thakor
Abstract:
The entropy region is a fundamental object in information theory. An outer bound for the entropy region, referred to as the Shannon region, is defined by a minimal set of Shannon-type inequalities called elemental inequalities. This paper focuses on the characterization of the entropic points at the boundary of the Shannon region for three random variables. The proper faces of the Shannon region form its boundary. We give new outer bounds for the entropy region in certain faces and show, by explicit construction of distributions, that the existing inner bounds for the entropy region in certain faces are not tight.
Submitted 21 May, 2020;
originally announced May 2020.
-
On Enumerating Distributions for Associated Vectors in the Entropy Space
Authors:
Sultan Alam,
Satyajit Thakor,
Syed Abbas
Abstract:
This paper focuses on the problem of finding a distribution for an associated entropic vector in the entropy space nearest to a given, possibly non-entropic, target vector for random variables with a constraint on alphabet size. We show that it is feasible to find a distribution for the associated vector via a sequence of perturbations in the probability mass function. We then present an algorithm for numerically solving the problem, together with extensions, applications, and a comparison with known results.
Submitted 23 July, 2018;
originally announced July 2018.
-
A Minimal Set of Shannon-type Inequalities for Functional Dependence Structures
Authors:
Satyajit Thakor,
Terence Chan,
Alex Grant
Abstract:
The minimal set of Shannon-type inequalities (referred to as elemental inequalities) plays a central role in determining whether a given inequality is Shannon-type. Often, one needs to check whether a given inequality is a constrained Shannon-type inequality. Another important application of elemental inequalities is to formulate and compute the Shannon outer bound for multi-source multi-sink network coding capacity. Under this formulation, it is the region of feasible source rates subject to the elemental inequalities and network coding constraints that is of interest. Hence it is of fundamental interest to identify the redundancies induced amongst elemental inequalities when given a set of functional dependence constraints. In this paper, we characterize a minimal set of Shannon-type inequalities when functional dependence constraints are present.
Submitted 12 June, 2017;
originally announced June 2017.
-
Capacity Bounds for Networks with Correlated Sources and Characterisation of Distributions by Entropies
Authors:
Satyajit Thakor,
Terence Chan,
Alex Grant
Abstract:
Characterising the capacity region for a network can be extremely difficult. Even with independent sources, determining the capacity region can be as hard as the open problem of characterising all information inequalities. The majority of computable outer bounds in the literature are relaxations of the Linear Programming bound, which involves entropy functions of random variables related to the sources and link messages. When sources are not independent, the problem is even more complicated. Extension of Linear Programming bounds to networks with correlated sources is largely open. Source dependence is usually specified via a joint probability distribution, and one of the main challenges in extending Linear Programming bounds is the difficulty (or impossibility) of characterising arbitrary dependencies via entropy functions. This paper tackles the problem by answering the question of how well entropy functions can characterise correlation among sources. We show that by using carefully chosen auxiliary random variables, the characterisation can be fairly "accurate". Using such auxiliary random variables, we also give implicit and explicit outer bounds on the capacity of networks with correlated sources. The characterisation of correlation or joint distribution via Shannon entropy functions is also applicable to other information measures such as Renyi entropy and Tsallis entropy.
Submitted 11 July, 2016;
originally announced July 2016.
-
Characterising Probability Distributions via Entropies
Authors:
Satyajit Thakor,
Terence Chan,
Alex Grant
Abstract:
Characterising the capacity region for a network can be extremely difficult, especially when the sources are dependent. Most existing computable outer bounds are relaxations of the Linear Programming bound. One main challenge in extending Linear Programming bounds to the case of correlated sources is the difficulty (or impossibility) of characterising arbitrary dependencies via entropy functions. This paper tackles the problem by addressing how to use entropy functions to characterise correlation among sources.
Submitted 11 February, 2016; v1 submitted 11 February, 2016;
originally announced February 2016.
-
Upper Bounds on the Capacity of 2-Layer $N$-Relay Symmetric Gaussian Network
Authors:
Satyajit Thakor,
Syed Abbas
Abstract:
The Gaussian parallel relay network, in which two parallel relays assist a source to convey information to a destination, was introduced by Schein and Gallager. An upper bound on the capacity can be obtained by considering the broadcast cut between the source and the relays and the multiple access cut between the relays and the destination. Niesen and Diggavi derived an upper bound for the Gaussian parallel $N$-relay network by considering all other possible cuts and showed an achievability scheme that can attain rates close to the upper bound in different channel gain regimes, thus establishing approximate capacity. In this paper, we consider symmetric layered Gaussian relay networks in which there can be many layers of parallel relays. The channel gains for the channels between two adjacent layers are symmetrical (identical). Relays in each layer broadcast information to the relays in the next layer. For the 2-layer $N$-relay Gaussian network, we give upper bounds on the capacity. Our analysis reveals that, for the upper bounds, joint optimization over correlation coefficients is not necessary for obtaining stronger results.
Submitted 12 April, 2016; v1 submitted 25 January, 2016;
originally announced January 2016.
-
On the Capacity of Networks with Correlated Sources
Authors:
Satyajit Thakor,
Terence Chan,
Alex Grant
Abstract:
Characterizing the capacity region for a network can be extremely difficult. Even with independent sources, determining the capacity region can be as hard as the open problem of characterizing all information inequalities. The majority of computable outer bounds in the literature are relaxations of the Linear Programming bound which involves entropy functions of random variables related to the sources and link messages. When sources are not independent, the problem is even more complicated. Extension of linear programming bounds to networks with correlated sources is largely open. Source dependence is usually specified via a joint probability distribution, and one of the main challenges in extending linear programming bounds is the difficulty (or impossibility) of characterizing arbitrary dependencies via entropy functions. This paper tackles the problem by answering the question of how well entropy functions can characterize correlation among sources. We show that by using carefully chosen auxiliary random variables, the characterization can be fairly "accurate".
Submitted 5 September, 2013;
originally announced September 2013.
-
Cut-Set Bounds on Network Information Flow
Authors:
Satyajit Thakor,
Alex Grant,
Terence Chan
Abstract:
Explicit characterization of the capacity region of communication networks is a long-standing problem. While it is known that network coding can outperform routing and replication, the set of feasible rates is not known in general. Characterizing the network coding capacity region requires determination of the set of all entropic vectors. Furthermore, computing the explicitly known linear programming bound is infeasible in practice due to an exponential growth in complexity as a function of network size. This paper focuses on the fundamental problems of characterization and computation of outer bounds for networks with correlated sources. Starting from the known local functional dependencies induced by the communications network, we introduce the notion of irreducible sets, which characterize implied functional dependencies. We provide recursions for computation of all maximal irreducible sets. These sets act as information-theoretic bottlenecks, and provide an easily computable outer bound. We extend the notion of irreducible sets (and the resulting outer bound) to networks with independent sources. We compare our bounds with existing bounds in the literature. We find that our new bounds are the best among the known graph-theoretic bounds for networks with correlated sources and for networks with independent sources.
Submitted 10 February, 2016; v1 submitted 16 May, 2013;
originally announced May 2013.
-
Symmetry in Distributed Storage Systems
Authors:
Satyajit Thakor,
Terence Chan,
Kenneth W. Shum
Abstract:
The max-flow outer bound is achievable by regenerating codes for functional repair distributed storage systems. However, the capacity of exact repair distributed storage systems is an open problem. In this paper, the linear programming bound for exact repair distributed storage systems is formulated. A notion of symmetrical sets for a set of random variables is given, and equalities of joint entropies for certain subsets of random variables in a symmetrical set are established. A concatenation coding scheme for exact repair distributed storage systems is proposed, and it is shown that the concatenation coding scheme is sufficient to achieve any admissible rate for any exact repair distributed storage system. Equalities of certain joint entropies of random variables induced by the concatenation scheme are shown. These equalities of joint entropies are new tools to simplify the linear programming bound and to obtain stronger converse results for exact repair distributed storage systems.
Submitted 15 May, 2013;
originally announced May 2013.
-
Reduced Functional Dependence Graph and Its Applications
Authors:
Xiaoli Xu,
Satyajit Thakor,
Yong Liang Guan
Abstract:
The functional dependence graph (FDG) is an important class of directed graphs that captures the dominance relationship among a set of variables. The FDG is frequently used in calculating network coding capacity bounds. However, the order of the FDG is usually much larger than that of the original network, and the computational complexity of many bounds grows exponentially with the order of the FDG. In this paper, we introduce the concept of the reduced FDG, which is obtained from the original FDG by keeping only the "essential" edges. It is proved that the reduced FDG gives the same capacity region/bounds as the original FDG while requiring much less computation. The applications of the reduced FDG in the algebraic formulation of scalar linear network coding are also discussed.
Submitted 9 March, 2012; v1 submitted 10 January, 2012;
originally announced January 2012.
-
Network Coding Capacity: A Functional Dependence Bound
Authors:
Satyajit Thakor,
Alex Grant,
Terence Chan
Abstract:
Explicit characterization and computation of the multi-source network coding capacity region (or even bounds) is a long-standing open problem. In fact, finding the capacity region requires determination of the set of all entropic vectors $\Gamma^{*}$, which is known to be an extremely hard problem. On the other hand, calculating the explicitly known linear programming bound is very hard in practice due to an exponential growth in complexity as a function of network size. We give a new, easily computable outer bound, based on characterization of all functional dependencies in networks. We also show that the proposed bound is tighter than some known bounds.
Submitted 29 January, 2009;
originally announced January 2009.