-
Proving Theorems using Incremental Learning and Hindsight Experience Replay
Authors:
Eser Aygün,
Laurent Orseau,
Ankit Anand,
Xavier Glorot,
Vlad Firoiu,
Lei M. Zhang,
Doina Precup,
Shibl Mourad
Abstract:
Traditional automated theorem provers for first-order logic depend on speed-optimized search and many handcrafted heuristics that are designed to work best over a wide range of domains. Machine learning approaches in the literature either depend on these traditional provers to bootstrap themselves or fall short of reaching comparable performance. In this paper, we propose a general incremental learning algorithm for training domain-specific provers for first-order logic without equality, based only on a basic given-clause algorithm, but using a learned clause-scoring function. Clauses are represented as graphs and presented to transformer networks with spectral features. To address the sparsity and the initial lack of training data as well as the lack of a natural curriculum, we adapt hindsight experience replay to theorem proving, so as to be able to learn even when no proof can be found. We show that provers trained this way can match and sometimes surpass state-of-the-art traditional provers on the TPTP dataset in terms of both quantity and quality of the proofs.
Submitted 20 December, 2021;
originally announced December 2021.
-
The Option Keyboard: Combining Skills in Reinforcement Learning
Authors:
André Barreto,
Diana Borsa,
Shaobo Hou,
Gheorghe Comanici,
Eser Aygün,
Philippe Hamel,
Daniel Toyama,
Jonathan Hunt,
Shibl Mourad,
David Silver,
Doina Precup
Abstract:
The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.
Submitted 24 June, 2021;
originally announced June 2021.
-
Training a First-Order Theorem Prover from Synthetic Data
Authors:
Vlad Firoiu,
Eser Aygun,
Ankit Anand,
Zafarali Ahmed,
Xavier Glorot,
Laurent Orseau,
Lei Zhang,
Doina Precup,
Shibl Mourad
Abstract:
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training purely with synthetically generated theorems, without any human data aside from axioms. We use these theorems to train a neurally-guided saturation-based prover. Our neural prover outperforms the state-of-the-art E-prover on this synthetic data in both time and search steps, and shows significant transfer to the unseen human-written theorems from the TPTP library, where it solves 72% of first-order problems without equality.
Submitted 6 April, 2021; v1 submitted 5 March, 2021;
originally announced March 2021.
-
Learning to Prove from Synthetic Theorems
Authors:
Eser Aygün,
Zafarali Ahmed,
Ankit Anand,
Vlad Firoiu,
Xavier Glorot,
Laurent Orseau,
Doina Precup,
Shibl Mourad
Abstract:
A major challenge in applying machine learning to automated theorem proving is the scarcity of training data, which is a key ingredient in training successful deep learning models. To tackle this problem, we propose an approach that relies on training with synthetic theorems, generated from a set of axioms. We show that such theorems can be used to train an automated prover and that the learned prover transfers successfully to human-generated theorems. We demonstrate that a prover trained exclusively on synthetic theorems can solve a substantial fraction of problems in TPTP, a benchmark dataset that is used to compare state-of-the-art heuristic provers. Our approach outperforms a model trained on human-generated problems in most axiom sets, thereby showing the promise of using synthetic data for this task.
Submitted 19 June, 2020;
originally announced June 2020.
-
Learning to cooperate: Emergent communication in multi-agent navigation
Authors:
Ivana Kajić,
Eser Aygün,
Doina Precup
Abstract:
Emergent communication in artificial agents has been studied to understand language evolution, as well as to develop artificial systems that learn to communicate with humans. We show that agents performing a cooperative navigation task in various gridworld environments learn an interpretable communication protocol that enables them to solve the task efficiently and, in many cases, optimally. An analysis of the agents' policies reveals that emergent signals spatially cluster the state space, with signals referring to specific locations and spatial directions such as "left", "up", or "upper left room". Using populations of agents, we show that the emergent protocol has basic compositional structure, thus exhibiting a core property of natural language.
Submitted 30 June, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Spectral renormalization group theory on networks
Authors:
Eser Aygun,
Ayse Erzan
Abstract:
Discrete amorphous materials are best described in terms of arbitrary networks which can be embedded in three-dimensional space. Investigating the thermodynamic equilibrium as well as non-equilibrium behavior of such materials around second-order phase transitions calls for special techniques.
We set up a renormalization group scheme by expanding an arbitrary scalar field living on the nodes of an arbitrary network, in terms of the eigenvectors of the normalized graph Laplacian. The renormalization transformation involves, as usual, the integration over the more "rapidly varying" components of the field, corresponding to eigenvectors with larger eigenvalues, and then rescaling. The critical exponents depend on the particular graph through the spectral density of the eigenvalues.
Submitted 18 July, 2011;
originally announced July 2011.
-
A Formal Treatment of Generalized Preferential Attachment and its Empirical Validation
Authors:
Amac Herdagdelen,
Eser Aygun,
Haluk Bingol
Abstract:
Generalized preferential attachment is defined as the tendency of a vertex to acquire new links in the future with respect to a particular vertex property. Understanding which properties influence link acquisition tendency (LAT) gives us predictive power to estimate the future growth of a network and insight into the actual dynamics governing complex networks. In this study, we explore the effect of age and degree on LAT by analyzing data collected from a new complex-network growth dataset. We found that LAT and degree of a vertex are linearly correlated, in accordance with previous studies. Interestingly, the relation between LAT and age of a vertex is found to be in conflict with the known models of network growth. We identified three different periods in the network's lifetime where the relation between age and LAT is strongly positive, almost stationary, and negative, respectively.
Submitted 16 July, 2007; v1 submitted 15 September, 2006;
originally announced September 2006.