-
Polynomial Lawvere Logic
Authors:
Giorgio Bacci,
Radu Mardare,
Prakash Panangaden,
Gordon Plotkin
Abstract:
In this paper, we study Polynomial Lawvere logic (PL), a logic on the quantale of the extended positive reals, developed for reasoning about metric spaces. PL is appropriate for encoding quantitative reasoning principles, such as quantitative equational logic. PL formulas include the polynomial functions on the extended positive reals, and its judgements include inequalities between polynomials.
We present an inference system for PL and prove a series of completeness and incompleteness results relying on the Krivine-Stengle Positivstellensatz (a variant of Hilbert's Nullstellensatz), including completeness for finitely axiomatisable PL theories.
We also study complexity results for both PL and its affine fragment (AL). We demonstrate that the satisfiability of a finite set of judgements is NP-complete in AL and in PSPACE for PL; and that deciding the semantical consequence from a finite set of judgements is co-NP complete in AL and in PSPACE for PL.
Submitted 7 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Behavioural pseudometrics for continuous-time diffusions
Authors:
Linan Chen,
Florence Clerc,
Prakash Panangaden
Abstract:
Bisimulation is a concept that captures behavioural equivalence of states in a variety of types of transition systems. It has been widely studied in a discrete-time setting where the notion of a step is fundamental. In our setting we are considering "flow"-processes emphasizing that they evolve in continuous time. In such continuous-time settings, the concepts are not straightforward adaptations of their discrete-time analogues and we restrict our study to diffusions that do not lose mass over time and with additional regularity constraints.
In previous work we proposed different definitions of behavioural equivalences for continuous-time stochastic processes where the evolution is a flow through time. That work only addressed equivalences. In this work, we aim at quantifying how differently processes behave. We present two pseudometrics for diffusion-like processes. These pseudometrics are fixpoints of two different functionals on the space of 1-bounded pseudometrics on the state space. We also characterize these pseudometrics in terms of real-valued modal logics; this is a quantitative analogue of the notion of logical characterization of bisimulation. These real-valued modal logics indicate that the two pseudometrics are different and thus yield different notions of behavioural equivalence.
Submitted 30 April, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Authors:
Jonathan Colaço Carr,
Prakash Panangaden,
Doina Precup
Abstract:
Learning from Preferential Feedback (LfPF) plays an essential role in training Large Language Models, as well as certain types of interactive learning agents. However, a substantial gap exists between the theory and application of LfPF algorithms. Current results guaranteeing the existence of optimal policies in LfPF problems assume that both the preferences and transition dynamics are determined by a Markov Decision Process. We introduce the Direct Preference Process, a new framework for analyzing LfPF problems in partially-observable, non-Markovian environments. Within this framework, we establish conditions that guarantee the existence of optimal policies by considering the ordinal structure of the preferences. We show that a decision-making problem can have optimal policies -- that are characterized by recursive optimality equations -- even when no reward function can express the learning goal. These findings underline the need to explore preference-based learning strategies which do not assume that preferences are generated by reward.
Submitted 27 March, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
A Kernel Perspective on Behavioural Metrics for Markov Decision Processes
Authors:
Pablo Samuel Castro,
Tyler Kastner,
Prakash Panangaden,
Mark Rowland
Abstract:
Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which have so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
Submitted 5 October, 2023;
originally announced October 2023.
-
Optimal Approximate Minimization of One-Letter Weighted Finite Automata
Authors:
Clara Lacroce,
Borja Balle,
Prakash Panangaden,
Guillaume Rabusseau
Abstract:
In this paper, we study the approximate minimization problem of weighted finite automata (WFAs): to compute the best possible approximation of a WFA given a bound on the number of states. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory.
We solve the optimal spectral-norm approximate minimization problem for irredundant WFAs with real weights, defined over a one-letter alphabet. We present a theoretical analysis based on AAK theory, and bounds on the quality of the approximation in the spectral norm and $\ell^2$ norm. Moreover, we provide a closed-form solution, and an algorithm, to compute the optimal approximation of a given size in polynomial time.
Submitted 31 May, 2023;
originally announced June 2023.
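The Hankel reformulation mentioned in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper: the two-state automaton `(alpha, A, beta)` and the helper names are hypothetical, chosen only to show how a one-letter WFA computing f(n) = alpha^T A^n beta gives rise to a Hankel matrix with entries H[i][j] = f(i + j).

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def wfa_values(alpha, A, beta, count):
    """Return [f(0), ..., f(count - 1)] with f(n) = alpha^T A^n beta."""
    values = []
    v = list(beta)  # invariant: v holds A^n beta at step n
    for _ in range(count):
        values.append(sum(a * x for a, x in zip(alpha, v)))
        v = mat_vec(A, v)
    return values

def hankel_block(f, size):
    """Top-left size x size block of the Hankel matrix H[i][j] = f(i + j)."""
    return [[f[i + j] for j in range(size)] for i in range(size)]

# Hypothetical 2-state WFA over a one-letter alphabet (illustration only).
alpha = [1.0, 1.0]
A = [[0.5, 0.0], [0.0, 0.25]]
beta = [1.0, 1.0]

f = wfa_values(alpha, A, beta, 5)  # a 3 x 3 block needs f(0), ..., f(4)
H = hankel_block(f, 3)
for row in H:
    print(row)
```

Each anti-diagonal of `H` is constant, which is what makes it a Hankel matrix. By a classical result, the rank of the full (infinite) Hankel matrix equals the number of states of a minimal WFA for f, so approximating a WFA by a smaller one amounts to a low-rank approximation of this matrix.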
-
Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Authors:
Prakash Panangaden,
Sahand Rezaei-Shoshtari,
Rosie Zhao,
David Meger,
Doina Precup
Abstract:
Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the presence of such symmetries. We demonstrate the effectiveness of our method on our environments, as well as on challenging visual control tasks from the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance, and the visualizations of the latent space clearly demonstrate the structure of the learned abstraction.
Submitted 7 March, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Propositional Logics for the Lawvere Quantale
Authors:
Giorgio Bacci,
Radu Mardare,
Prakash Panangaden,
Gordon Plotkin
Abstract:
Lawvere showed that generalised metric spaces are categories enriched over $[0, \infty]$, the quantale of the positive extended reals. The statement of enrichment is a quantitative analogue of being a preorder. Towards seeking a logic for quantitative metric reasoning, we investigate three $[0,\infty]$-valued propositional logics over the Lawvere quantale. The basic logical connectives shared by all three logics are those that can be interpreted in any quantale, viz. finite conjunctions and disjunctions, tensor (addition for the Lawvere quantale) and linear implication (here a truncated subtraction); to these we add, in turn, the constant $1$ to express integer values, and scalar multiplication by a non-negative real to express general affine combinations. Quantitative equational logic can be interpreted in the third logic if we allow inference systems instead of axiomatic systems. For each of these logics we develop a natural deduction system which we prove to be decidably complete w.r.t. the quantale-valued semantics. The heart of the completeness proof makes use of the Motzkin transposition theorem. Consistency is also decidable; the proof makes use of Fourier-Motzkin elimination of linear inequalities. Strong completeness does not hold in general, even (as is known) for theories over finitely-many propositional variables; indeed even an approximate form of strong completeness in the sense of Pavelka or Ben Yaacov -- provability up to arbitrary precision -- does not hold. However, we can show it for theories axiomatized by a (not necessarily finite) set of judgements in normal form over a finite set of propositional variables when we restrict to models that do not map variables to $\infty$; the proof uses Hurwicz's general form of Farkas' Lemma.
Submitted 17 November, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
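Two of the connectives named in the abstract above, tensor (addition) and linear implication (truncated subtraction), can be sketched numerically. This is an illustration rather than the paper's formal system: the function names `tensor` and `limp` are hypothetical, and the convention that `inf` implies anything at cost 0 is the standard one for truncated subtraction on $[0, \infty]$, assumed here.

```python
import math

INF = math.inf

def tensor(a, b):
    """Tensor in the Lawvere quantale: addition on [0, inf]."""
    return a + b

def limp(a, b):
    """Linear implication from a to b: truncated subtraction b - a, floored at 0."""
    if a == INF:
        return 0.0  # also covers inf - inf, which would otherwise be NaN
    return max(b - a, 0.0)

# Residuation check on a small grid: a + c >= b  exactly when  c >= limp(a, b).
values = [0.0, 0.5, 1.0, 2.0, INF]
adjunction_holds = all(
    (tensor(a, c) >= b) == (c >= limp(a, b))
    for a in values
    for b in values
    for c in values
)
print(adjunction_holds)
```

The grid check is only a sanity test of the residuation property linking the two operations; it is not a proof and says nothing about the deduction systems studied in the paper.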
-
Sum and Tensor of Quantitative Effects
Authors:
Giorgio Bacci,
Radu Mardare,
Prakash Panangaden,
Gordon Plotkin
Abstract:
Inspired by the seminal work of Hyland, Plotkin, and Power on the combination of algebraic computational effects via sum and tensor, we develop an analogous theory for the combination of quantitative algebraic effects. Quantitative algebraic effects are monadic computational effects on categories of metric spaces, which, moreover, have an algebraic presentation in the form of quantitative equational theories, a logical framework introduced by Mardare, Panangaden, and Plotkin that generalises equational logic to account for a concept of approximate equality. As our main result, we show that the sum and tensor of two quantitative equational theories correspond to the categorical sum (i.e., coproduct) and tensor, respectively, of their effects qua monads. We further give a theory of quantitative effect transformers based on these two operations, essentially providing quantitative analogues to the following monad transformers due to Moggi: exception, resumption, reader, and writer transformers. Finally, as an application, we provide the first quantitative algebraic axiomatizations to the following coalgebraic structures: Markov processes, labelled Markov processes, Mealy machines, and Markov decision processes, each endowed with their respective bisimilarity metrics. Apart from the intrinsic interest in these axiomatizations, it is pleasing they have been obtained as the composition, via sum and tensor, of simpler quantitative equational theories.
Submitted 4 August, 2024; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
Authors:
Sahand Rezaei-Shoshtari,
Rosie Zhao,
Prakash Panangaden,
David Meger,
Doina Precup
Abstract:
Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.
Submitted 15 September, 2022;
originally announced September 2022.
-
Riemannian Diffusion Models
Authors:
Chin-Wei Huang,
Milad Aghajohari,
Avishek Joey Bose,
Prakash Panangaden,
Aaron Courville
Abstract:
Diffusion models are recent state-of-the-art methods for image generation and likelihood estimation. In this work, we generalize continuous-time diffusion models to arbitrary Riemannian manifolds and derive a variational framework for likelihood estimation. Computationally, we propose new methods for computing the Riemannian divergence which is needed in the likelihood estimation. Moreover, in generalizing the Euclidean case, we prove that maximizing this variational lower-bound is equivalent to Riemannian score matching. Empirically, we demonstrate the expressive power of Riemannian diffusion models on a wide spectrum of smooth manifolds, such as spheres, tori, hyperboloids, and orthogonal groups. Our proposed method achieves new state-of-the-art likelihoods on all benchmarks.
Submitted 16 August, 2022;
originally announced August 2022.
-
Towards an AAK Theory Approach to Approximate Minimization in the Multi-Letter Case
Authors:
Clara Lacroce,
Prakash Panangaden,
Guillaume Rabusseau
Abstract:
We study the approximate minimization problem of weighted finite automata (WFAs): given a WFA, we want to compute its optimal approximation when restricted to a given size. We reformulate the problem as a rank-minimization task in the spectral norm, and propose a framework to apply Adamyan-Arov-Krein (AAK) theory to the approximation problem. This approach has already been successfully applied to the case of WFAs and language modelling black boxes over one-letter alphabets \citep{AAK-WFA,AAK-RNN}. Extending the result to multi-letter alphabets requires solving the following two steps. First, we need to reformulate the approximation problem in terms of noncommutative Hankel operators and noncommutative functions, in order to apply results from multivariable operator theory. Secondly, to obtain the optimal approximation we need a version of noncommutative AAK theory that is constructive. In this paper, we successfully tackle the first step, while the second challenge remains open.
Submitted 31 May, 2022;
originally announced June 2022.
-
Interpreting Lambda Calculus in Domain-Valued Random Variables
Authors:
Robert Furber,
Radu Mardare,
Prakash Panangaden,
Dana Scott
Abstract:
We develop Boolean-valued domain theory and show how the lambda-calculus can be interpreted using domain-valued random variables. We focus on the reflexive domain construction rather than the language and its semantics. The notion of equality has to be interpreted in the Boolean algebra and when we say that an equation is valid in the model we mean that its interpretation is the top element of the Boolean algebra.
Submitted 12 December, 2021;
originally announced December 2021.
-
Proceedings 17th International Conference on Quantum Physics and Logic
Authors:
Benoît Valiron,
Shane Mansfield,
Pablo Arrighi,
Prakash Panangaden
Abstract:
This volume contains the proceedings of the 17th International Conference on Quantum Physics and Logic (QPL 2020), which was held June 2-6, 2020. Quantum Physics and Logic is an annual conference that brings together researchers working on mathematical foundations of quantum physics, quantum computing, and related areas, with a focus on structural perspectives and the use of logical tools, ordered algebraic and category-theoretic structures, formal languages, semantical methods, and other computer science techniques applied to the study of physical behavior in general. Work that applies structures and methods inspired by quantum theory to other fields (including computer science) is also welcome.
Submitted 3 September, 2021;
originally announced September 2021.
-
Fixed-Points for Quantitative Equational Logics
Authors:
Radu Mardare,
Prakash Panangaden,
Gordon Plotkin
Abstract:
We develop a fixed-point extension of quantitative equational logic and give semantics in one-bounded complete quantitative algebras. Unlike previous related work about fixed-points in metric spaces, we are working with the notion of approximate equality rather than exact equality. The result is a novel theory of fixed points that not only provides solutions to the traditional fixed-point equations but also lets us define the rate of convergence to the fixed point. We show that such a theory is the quantitative analogue of a Conway theory and also of an iteration theory; and it reflects the metric coinduction principle. We study the Bellman equation for a Markov decision process as an illustrative example.
Submitted 30 June, 2021;
originally announced June 2021.
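The Bellman equation cited as the illustrative example in the abstract above has a well-known fixed-point reading, sketched here with plain value iteration. This is the classical contraction argument, not the quantitative equational logic developed in the paper; the two-state MDP (`P`, `R`, `gamma`) and the helper names are hypothetical.

```python
def bellman(V, P, R, gamma):
    """Apply the Bellman optimality operator once.

    P[s][a] is a list of (next_state, probability) pairs and
    R[s][a] is the immediate reward for action a in state s.
    """
    return [
        max(
            R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
            for a in range(len(P[s]))
        )
        for s in range(len(P))
    ]

def sup_dist(U, V):
    """Sup-norm distance between two value functions."""
    return max(abs(u - v) for u, v in zip(U, V))

# Hypothetical two-state, two-action MDP (illustration only).
P = [
    [[(0, 0.5), (1, 0.5)], [(1, 1.0)]],
    [[(0, 1.0)], [(1, 1.0)]],
]
R = [[0.0, 1.0], [2.0, 0.5]]
gamma = 0.9

V = [0.0, 0.0]
gaps = []  # gaps[k] is the distance between iterates k and k + 1
for _ in range(50):
    V_next = bellman(V, P, R, gamma)
    gaps.append(sup_dist(V_next, V))
    V = V_next

print(gaps[0], gaps[-1])
```

Because the operator is a `gamma`-contraction in the sup norm, successive gaps shrink by at least a factor of `gamma` per step, which gives an explicit rate of convergence to the fixed point.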
-
MICo: Improved representations via sampling-based state similarity for Markov decision processes
Authors:
Pablo Samuel Castro,
Tyler Kastner,
Prakash Panangaden,
Mark Rowland
Abstract:
We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.
Submitted 21 January, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Extracting Weighted Automata for Approximate Minimization in Language Modelling
Authors:
Clara Lacroce,
Prakash Panangaden,
Guillaume Rabusseau
Abstract:
In this paper we study the approximate minimization problem for language modelling. We assume we are given some language model as a black box. The objective is to obtain a weighted finite automaton (WFA) that fits within a given size constraint and which mimics the behaviour of the original model while minimizing some notion of distance between the black box and the extracted WFA. We provide an algorithm for the approximate minimization of black boxes trained for language modelling of sequential data over a one-letter alphabet. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. This allows us to use the spectral norm to measure the distance between the black box and the WFA. We provide theoretical guarantees to study the potentially infinite-rank Hankel matrix of the black box, without accessing the training data, and we prove that our method returns an asymptotically-optimal approximation.
Submitted 23 July, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Optimal Spectral-Norm Approximate Minimization of Weighted Finite Automata
Authors:
Borja Balle,
Clara Lacroce,
Prakash Panangaden,
Doina Precup,
Guillaume Rabusseau
Abstract:
We address the approximate minimization problem for weighted finite automata (WFAs) with weights in $\mathbb{R}$, over a one-letter alphabet: to compute the best possible approximation of a WFA given a bound on the number of states. This work is grounded in Adamyan-Arov-Krein Approximation theory, a remarkable collection of results on the approximation of Hankel operators. In addition to its intrinsic mathematical relevance, this theory has proven to be very effective for model reduction. We adapt these results to the framework of weighted automata over a one-letter alphabet. We provide theoretical guarantees and bounds on the quality of the approximation in the spectral and $\ell^2$ norm. We develop an algorithm that, based on the properties of Hankel operators, returns the optimal approximation in the spectral norm.
Submitted 17 May, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Universal Semantics for the Stochastic Lambda-Calculus
Authors:
Pedro Amorim,
Dexter Kozen,
Radu Mardare,
Prakash Panangaden,
Michael Roberts
Abstract:
We define sound and adequate denotational and operational semantics for the stochastic lambda calculus. These two semantic approaches build on previous work that used similar techniques to reason about higher-order probabilistic programs, but for the first time admit an adequacy theorem relating the operational and denotational views. This resolves the main issue left open in (Bacci et al. 2018).
Submitted 14 May, 2021; v1 submitted 26 November, 2020;
originally announced November 2020.
-
Weighted automata are compact and actively learnable
Authors:
Artem Kaznatcheev,
Prakash Panangaden
Abstract:
We show that weighted automata over the field of two elements can be exponentially more compact than non-deterministic finite state automata. To show this, we combine ideas from automata theory and communication complexity. However, weighted automata are also efficiently learnable in Angluin's minimal adequate teacher model in a number of queries that is polynomial in the size of the minimal weighted automaton. We include an algorithm for learning WAs over any field based on a linear algebraic generalization of the Angluin-Schapire algorithm. Together, this produces a surprising result: weighted automata over fields are structured enough that even though they can be very compact, they are still efficiently learnable.
Submitted 22 April, 2021; v1 submitted 20 November, 2020;
originally announced November 2020.
-
A Study of Policy Gradient on a Class of Exactly Solvable Models
Authors:
Gavin McCracken,
Colin Daniels,
Rosie Zhao,
Anna Brandenberger,
Prakash Panangaden,
Doina Precup
Abstract:
Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain, whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.
Submitted 3 November, 2020;
originally announced November 2020.
-
Minimisation in Logical Form
Authors:
Nick Bezhanishvili,
Marcello Bonsangue,
Helle Hvid Hansen,
Dexter Kozen,
Clemens Kupke,
Prakash Panangaden,
Alexandra Silva
Abstract:
Stone-type dualities provide a powerful mathematical framework for studying properties of logical systems. They have recently been fruitfully explored in understanding minimisation of various types of automata. In Bezhanishvili et al. (2012), a dual equivalence between a category of coalgebras and a category of algebras was used to explain minimisation. The algebraic semantics is dual to a coalgebraic semantics in which logical equivalence coincides with trace equivalence. It follows that maximal quotients of coalgebras correspond to minimal subobjects of algebras. Examples include partially observable deterministic finite automata, linear weighted automata viewed as coalgebras over finite-dimensional vector spaces, and belief automata, which are coalgebras on compact Hausdorff spaces. In Bonchi et al. (2014), Brzozowski's double-reversal minimisation algorithm for deterministic finite automata was described categorically and its correctness explained via the duality between reachability and observability. This work includes generalisations of Brzozowski's algorithm to Moore and weighted automata over commutative semirings.
In this paper we propose a general categorical framework within which such minimisation algorithms can be understood. The goal is to provide a unifying perspective based on duality. Our framework consists of a stack of three interconnected adjunctions: a base dual adjunction that can be lifted to a dual adjunction between coalgebras and algebras and also to a dual adjunction between automata. The approach provides an abstract understanding of reachability and observability. We illustrate the general framework on a range of concrete examples, including deterministic Kripke frames, weighted automata, topological automata (belief automata), and alternating automata.
Submitted 23 May, 2020;
originally announced May 2020.
-
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
Authors:
Philip Amortila,
Doina Precup,
Prakash Panangaden,
Marc G. Bellemare
Abstract:
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD($λ$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
Submitted 27 March, 2020;
originally announced March 2020.
-
Latent Variable Modelling with Hyperbolic Normalizing Flows
Authors:
Avishek Joey Bose,
Ariella Smofsky,
Renjie Liao,
Prakash Panangaden,
William L. Hamilton
Abstract:
The choice of approximate posterior distributions plays a central role in stochastic variational inference (SVI). One effective solution is the use of normalizing flows to construct flexible posterior distributions. However, one key limitation of existing normalizing flows is that they are restricted to the Euclidean space and are ill-equipped to model data with an underlying hierarchical structure. To address this fundamental limitation, we present the first extension of normalizing flows to hyperbolic spaces. We first elevate normalizing flows to hyperbolic spaces using coupling transforms defined on the tangent bundle, termed Tangent Coupling ($\mathcal{TC}$). We further introduce Wrapped Hyperboloid Coupling ($\mathcal{W}\mathbb{H}C$), a fully invertible and learnable transformation that explicitly utilizes the geometric structure of hyperbolic spaces, allowing for expressive posteriors while being efficient to sample from. We demonstrate the efficacy of our novel normalizing flow over hyperbolic VAEs and Euclidean normalizing flows. Our approach achieves improved performance on density estimation, as well as reconstruction of real-world graph data, which exhibit a hierarchical structure. Finally, we show that our approach can be used to power a generative model over hierarchical data using hyperbolic latent variables.
Submitted 13 August, 2020; v1 submitted 15 February, 2020;
originally announced February 2020.
-
Bisimulation for Feller-Dynkin Processes
Authors:
Linan Chen,
Florence Clerc,
Prakash Panangaden
Abstract:
Bisimulation is a concept that captures behavioural equivalence. It has been studied extensively on nonprobabilistic systems, on discrete-time Markov processes, and on so-called continuous-time Markov chains. In the latter, time is continuous but the evolution still proceeds in jumps. We propose two definitions of bisimulation on continuous-time stochastic processes where the evolution is a flow through time. We show that they are equivalent and we show that, when restricted to discrete time, our concept of bisimulation encompasses the standard discrete-time concept. The concept we introduce is not a straightforward generalization of discrete-time concepts.
Submitted 1 April, 2019;
originally announced April 2019.
-
On the Axiomatizability of Quantitative Algebras
Authors:
Radu Mardare,
Prakash Panangaden,
Gordon Plotkin
Abstract:
Quantitative algebras (QAs) are algebras over metric spaces defined by quantitative equational theories as introduced by the same authors in a related paper presented at LICS 2016. These algebras provide the mathematical foundation for metric semantics of probabilistic, stochastic and other quantitative systems. This paper considers the issue of axiomatizability of QAs. We investigate the entire spectrum of types of quantitative equations that can be used to axiomatize theories: (i) simple quantitative equations; (ii) Horn clauses with no more than $c$ equations between variables as hypotheses, where $c$ is a cardinal and (iii) the most general case of Horn clauses. In each case we characterize the class of QAs and prove variety/quasivariety theorems that extend and generalize classical results from model theory for algebras and first-order structures.
Submitted 5 April, 2018;
originally announced April 2018.
-
Free complete Wasserstein algebras
Authors:
Radu Mardare,
Prakash Panangaden,
Gordon D. Plotkin
Abstract:
We present an algebraic account of the Wasserstein distances $W_p$ on complete metric spaces, for $p \geq 1$. This is part of a program of a quantitative algebraic theory of effects in programming languages. In particular, we give axioms, parametric in $p$, for algebras over metric spaces equipped with probabilistic choice operations. The axioms say that the operations form a barycentric algebra and that the metric satisfies a property typical of the Wasserstein distance $W_p$. We show that the free complete such algebra over a complete metric space is that of the Radon probability measures with finite moments of order $p$, equipped with the Wasserstein distance as metric and with the usual binary convex sums as operations.
Submitted 13 September, 2018; v1 submitted 20 February, 2018;
originally announced February 2018.
-
Singular value automata and approximate minimization
Authors:
Borja Balle,
Prakash Panangaden,
Doina Precup
Abstract:
The present paper uses spectral theory of linear operators to construct approximately minimal realizations of weighted languages. Our new contributions are: (i) a new algorithm for the SVD decomposition of infinite Hankel matrices based on their representation in terms of weighted automata, (ii) a new canonical form for weighted automata arising from the SVD of its corresponding Hankel matrix and (iii) an algorithm to construct approximate minimizations of given weighted automata by truncating the canonical form. We give bounds on the quality of our approximation.
Submitted 27 May, 2019; v1 submitted 16 November, 2017;
originally announced November 2017.
-
A categorical characterization of relative entropy on standard Borel spaces
Authors:
Nicolas Gagne,
Prakash Panangaden
Abstract:
We give a categorical treatment, in the spirit of Baez and Fritz, of relative entropy for probability distributions defined on standard Borel spaces. We define a category suitable for reasoning about statistical inference on standard Borel spaces. We define relative entropy as a functor into Lawvere's category and we show convexity, lower semicontinuity and uniqueness.
Submitted 8 November, 2023; v1 submitted 26 March, 2017;
originally announced March 2017.
-
Bisimulation Metrics for Weighted Automata
Authors:
Borja Balle,
Pascale Gourdeau,
Prakash Panangaden
Abstract:
We develop a new bisimulation (pseudo)metric for weighted finite automata (WFA) that generalizes Boreale's linear bisimulation relation. Our metrics are induced by seminorms on the state space of WFA. Our development is based on spectral properties of sets of linear operators. In particular, the joint spectral radius of the transition matrices of WFA plays a central role. We also study continuity properties of the bisimulation pseudometric, establish an undecidability result for computing the metric, and give a preliminary account of applications to spectral learning of weighted automata.
Submitted 14 May, 2017; v1 submitted 26 February, 2017;
originally announced February 2017.
-
Quantum Alternation: Prospects and Problems
Authors:
Costin Bădescu,
Prakash Panangaden
Abstract:
We propose a notion of quantum control in a quantum programming language which permits the superposition of finitely many quantum operations without performing a measurement. This notion takes the form of a conditional construct similar to the IF statement in classical programming languages. We show that adding such a quantum IF statement to the QPL programming language simplifies the presentation of several quantum algorithms. This motivates the possibility of extending the denotational semantics of QPL to include this form of quantum alternation. We give a denotational semantics for this extension of QPL based on Kraus decompositions rather than on superoperators. Finally, we clarify the relation between quantum alternation and recursion, and discuss the possibility of lifting the semantics defined by Kraus operators to the superoperator semantics defined by Selinger.
Submitted 4 November, 2015;
originally announced November 2015.
-
A Canonical Form for Weighted Automata and Applications to Approximate Minimization
Authors:
Borja Balle,
Prakash Panangaden,
Doina Precup
Abstract:
We study the problem of constructing approximations to a weighted automaton. Weighted finite automata (WFA) are closely related to the theory of rational series. A rational series is a function from strings to real numbers that can be computed by a finite WFA. Among others, this includes probability distributions generated by hidden Markov models and probabilistic automata. The relationship between rational series and WFA is analogous to the relationship between regular languages and ordinary automata. Associated with such rational series are infinite matrices called Hankel matrices which play a fundamental role in the theory of minimal WFA. Our contributions are: (1) an effective procedure for computing the singular value decomposition (SVD) of such infinite Hankel matrices based on their representation in terms of finite WFA; (2) a new canonical form for finite WFA based on this SVD decomposition; and, (3) an algorithm to construct approximate minimizations of a given WFA. The goal of our approximate minimization algorithm is to start from a minimal WFA and produce a smaller WFA that is close to the given one in a certain sense. The desired size of the approximating automaton is given as input. We give bounds describing how well the approximation emulates the behavior of the original WFA.
Submitted 24 April, 2015; v1 submitted 27 January, 2015;
originally announced January 2015.
-
Proceedings of the 11th workshop on Quantum Physics and Logic
Authors:
Bob Coecke,
Ichiro Hasuo,
Prakash Panangaden
Abstract:
This volume contains the proceedings of the 11th International Workshop on Quantum Physics and Logic (QPL 2014), which was held from the 4th to the 6th of June, 2014, at Kyoto University, Japan.
The goal of the QPL workshop series is to bring together researchers working on mathematical foundations of quantum physics, quantum computing and spatio-temporal causal structures, and in particular those that use logical tools, ordered algebraic and category-theoretic structures, formal languages, semantic methods and other computer science methods for the study of physical behavior in general. Over the past few years, there has been growing activity in these foundational approaches, together with a renewed interest in the foundations of quantum theory, which complement the more mainstream research in quantum computation. Earlier workshops in this series, with the same acronym under the name "Quantum Programming Languages", were held in Ottawa (2003), Turku (2004), Chicago (2005), and Oxford (2006). The first QPL under the new name Quantum Physics and Logic was held in Reykjavik (2008), followed by Oxford (2009 and 2010), Nijmegen (2011), Brussels (2012) and Barcelona (2013).
Submitted 27 December, 2014;
originally announced December 2014.
-
Proceedings 9th Workshop on Quantum Physics and Logic
Authors:
Ross Duncan,
Prakash Panangaden
Abstract:
This volume contains the proceedings of the ninth workshop on Quantum Physics and Logic (QPL2012) which took place in Brussels from the 10th to the 12th of October 2012.
QPL2012 brought together researchers working on mathematical foundations of quantum physics, quantum computing, and spatio-temporal causal structures. The particular focus was on the use of logical tools, ordered algebraic and category-theoretic structures, formal languages, semantical techniques, and other computer science methods for the study of physical behaviour in general.
Submitted 28 July, 2014;
originally announced July 2014.
-
Metrics for Finite Markov Decision Processes
Authors:
Norman Ferns,
Prakash Panangaden,
Doina Precup
Abstract:
We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
Submitted 11 July, 2012;
originally announced July 2012.
-
Metrics for Markov Decision Processes with Infinite State Spaces
Authors:
Norman Ferns,
Prakash Panangaden,
Doina Precup
Abstract:
We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning task varies continuously with respect to our metric distances.
Submitted 4 July, 2012;
originally announced July 2012.
-
Methods for computing state similarity in Markov Decision Processes
Authors:
Norman Ferns,
Pablo Samuel Castro,
Doina Precup,
Prakash Panangaden
Abstract:
A popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al, 2001, 2003; Ferns et al, 2004). An integral component of such metrics is the Kantorovich metric between probability distributions. However, while this metric enables many satisfying theoretical properties, it is costly to compute in practice. In this paper, we use techniques from network optimization and statistical sampling to overcome this problem. We obtain in this manner a variety of distance functions for MDP state aggregation, which differ in the tradeoff between time and space complexity, as well as the quality of the aggregation. We provide an empirical evaluation of these trade-offs.
Submitted 27 June, 2012;
originally announced June 2012.
-
Proceedings Sixth Workshop on Developments in Computational Models: Causality, Computation, and Physics
Authors:
S. Barry Cooper,
Prakash Panangaden,
Elham Kashefi
Abstract:
DCM 2010 provides a forum for ideas about new computing means and models, with a particular emphasis in 2010 on computational and causal models related to physics and biology. We believe that bringing together different approaches - in a community with the strong foundational background characteristic of FLoC - results in inspirational cross-boundary exchanges, and innovative further research. Day two of this pre-FLoC 2010 workshop is given over to physics and quantum related computation. The content of day one is more typical of previous DCM workshops - covering a full spectrum of topics related to the development of new computational models or new features for traditional computational models. DCM 2010 was designed to foster interactions, and provide a forum for presenting new ideas and work in progress. It is also intended to enable newcomers to learn about current research in this area.
Submitted 23 July, 2010; v1 submitted 9 June, 2010;
originally announced June 2010.
-
Approximate reasoning for real-time probabilistic processes
Authors:
Vineet Gupta,
Radha Jagadeesan,
Prakash Panangaden
Abstract:
We develop a pseudo-metric analogue of bisimulation for generalized semi-Markov processes. The kernel of this pseudo-metric corresponds to bisimulation; thus we have extended bisimulation for continuous-time probabilistic processes to a much broader class of distributions than exponential distributions. This pseudo-metric gives a useful handle on approximate reasoning in the presence of numerical information -- such as probabilities and time -- in the model. We give a fixed point characterization of the pseudo-metric. This makes available coinductive reasoning principles for reasoning about distances. We demonstrate that our approach is insensitive to potentially ad hoc articulations of distance by showing that it is intrinsic to an underlying uniformity. We provide a logical characterization of this uniformity using a real-valued modal logic. We show that several quantitative properties of interest are continuous with respect to the pseudo-metric. Thus, if two processes are metrically close, then observable quantitative properties of interest are indeed close.
Submitted 8 March, 2006; v1 submitted 24 May, 2005;
originally announced May 2005.
-
On the Expressive Power of First-Order Boolean Functions in PCF
Authors:
Riccardo Pucella,
Prakash Panangaden
Abstract:
Recent results of Bucciarelli show that the semilattice of degrees of parallelism of first-order boolean functions in PCF has both infinite chains and infinite antichains. By considering a simple subclass of Sieber's sequentiality relations, we identify levels in the semilattice and derive inexpressibility results concerning functions on different levels. This allows us to further explore the structure of the semilattice of degrees of parallelism: we identify semilattices characterized by simple level properties, and show the existence of new infinite hierarchies which are in a certain sense natural with respect to the levels.
Submitted 24 May, 2004;
originally announced May 2004.