Search | arXiv e-print repository

A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces

Authors: Bekzhan Kerimkulov, James-Michael Leahy, David Siska, Lukasz Szpruch, Yufei Zhang

Abstract: We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of the convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows.

Submitted 4 October, 2023; originally announced October 2023.

MSC Class: 90C40; 93E20; 90C26; 60B05; 90C53

arXiv:2202.03188 [pdf]

Knowledge-Integrated Informed AI for National Security

Authors: Anu K. Myne, Kevin J. Leahy, Ryan J. Soklaski

Abstract: The state of artificial intelligence technology has a rich history that dates back decades and includes two fall-outs before the explosive resurgence of today, which is credited largely to data-driven techniques. While AI technology has and continues to become increasingly mainstream with impact across domains and industries, it's not without several drawbacks, weaknesses, and potential to cause undesired effects. AI techniques are numerous with many approaches and variants, but they can be classified simply based on the degree of knowledge they capture and how much data they require; two broad categories emerge as prominent across AI to date: (1) techniques that are primarily, and often solely, data-driven while leveraging little to no knowledge and (2) techniques that primarily leverage knowledge and depend less on data. Now, a third category is starting to emerge that leverages both data and knowledge, that some refer to as "informed AI." This third category can be a game changer within the national security domain where there is ample scientific and domain-specific knowledge that stands ready to be leveraged, and where purely data-driven AI can lead to serious unwanted consequences. This report shares findings from a thorough exploration of AI approaches that exploit data as well as principled and/or practical knowledge, which we refer to as "knowledge-integrated informed AI." Specifically, we review illuminating examples of knowledge integrated in deep learning and reinforcement learning pipelines, taking note of the performance gains they provide. We also discuss an apparent trade space across variants of knowledge-integrated informed AI, along with observed and prominent issues that suggest worthwhile future research directions. Most importantly, this report suggests how the advantages of knowledge-integrated informed AI stand to benefit the national security domain.

Submitted 4 February, 2022; originally announced February 2022.

Report number: Technical Report TR-1272

arXiv:2201.07296 [pdf, ps, other]

Convergence of Policy Gradient for Entropy Regularized MDPs with Neural Network Approximation in the Mean-Field Regime

Authors: Bekzhan Kerimkulov, James-Michael Leahy, David Šiška, Lukasz Szpruch

Abstract: We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, and entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden layer) neural network approximation in a mean-field regime. Additional entropic regularization in the associated mean-field probability measure is added, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial condition. Our results rely on the careful analysis of the non-linear Fokker-Planck-Kolmogorov equation and extend the pioneering work of Mei et al. 2020 and Agarwal et al. 2020, which quantify the global convergence rate of policy gradient for entropy-regularized MDPs in the tabular setting.

Submitted 16 June, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

arXiv:1901.00984 [pdf, other]

Quantum Insertion-Deletion Channels

Authors: Janet Leahy, Dave Touchette, Penghui Yao

Abstract: We introduce a model of quantum insertion-deletion (insdel) channels. Insdel channels are meant to represent, for example, synchronization errors arising in data transmission. In the classical setting, they represent a strict generalization of the better-understood corruption error channels, and until recently, had mostly resisted effort toward a similar understanding as their corruption counterparts. They have received considerable attention in recent years. Very recently, Haeupler and Shahrasbi developed a framework, using what they call synchronisation strings, that allows one to turn insdel-type errors into corruption-type errors. These can then be handled by the use of standard error-correcting codes. We show that their framework can be extended to the quantum setting, providing a way to turn quantum insdel errors into quantum corruption errors, which can be handled with standard quantum error-correcting codes.

Submitted 4 January, 2019; originally announced January 2019.

arXiv:1803.04813 [pdf]

doi 10.1016/j.wasman.2016.08.023

Artificial neural network based modelling approach for municipal solid waste gasification in a fluidized bed reactor

Authors: Daya Shankar Pandey, Saptarshi Das, Indranil Pan, James J. Leahy, Witold Kwapinski

Abstract: In this paper, multi-layer feed forward neural networks are used to predict the lower heating value of gas (LHV), lower heating value of gasification products including tars and entrained char (LHVp) and syngas yield during gasification of municipal solid waste (MSW) in a fluidized bed reactor. These artificial neural networks (ANNs) with different architectures are trained using the Levenberg-Marquardt (LM) back-propagation algorithm and a cross validation is also performed to ensure that the results generalise to other unseen datasets. A rigorous study is carried out on optimally choosing the number of hidden layers, number of neurons in the hidden layer and activation function in a network using multiple Monte Carlo runs. Nine input and three output parameters are used to train and test various neural network architectures in both multiple output and single output prediction paradigms using the available experimental datasets. The model selection procedure is carried out to ascertain the best network architecture in terms of predictive accuracy. The simulation results show that the ANN based methodology is a viable alternative which can be used to predict the performance of a fluidized bed gasifier.

Submitted 5 February, 2018; originally announced March 2018.

Comments: 34 pages, 11 figures

Journal ref: Waste Management (Elsevier), Volume 58, December 2016, Pages 202-213

Showing 1–5 of 5 results for author: Leahy, J