Showing 1–10 of 10 results for author: Russel, R H

Searching in archive cs.
  1. arXiv:2108.02701  [pdf, ps, other]

    cs.LG eess.SY

    Lyapunov Robust Constrained-MDPs: Soft-Constrained Robustly Stable Policy Optimization under Model Uncertainty

    Authors: Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar, Radu Corcodel

    Abstract: Safety and robustness are two desired properties for any reinforcement learning algorithm. CMDPs can handle additional safety constraints and RMDPs can perform well under model uncertainties. In this paper, we propose to unite these two frameworks resulting in robust constrained MDPs (RCMDPs). The motivation is to develop a framework that can satisfy safety constraints while also simultaneously of…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: text overlap with arXiv:2010.04870

  2. arXiv:2010.04870  [pdf, other]

    cs.LG

    Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty

    Authors: Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

    Abstract: In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision process (CMDP), with the theory of robust Markov decision process (RMDP), leading to a formulation of robust constrained-MDPs (RCMDP). This formulation, simple in…

    Submitted 9 October, 2020; originally announced October 2020.

  3. arXiv:2006.11679  [pdf, other]

    cs.LG math.OC stat.ML

    Entropic Risk Constrained Soft-Robust Policy Optimization

    Authors: Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

    Abstract: Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning. It is important in high-stakes domains to quantify and manage risk induced by model uncertainties. The entropic risk measure is an exponential utility-based convex risk measure that satisfies many reasonable properties. In this paper, we propose an entropic risk constrained policy gradient and actor-cri…

    Submitted 20 June, 2020; originally announced June 2020.

  4. arXiv:1912.02696  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimizing Norm-Bounded Weighted Ambiguity Sets for Robust MDPs

    Authors: Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

    Abstract: Optimal policies in Markov decision processes (MDPs) are very sensitive to model misspecification. This raises serious concerns about deploying them in high-stakes domains. Robust MDPs (RMDPs) provide a promising framework to mitigate vulnerabilities by computing policies with worst-case guarantees in reinforcement learning. The solution quality of an RMDP depends on the ambiguity set, which is a qu…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1910.10786

  5. arXiv:1912.02150  [pdf, other]

    cs.AI cs.LO

    A Probabilistic Approach to Satisfiability of Propositional Logic Formulae

    Authors: Reazul Hasan Russel

    Abstract: We propose a version of the WalkSAT algorithm, named BetaWalkSAT. This method uses probabilistic reasoning for biasing the starting state of the local search algorithm. A Beta distribution is used to model the belief over Boolean values of the literals. Our results suggest that the proposed BetaWalkSAT algorithm can outperform other uninformed local search approaches for complex boolean satisfiabili…

    Submitted 4 December, 2019; originally announced December 2019.

  6. arXiv:1910.10786  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimizing Percentile Criterion Using Robust MDPs

    Authors: Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

    Abstract: We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the percentile criterion, can be optimized using Robust MDPs (RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially fr…

    Submitted 25 February, 2021; v1 submitted 23 October, 2019; originally announced October 2019.

  7. arXiv:1904.08528  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Exploration with Tight Bayesian Plausibility Sets

    Authors: Reazul H. Russel, Tianyi Gu, Marek Petrik

    Abstract: Optimism about poorly understood states and actions is the main driving force of exploration for many provably efficient reinforcement learning algorithms. We propose optimism in the face of sensible value functions (OFVF), a novel data-driven Bayesian algorithm for constructing plausibility sets for MDPs to explore robustly, minimizing the worst-case exploration cost. The method computes polici…

    Submitted 17 April, 2019; originally announced April 2019.

  8. arXiv:1901.07010  [pdf, other]

    cs.LG stat.ML

    A Short Survey on Probabilistic Reinforcement Learning

    Authors: Reazul Hasan Russel

    Abstract: A reinforcement learning agent tries to maximize its cumulative payoff by interacting in an unknown environment. It is important for the agent to explore suboptimal actions as well as to pick actions with the highest known rewards. Yet, in sensitive domains, collecting more data with exploration is not always possible, but it is important to find a policy with a certain performance guarantee. In this p…

    Submitted 21 January, 2019; originally announced January 2019.

    Comments: 7 pages, originally written as a literature survey for PhD candidacy exam

  9. arXiv:1811.06512  [pdf, other]

    cs.LG stat.ML

    Tight Bayesian Ambiguity Sets for Robust MDPs

    Authors: Reazul Hasan Russel, Marek Petrik

    Abstract: Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters. We address the problem of using robust MDPs (RMDPs) to compute policies with provable worst-case guarantees in reinforcement learning. The quality and robustness of an RMDP solution are determined by its ambiguity set. Existing methods construct ambiguity sets that lea…

    Submitted 15 November, 2018; originally announced November 2018.

    Comments: 5 pages. Accepted at Infer to Control Workshop at Neural Information Processing Systems (NIPS) 2018

  10. arXiv:1704.03926  [pdf, other]

    cs.LG cs.AI stat.ML

    Value Directed Exploration in Multi-Armed Bandits with Structured Priors

    Authors: Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml

    Abstract: Multi-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation. While there has been progress in developing algorithms with strong theoretical guarantees, there has been less focus on practical near-optimal finite-time performance. In this paper, we propose an algorithm for Bayesian multi-armed bandits that utilizes value-function-driven o…

    Submitted 17 May, 2017; v1 submitted 12 April, 2017; originally announced April 2017.