Showing 1–11 of 11 results for author: Panaganti, K

Searching in archive cs.
  1. arXiv:2406.15788  [pdf, other]

    cs.LG

    Distributionally Robust Constrained Reinforcement Learning under Strong Duality

    Authors: Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue

    Abstract: We study the problem of Distributionally Robust Constrained RL (DRC-RL), where the goal is to maximize the expected reward subject to environmental distribution shifts and constraints. This setting captures situations where training and testing environments differ, and policies must satisfy constraints motivated by safety or limited budgets. Despite significant progress toward algorithm design for…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted at the Reinforcement Learning Conference (RLC) 2024; 28 pages, 4 figures

  2. arXiv:2406.14156  [pdf, other]

    cs.GT cs.LG cs.MA

    Tractable Equilibrium Computation in Markov Games through Risk Aversion

    Authors: Eric Mazumdar, Kishan Panaganti, Laixi Shi

    Abstract: A significant roadblock to the development of principled multi-agent reinforcement learning is the fact that desired solution concepts like Nash equilibria may be intractable to compute. To overcome this obstacle, we take inspiration from behavioral economics and show that -- by imbuing agents with important features of human decision-making like risk aversion and bounded rationality -- a class of…

    Submitted 26 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: preprint of multi-agent RL with risk-averse equilibria

  3. arXiv:2405.05468  [pdf, ps, other]

    cs.LG stat.ML

    Model-Free Robust $φ$-Divergence Reinforcement Learning Using Both Offline and Online Data

    Authors: Kishan Panaganti, Adam Wierman, Eric Mazumdar

    Abstract: The robust $φ$-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust $φ$-regularized fitted Q-iteration (RPQ) for learning an $ε$-opt…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: To appear in the proceedings of the International Conference on Machine Learning (ICML) 2024

  4. arXiv:2310.18434  [pdf, other]

    cs.LG stat.ML

    Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

    Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

    Abstract: The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is the distribution shift, which refers to the difference between the state-action visitation distribution of the data-generating policy and the learning policy. Many recent works…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 33 pages, preprint

  5. arXiv:2303.02783  [pdf, other]

    cs.LG cs.AI stat.ML

    Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

    Authors: Zaiyan Xu, Kishan Panaganti, Dileep Kalathil

    Abstract: We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment. We formulate this as a distributionally robust reinforcement learning (DR-RL) problem where the objective is to learn the policy which maximizes the value function against the worst possible stochastic model of the environment in an uncertai…

    Submitted 20 May, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Appeared in AISTATS 2023

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9728-9754, 2023

  6. arXiv:2211.15823  [pdf, other]

    cs.LG cs.AI cs.IR

    Personalized Reward Learning with Interaction-Grounded Learning (IGL)

    Authors: Jessica Maghakian, Paul Mineiro, Kishan Panaganti, Mark Rucker, Akanksha Saran, Cheng Tan

    Abstract: In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for the same fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work highlighting that (i)…

    Submitted 3 March, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: ICLR 2023

  7. arXiv:2208.05129  [pdf, other]

    cs.LG cs.AI stat.ML

    Robust Reinforcement Learning using Offline Data

    Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

    Abstract: The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to…

    Submitted 18 October, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Appeared in Neural Information Processing Systems (NeurIPS) 2022

  8. arXiv:2112.09865  [pdf, other]

    stat.ML cs.LG

    Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

    Authors: Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick

    Abstract: We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to the OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algori…

    Submitted 18 August, 2024; v1 submitted 18 December, 2021; originally announced December 2021.

    Comments: 23 pages, 6 figures, manuscript under review

  9. arXiv:2112.01506  [pdf, other]

    cs.LG stat.ML

    Sample Complexity of Robust Reinforcement Learning with a Generative Model

    Authors: Kishan Panaganti, Dileep Kalathil

    Abstract: The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an…

    Submitted 14 May, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: Published in the International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  10. arXiv:2006.11608  [pdf, other]

    cs.LG eess.SY stat.ML

    Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

    Authors: Kishan Panaganti, Dileep Kalathil

    Abstract: This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-…

    Submitted 11 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: 26 pages, 12 figures, 2 tables

  11. Bounded Regret for Finitely Parameterized Multi-Armed Bandits

    Authors: Kishan Panaganti, Dileep Kalathil

    Abstract: We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call Finitely…

    Submitted 7 November, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 15 pages, 7 figures, Reinforcement Learning, Multi-armed Bandits, Sequential Decision Making