Showing 1–10 of 10 results for author: Moradipari, A

Searching in archive cs.
  1. arXiv:2311.04338  [pdf, other]

    cs.LG eess.SY

    Convex Methods for Constrained Linear Bandits

    Authors: Amirhossein Afsharrad, Ahmadreza Moradipari, Sanjay Lall

    Abstract: Recently, bandit optimization has received significant attention in real-world safety-critical systems that involve repeated interactions with humans. While there exist various algorithms with performance guarantees in the literature, practical implementation of the algorithms has not received as much attention. This work presents a comprehensive study on the computational aspects of safe bandit a…

    Submitted 9 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  2. arXiv:2310.20007  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

    Authors: Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal

    Abstract: In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time inhomogeneous rei…

    Submitted 6 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  3. arXiv:2307.13978  [pdf, other]

    cs.LG cs.AI cs.CV

    Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation

    Authors: Mahyar Abbasian, Taha Rajabzadeh, Ahmadreza Moradipari, Seyed Amir Hossein Aqajari, Hongsheng Lu, Amir Rahmani

    Abstract: Generative Adversarial Networks (GAN) have emerged as a formidable AI tool to generate realistic outputs based on training datasets. However, the challenge of exerting control over the generation process of GANs remains a significant hurdle. In this paper, we propose a novel methodology to address this issue by integrating a reinforcement learning (RL) agent with a latent-space GAN (l-GAN), thereb…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: 7 pages, 7 figures, 2 tables, conference paper

  4. arXiv:2301.10893  [pdf, other]

    cs.RO

    Predicting Parameters for Modeling Traffic Participants

    Authors: Ahmadreza Moradipari, Sangjae Bae, Mahnoosh Alizadeh, Ehsan Moradi Pari, David Isele

    Abstract: Accurately modeling the behavior of traffic participants is essential for safely and efficiently navigating an autonomous vehicle through heavy traffic. We propose a method, based on the intelligent driver model, that allows us to accurately model individual driver behaviors from only a small number of frames using easily observable features. On average, this method makes prediction errors that ha…

    Submitted 25 January, 2023; originally announced January 2023.

  5. arXiv:2205.06331  [pdf, other]

    cs.LG cs.MA

    Collaborative Multi-agent Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Mohammad Ghavamzadeh, Mahnoosh Alizadeh

    Abstract: We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one…

    Submitted 12 May, 2022; originally announced May 2022.

    Journal ref: American Control Conference (ACC), 2022

  6. arXiv:2205.06326  [pdf, other]

    cs.LG

    Multi-Environment Meta-Learning in Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Mohammad Ghavamzadeh, Taha Rajabzadeh, Christos Thrampoulidis, Mahnoosh Alizadeh

    Abstract: In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when tas…

    Submitted 12 May, 2022; originally announced May 2022.

    Journal ref: IEEE International Symposium on Information Theory (ISIT), 2022

  7. arXiv:2106.05378  [pdf, other]

    cs.LG

    Feature and Parameter Selection in Stochastic Linear Bandits

    Authors: Ahmadreza Moradipari, Berkay Turan, Yasin Abbasi-Yadkori, Mahnoosh Alizadeh, Mohammad Ghavamzadeh

    Abstract: We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in…

    Submitted 17 June, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Machine Learning, 2022

  8. arXiv:2010.00081  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Stage-wise Conservative Linear Bandits

    Authors: Ahmadreza Moradipari, Christos Thrampoulidis, Mahnoosh Alizadeh

    Abstract: We study stage-wise conservative linear stochastic bandits: an instance of bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials. At each stage, the learner must choose actions that not only maximize cumulative reward across the entire time horizon but further satisfy a linear baseline constraint that takes th…

    Submitted 30 September, 2020; originally announced October 2020.

    Comments: 28 pages, 5 figures

    Journal ref: Thirty-fourth Conference on Neural Information Processing Systems, NeurIPS 2020

  9. arXiv:2001.10474  [pdf, other]

    cs.LG cs.AI stat.ML

    Coagent Networks Revisited

    Authors: Modjtaba Shokrian Zini, Mohammad Pedramfar, Matthew Riemer, Ahmadreza Moradipari, Miao Liu

    Abstract: Coagent networks formalize the concept of arbitrary networks of stochastic agents that collaborate to take actions in a reinforcement learning environment. Prominent examples of coagent networks in action include approaches to hierarchical reinforcement learning (HRL), such as those using options, which attempt to address the exploration exploitation trade-off by introducing abstract actions at di…

    Submitted 29 August, 2023; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: Reformatted paper significantly and clarified results on the asynchronous case

  10. arXiv:1911.02156  [pdf, other]

    cs.LG stat.ML

    Safe Linear Thompson Sampling with Side Information

    Authors: Ahmadreza Moradipari, Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis

    Abstract: The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional \textit{linear safety constraints} that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this…

    Submitted 29 February, 2020; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: Comparing with safe versions of linear UCB algorithms, Providing more intuition for proof sketch