Showing 1–50 of 207 results for author: Mannor, S

Searching in archive cs.
  1. arXiv:2408.11876  [pdf]

    q-bio.QM cs.AI cs.LG

    From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis

    Authors: Guy Lutsker, Gal Sapir, Anastasia Godneva, Smadar Shilo, Jerry R Greenfield, Dorit Samocha-Bonet, Shie Mannor, Eli Meirom, Gal Chechik, Hagai Rossman, Eran Segal

    Abstract: Recent advances in self-supervised learning enabled novel medical AI models, known as foundation models (FMs) that offer great potential for characterizing health from diverse biomedical data. Continuous glucose monitoring (CGM) provides rich, temporal data on glycemic patterns, but its full potential for predicting broader health outcomes remains underutilized. Here, we present GluFormer, a gener…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2406.18237  [pdf, other]

    cs.AI cs.GR cs.RO

    PlaMo: Plan and Move in Rich 3D Physical Environments

    Authors: Assaf Hallak, Gal Dalal, Chen Tessler, Kelly Guo, Shie Mannor, Gal Chechik

    Abstract: Controlling humanoids in complex physically simulated worlds is a long-standing challenge with numerous applications in gaming, simulation, and visual content creation. In our setup, given a rich and complex 3D scene, the user provides a list of instructions composed of target locations and locomotion types. To solve this task we present PlaMo, a scene-aware path planner and a robust physics-based…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.01389  [pdf, other]

    cs.LG cs.AI eess.SY

    RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

    Authors: Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni

    Abstract: In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMD…

    Submitted 26 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Fixed typos + alpha

  4. arXiv:2405.16581  [pdf, other]

    cs.LG

    On Bits and Bandits: Quantifying the Regret-Information Trade-off

    Authors: Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor

    Abstract: In interactive decision-making tasks, information can be acquired by direct interactions, through receiving indirect feedback, and from external knowledgeable sources. We examine the trade-off between the information an agent accumulates and the regret it suffers. We show that information from external sources, measured in bits, can be traded off for regret, measured in reward. We invoke informati…

    Submitted 26 May, 2024; originally announced May 2024.

  5. arXiv:2404.05440  [pdf, other]

    cs.AI cs.LG

    Tree Search-Based Policy Optimization under Stochastic Execution Delay

    Authors: David Valensi, Esther Derman, Shie Mannor, Gal Dalal

    Abstract: The standard formulation of Markov decision processes (MDPs) assumes that the agent's decisions are executed immediately. However, in numerous realistic applications such as robotics or healthcare, actions are performed with a delay whose value can even be stochastic. In this work, we introduce stochastic delayed execution MDPs, a new formalism addressing random delays without resorting to state a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Published in ICLR 2024

  6. arXiv:2403.06806  [pdf, other]

    cs.LG eess.SY

    On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes

    Authors: Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y. Levy, R. Srikant, Shie Mannor

    Abstract: We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action spaces. Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O(1/T)$, which translat…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 29 pages, 5 figures

  7. arXiv:2403.05732   

    cs.AI cs.LG

    Conservative DDPG -- Pessimistic RL without Ensemble

    Authors: Nitsan Soffair, Shie Mannor

    Abstract: DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values. Traditional solutions to this bias involve ensemble-based methods, which require significant computational resources, or complex log-policy-based approaches, which are difficult to understand and implement. In contrast, we propose a straightforward solution using a $Q$-target and…

    Submitted 2 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Paper not ready

  8. arXiv:2402.10342  [pdf, other]

    cs.LG cs.AI cs.CL

    Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

    Authors: Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has achieved impressive empirical successes while relying on a small amount of human feedback. However, there is limited theoretical justification for this phenomenon. Additionally, most recent studies focus on value-based algorithms despite the recent empirical successes of policy-based algorithms. In this work, we consider an RLHF algorithm based…

    Submitted 15 July, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  9. arXiv:2402.05951   

    cs.LG cs.AI

    MinMaxMin $Q$-learning

    Authors: Nitsan Soffair, Shie Mannor

    Abstract: MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimations are overestimating the real $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disagreement among $Q$-networks in the form of the min-batch MaxMin $Q$-networks distance which is added to the $Q$-target and used as the priority experi…

    Submitted 2 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: Paper not ready

  10. arXiv:2402.05950  [pdf, other]

    cs.LG cs.AI

    SQT -- std $Q$-target

    Authors: Nitsan Soffair, Dotan Di-Castro, Orly Avner, Shie Mannor

    Abstract: Std $Q$-target is a conservative, actor-critic, ensemble, $Q$-learning-based algorithm, which is based on a single key $Q$-formula: $Q$-networks standard deviation, which is an "uncertainty penalty", and serves as a minimalistic solution to the problem of overestimation bias. We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms, DDPG, TD3…

    Submitted 2 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  11. arXiv:2402.05643  [pdf, other]

    cs.LG cs.AI

    Improving Token-Based World Models with Parallel Observation Prediction

    Authors: Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor

    Abstract: Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-like sequence of tokens, where each observation constitutes a sub-sequence. However, during imagination, the sequential token-by-token generation of next observa…

    Submitted 29 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  12. arXiv:2310.07596  [pdf, other]

    cs.LG cs.IT

    Prospective Side Information for Latent MDPs

    Authors: Jeongyeol Kwon, Yonathan Efroni, Shie Mannor, Constantine Caramanis

    Abstract: In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user's preferences, is not given. In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction. This t…

    Submitted 11 October, 2023; originally announced October 2023.

  13. arXiv:2310.00675  [pdf, other]

    cs.LG eess.SP

    Optimization or Architecture: How to Hack Kalman Filtering

    Authors: Ido Greenberg, Netanel Yannay, Shie Mannor

    Abstract: In non-linear filtering, it is traditional to compare non-linear architectures such as neural networks to the standard linear Kalman Filter (KF). We observe that this mixes the evaluation of two separate components: the non-linear architecture, and the parameters optimization method. In particular, the non-linear model is often optimized, whereas the reference KF model is not. We argue that both s…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  14. arXiv:2309.01107  [pdf, other]

    cs.LG

    Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

    Authors: Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy, Shie Mannor

    Abstract: In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance sensitivity to misspecified environments. Yet, to preserve computational tractability, the uncertainty set is traditionally independently structured for each state…

    Submitted 12 February, 2024; v1 submitted 3 September, 2023; originally announced September 2023.

    Comments: accepted in AAAI2024

  15. arXiv:2307.13763  [pdf, other]

    stat.ML cs.AI cs.LG

    Sobolev Space Regularised Pre Density Models

    Authors: Mark Kozdoba, Binyamin Perets, Shie Mannor

    Abstract: We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density. This method is statistically consistent, and makes the inductive bias of the model clear and interpretable. While there is no closed analytic form for the associated kernel, we show that one can approximate it using sampling. The optimization problem needed to determine the d…

    Submitted 13 February, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

  16. arXiv:2306.14020  [pdf, other]

    cs.LG

    Individualized Dosing Dynamics via Neural Eigen Decomposition

    Authors: Stav Belogolovsky, Ido Greenberg, Danny Eytan, Shie Mannor

    Abstract: Dosing models often use differential equations to model biological dynamics. Neural differential equations in particular can learn to predict the derivative of a process, which permits predictions at irregular points of time. However, this temporal flexibility often comes with a high sensitivity to noise, whereas medical problems often present high noise and limited data. Moreover, medical dosing…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2202.00117

  17. arXiv:2306.05859  [pdf, other]

    cs.LG

    Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

    Authors: Kaixin Wang, Uri Gadot, Navdeep Kumar, Kfir Levy, Shie Mannor

    Abstract: Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solve RMDP that Estimates the Worst transition Kernel to learn r…

    Submitted 12 February, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

  18. arXiv:2305.19922  [pdf, other]

    cs.LG cs.AI

    Representation-Driven Reinforcement Learning

    Authors: Ofir Nabati, Guy Tennenholtz, Shie Mannor

    Abstract: We present a representation-driven framework for reinforcement learning. By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. Particularly, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation problem as a representation-exploitation problem, where go…

    Submitted 17 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted to ICML 2023

  19. arXiv:2305.02195  [pdf, other]

    cs.CV cs.AI cs.RO

    CALM: Conditional Adversarial Latent Models for Directable Virtual Characters

    Authors: Chen Tessler, Yoni Kasten, Yunrong Guo, Shie Mannor, Gal Chechik, Xue Bin Peng

    Abstract: In this work, we present Conditional Adversarial Latent Models (CALM), an approach for generating diverse and directable behaviors for user-controlled interactive virtual characters. Using imitation learning, CALM learns a representation of movement that captures the complexity and diversity of human motion, and enables direct control over character movements. The approach jointly learns a control…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted to SIGGRAPH 2023

  20. arXiv:2303.06654  [pdf, other]

    cs.LG cs.AI

    Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization

    Authors: Esther Derman, Yevgeniy Men, Matthieu Geist, Shie Mannor

    Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and limits scalability in both learning and planning. On the other hand, regularized MDPs show more stability in policy learning without impairing time complexity. Yet,…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: Extended version of NeurIPS paper: arXiv:2110.06267

  21. arXiv:2301.13642  [pdf, other]

    cs.LG math.OC

    An Efficient Solution to s-Rectangular Robust Markov Decision Processes

    Authors: Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

    Abstract: We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs which is significantly faster than any existing method. We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water filling lemma. We unveil the exact form of the optimal policies, whic…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.14327

  22. arXiv:2301.13589  [pdf, ps, other]

    cs.LG cs.AI

    Policy Gradient for Rectangular Robust Markov Decision Processes

    Authors: Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Levy, Shie Mannor

    Abstract: Policy gradient methods have become a standard for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally expensive. In this paper, we introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (…

    Submitted 10 December, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to NeurIPS 2023

  23. arXiv:2301.13236  [pdf, other]

    cs.LG cs.AI

    SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search

    Authors: Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik

    Abstract: Despite the popularity of policy gradient methods, they are known to suffer from large variance and high sample complexity. To mitigate this, we introduce SoftTreeMax -- a generalization of softmax that takes planning into account. In SoftTreeMax, we extend the traditional logits with the multi-step discounted cumulative reward, topped with the logits of future states. We consider two variants of…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: text overlap with arXiv:2209.13966

  24. arXiv:2301.11147  [pdf, other]

    cs.LG

    Train Hard, Fight Easy: Robust Meta Reinforcement Learning

    Authors: Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom

    Abstract: A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients. Meta-RL (MRL) addresses this issue by learning a meta-policy that adapts to new tasks. Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty. This limits system reliability since test tasks…

    Submitted 1 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: NeurIPS 2023

  25. arXiv:2301.01320  [pdf, ps, other]

    cs.LG stat.ML

    Towards Deployable RL -- What's Broken with RL Research and a Potential Fix

    Authors: Shie Mannor, Aviv Tamar

    Abstract: Reinforcement learning (RL) has demonstrated great potential, but is currently full of overhyping and pipe dreams. We point to some difficulties with current research which we feel are endemic to the direction taken by the community. To us, the current direction is not likely to lead to "deployable" RL: RL that works in practice and can work in practical situations yet still is economically viable…

    Submitted 3 January, 2023; originally announced January 2023.

  26. arXiv:2212.06437  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    DiffStack: A Differentiable and Modular Control Stack for Autonomous Vehicles

    Authors: Peter Karkus, Boris Ivanovic, Shie Mannor, Marco Pavone

    Abstract: Autonomous vehicle (AV) stacks are typically built in a modular fashion, with explicit components performing detection, tracking, prediction, planning, control, etc. While modularity improves reusability, interpretability, and generalizability, it also suffers from compounding errors, information bottlenecks, and integration challenges. To overcome these challenges, a prominent approach is to conv…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: CoRL 2022 camera ready

  27. arXiv:2210.03528  [pdf, other]

    cs.LG cs.IT stat.ML

    Tractable Optimality in Episodic Latent MABs

    Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

    Abstract: We consider a multi-armed bandit problem with $M$ latent contexts, where an agent interacts with the environment for an episode of $H$ time steps. Depending on the length of the episode, the learner may not be able to estimate accurately the latent context. The resulting partial observation of the environment makes the learning task significantly more challenging. Without any additional structural…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  28. arXiv:2210.02594  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Reward-Mixing MDPs with a Few Latent Contexts are Learnable

    Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

    Abstract: We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among $M$ candidates and an agent interacts with the MDP throughout the episode for $H$ time steps. Our goal is to learn a near-optimal policy that nearly maximizes the $H$ time-step cumulative rewards in such a model. Previo…

    Submitted 5 October, 2022; originally announced October 2022.

  29. arXiv:2210.00991  [pdf, ps, other]

    cs.LG

    Policy Gradient for Reinforcement Learning with General Utilities

    Authors: Navdeep Kumar, Kaixin Wang, Kfir Levy, Shie Mannor

    Abstract: In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative rewards. This objective may also be viewed as finding a policy that optimizes a linear function of its state-action occupancy measure, hereafter referred to as Linear RL. However, many supervised and unsupervised RL problems are not covered in the Linear RL framework, such as app…

    Submitted 29 August, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

  30. arXiv:2209.13966  [pdf, other]

    cs.LG

    SoftTreeMax: Policy Gradient with Tree Search

    Authors: Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

    Abstract: Policy-gradient methods are widely used for learning control policies. They can be easily distributed to multiple workers and reach state-of-the-art results in many domains. Unfortunately, they exhibit large variance and subsequently suffer from high-sample complexity since they aggregate gradients over entire trajectories. At the other extreme, planning methods, like tree search, optimize the pol…

    Submitted 28 September, 2022; originally announced September 2022.

  31. arXiv:2207.09090  [pdf, other]

    cs.LG cs.AI eess.SY

    Actor-Critic based Improper Reinforcement Learning

    Authors: Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

    Abstract: We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.08201

  32. Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs

    Authors: Benjamin Fuhrer, Yuval Shpigelman, Chen Tessler, Shie Mannor, Gal Chechik, Eitan Zahavi, Gal Dalal

    Abstract: As communication protocols evolve, datacenter network utilization increases. As a result, congestion is more frequent, causing higher latency and packet loss. Combined with the increasing complexity of workloads, manual design of congestion control (CC) algorithms becomes extremely difficult. This calls for the development of AI approaches to replace the human effort. Unfortunately, it is currentl…

    Submitted 1 June, 2024; v1 submitted 5 July, 2022; originally announced July 2022.

  33. arXiv:2206.12848  [pdf, ps, other]

    cs.LG

    Analysis of Stochastic Processes through Replay Buffers

    Authors: Shirli Di Castro Shashua, Shie Mannor, Dotan Di-Castro

    Abstract: Replay buffers are a key component in many reinforcement learning schemes. Yet, their theoretical properties are not fully understood. In this paper we analyze a system where a stochastic process X is pushed into a replay buffer and then randomly sampled to generate a stochastic process Y from the replay buffer. We provide an analysis of the properties of the sampled process such as stationarity,…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.00445

  34. arXiv:2205.15376  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning with a Terminator

    Authors: Guy Tennenholtz, Nadav Merlis, Lior Shani, Shie Mannor, Uri Shalit, Gal Chechik, Assaf Hallak, Gal Dalal

    Abstract: We present the problem of reinforcement learning with exogenous termination. We define the Termination Markov Decision Process (TerMDP), an extension of the MDP framework, in which episodes may be interrupted by an external non-Markovian observer. This formulation accounts for numerous real-world situations, such as a human interrupting an autonomous driving agent for reasons of discomfort. We lea…

    Submitted 5 October, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  35. arXiv:2205.14327  [pdf, other]

    cs.AI

    Efficient Policy Iteration for Robust Markov Decision Processes via Regularization

    Authors: Navdeep Kumar, Kfir Levy, Kaixin Wang, Shie Mannor

    Abstract: Robust Markov decision processes (MDPs) provide a general framework to model decision problems where the system dynamics are changing or only partially known. Efficient methods for some sa-rectangular robust MDPs exist, using their equivalence with reward regularized MDPs, generalizable to online settings. In comparison to sa-rectangular robust MDPs, s-rectangular robust M…

    Submitted 5 October, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

  36. arXiv:2205.05138  [pdf, other]

    cs.LG

    Efficient Risk-Averse Reinforcement Learning

    Authors: Ido Greenberg, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

    Abstract: In risk-averse reinforcement learning (RL), the goal is to optimize some risk measure of the returns. A risk measure often focuses on the worst returns out of the agent's experience. As a result, standard methods for risk-averse RL often ignore high-return strategies. We prove that under certain conditions this inevitably leads to a local-optimum barrier, and propose a soft risk mechanism to bypas…

    Submitted 12 October, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted to NeurIPS 2022

  37. arXiv:2204.09052  [pdf, other]

    quant-ph cs.LG

    Optimizing Tensor Network Contraction Using Reinforcement Learning

    Authors: Eli A. Meirom, Haggai Maron, Shie Mannor, Gal Chechik

    Abstract: Quantum Computing (QC) stands to revolutionize computing, but is currently still limited. To develop and test quantum algorithms today, quantum circuits are often simulated on classical computers. Simulating a complex quantum circuit requires computing the contraction of a large network of tensors. The order (path) of contraction can have a drastic effect on the computing cost, but finding an effi…

    Submitted 18 April, 2022; originally announced April 2022.

  38. arXiv:2203.06527  [pdf, other]

    stat.ML cs.AI cs.LG

    Learning Hidden Markov Models When the Locations of Missing Observations are Unknown

    Authors: Binyamin Perets, Mark Kozdoba, Shie Mannor

    Abstract: The Hidden Markov Model (HMM) is one of the most widely used statistical models for sequential data analysis. One of the key reasons for this versatility is the ability of HMM to deal with missing data. However, standard HMM learning algorithms rely crucially on the assumption that the positions of the missing observations within the observation sequence are known. In the natural sciences,…

    Submitted 2 July, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

    Comments: 9 pages

  39. arXiv:2202.01108  [pdf, other]

    cs.AI

    Learning to reason about and to act on physical cascading events

    Authors: Yuval Atzmon, Eli A. Meirom, Shie Mannor, Gal Chechik

    Abstract: Reasoning and interacting with dynamic environments is a fundamental problem in AI, but it becomes extremely challenging when actions can trigger cascades of cross-dependent events. We introduce a new supervised learning setup called Cascade where an agent is shown a video of a physically simulated dynamic scene, and is asked to intervene and trigger a cascade of events, such that the system…

    Submitted 23 July, 2023; v1 submitted 2 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, 2023

  40. arXiv:2202.00117  [pdf, other]

    cs.LG eess.SY

    Continuous Forecasting via Neural Eigen Decomposition

    Authors: Stav Belogolovsky, Ido Greenberg, Danny Eitan, Shie Mannor

    Abstract: Neural differential equations predict the derivative of a stochastic process. This allows irregular forecasting with arbitrary time-steps. However, the expressive temporal flexibility often comes with a high sensitivity to noise. In addition, current methods model measurements and control together, limiting generalization to different control policies. These properties severely limit applicability…

    Submitted 4 February, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

  41. arXiv:2201.12929  [pdf, other]

    cs.LG

    The Geometry of Robust Value Functions

    Authors: Kaixin Wang, Navdeep Kumar, Kuangqi Zhou, Bryan Hooi, Jiashi Feng, Shie Mannor

    Abstract: The space of value functions is a fundamental concept in reinforcement learning. Characterizing its geometric properties may provide insights for optimization and representation. Existing works mainly focus on the value space for Markov Decision Processes (MDPs). In this paper, we study the geometry of the robust value space for the more general Robust MDPs (RMDPs) setting, where transition uncert…

    Submitted 11 August, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

  42. arXiv:2201.12700  [pdf, other]

    cs.LG cs.CR cs.IT stat.ML

    Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

    Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

    Abstract: Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction $\alpha < 1/2$ of tasks (users) are arbitrary and adversarial. The remaining fraction of good users share the same instance of contextual bandits with $S$ contexts and $A$ actions (items). Naturally, whether a user is good or adversarial is not known in a…

    Submitted 29 January, 2022; originally announced January 2022.

  43. arXiv:2201.12403  [pdf, other]

    cs.LG cs.AI

    Planning and Learning with Adaptive Lookahead

    Authors: Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

    Abstract: Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate…

    Submitted 18 January, 2023; v1 submitted 28 January, 2022; originally announced January 2022.

  44. arXiv:2110.06539  [pdf, other]

    cs.LG cs.AI cs.RO

    On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

    Authors: Guy Tennenholtz, Assaf Hallak, Gal Dalal, Shie Mannor, Gal Chechik, Uri Shalit

    Abstract: We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning. We begin by defining the problem of learning from confounded expert data in a contextual MDP setup. We analyze the limitations of learning from such data with and without external reward, and propose an adjustment of standard imitation learning algorithms to fit this setup. We then di…

    Submitted 13 October, 2021; originally announced October 2021.

  45. arXiv:2110.06267  [pdf, other]

    cs.LG math.OC

    Twice regularized MDPs and the equivalence between robustness and regularization

    Authors: Esther Derman, Matthieu Geist, Shie Mannor

    Abstract: Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics. To solve them, one typically resorts to robust optimization methods. However, this significantly increases computational complexity and limits scalability in both learning and planning. On the other hand, regularized MDPs show more stability in policy learning without impairing time complexity. Yet,…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021

  46. arXiv:2110.05724  [pdf, other]

    cs.LG

    Query-Reward Tradeoffs in Multi-Armed Bandits

    Authors: Nadav Merlis, Yonathan Efroni, Shie Mannor

    Abstract: We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we prove that there is a fundamental difference between problems with a unique optimal arm and problems with multiple optimal arms, unlike in the standard multi-armed bandit problem. We also…

    Submitted 27 October, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

  47. arXiv:2110.03743  [pdf, ps, other]

    cs.LG cs.AI cs.IT

    Reinforcement Learning in Reward-Mixing MDPs

    Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

    Abstract: Learning a near optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP). There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model is…

    Submitted 31 January, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021; fixed typo

  48. arXiv:2110.01954  [pdf, other]

    cs.RO cs.LG

    Continuous-Time Fitted Value Iteration for Robust Policies

    Authors: Michael Lutter, Boris Belousov, Shie Mannor, Dieter Fox, Animesh Garg, Jan Peters

    Abstract: Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includ…

    Submitted 5 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: text overlap with arXiv:2105.12189

  49. arXiv:2110.00445  [pdf, ps, other]

    stat.ML cs.LG

    Sim and Real: Better Together

    Authors: Shirli Di Castro Shashua, Dotan Di Castro, Shie Mannor

    Abstract: Simulation is used extensively in autonomous systems, particularly in robotic manipulation. By far, the most common approach is to train a controller in simulation, and then use it as an initial starting point for the real system. We demonstrate how to learn simultaneously from both simulation and interaction with the real environment. We propose an algorithm for balancing the large number of samp…

    Submitted 5 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

  50. arXiv:2109.10632  [pdf, other]

    cs.AI cs.LG cs.MA eess.SY stat.ML

    Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning

    Authors: Roy Zohar, Shie Mannor, Guy Tennenholtz

    Abstract: Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. As environments grow in size, effective credit assignment becomes increasingly harder and often results in infeasible learning times. Still, in many real-world settings, there exist simplified underlying dynamics that can be…

    Submitted 22 September, 2021; originally announced September 2021.