
Showing 1–18 of 18 results for author: Gottesman, O

Searching in archive cs.
  1. arXiv:2407.07333  [pdf, other]

    cs.LG cs.AI stat.ML

    Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

    Authors: Cameron Allen, Aaron Kirtland, Ruo Yu Tao, Sam Lobel, Daniel Scott, Nicholas Petrocelli, Omer Gottesman, Ronald Parr, Michael L. Littman, George Konidaris

    Abstract: Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, wit…

    Submitted 21 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: GitHub URL: https://github.com/brownirl/lambda_discrepancy; Videos: https://lambda-discrepancy.github.io/

  2. arXiv:2306.17750  [pdf, other]

    cs.LG

    TD Convergence: An Optimization Perspective

    Authors: Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor

    Abstract: We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counterexample, we identify two f…

    Submitted 8 November, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted at Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  3. arXiv:2304.03365  [pdf, other]

    cs.LG cs.AI

    Decision-Focused Model-based Reinforcement Learning for Reward Transfer

    Authors: Abhishek Sharma, Sonali Parbhoo, Omer Gottesman, Finale Doshi-Velez

    Abstract: Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful algorithm that can focus on learning the MDP dynamics that are most relevant for obtaining high returns. While this approach increases the agent's performance by directly optimizing the reward, it does so by learning less accurate dynamics from a maximum likelihood perspective. We demonstrate that w…

    Submitted 1 January, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

  4. arXiv:2301.00009   

    cs.LG cs.AI

    On the Geometry of Reinforcement Learning in Continuous State and Action Spaces

    Authors: Saket Tiwari, Omer Gottesman, George Konidaris

    Abstract: Advances in reinforcement learning have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose building a theoretical understanding of continuous state and action spaces by employing a geometric lens. Central to our work is the idea that the transiti…

    Submitted 10 August, 2024; v1 submitted 29 December, 2022; originally announced January 2023.

    Comments: We found an issue with one of the proofs. We are working on rectifying it.

  5. arXiv:2208.00250  [pdf, other]

    cs.LG cs.AI

    A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

    Authors: Kelly W. Zhang, Omer Gottesman, Finale Doshi-Velez

    Abstract: In the reinforcement learning literature, there are many algorithms developed for either Contextual Bandit (CB) or Markov Decision Processes (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether it is appropriate to treat a sequential decision making problem as a CB or an MDP. In other word…

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: Challenges of Real-World Reinforcement Learning 2020 (NeurIPS Workshop)

  6. arXiv:2112.05848  [pdf, other]

    cs.LG cs.AI

    Faster Deep Reinforcement Learning with Slower Online Network

    Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

    Abstract: Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with u…

    Submitted 17 April, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  7. arXiv:2111.14272  [pdf, other]

    cs.LG cs.AI stat.ME

    Identification of Subgroups With Similar Benefits in Off-Policy Policy Evaluation

    Authors: Ramtin Keramati, Omer Gottesman, Leo Anthony Celi, Finale Doshi-Velez, Emma Brunskill

    Abstract: Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy. However, a new decision policy may be better than a baseline policy for some individuals but not others. This has motivated a push towards personalization and accurate per-state estimates of heterogeneous treatment effects (HTEs).…

    Submitted 28 November, 2021; originally announced November 2021.

  8. arXiv:2110.12276  [pdf, other]

    cs.LG

    Coarse-Grained Smoothness for RL in Metric Spaces

    Authors: Omer Gottesman, Kavosh Asadi, Cameron Allen, Sam Lobel, George Konidaris, Michael Littman

    Abstract: Principled decision-making in continuous state–action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows…

    Submitted 23 October, 2021; originally announced October 2021.

  9. arXiv:2109.06310  [pdf, other]

    cs.LG stat.ML

    State Relevance for Off-Policy Evaluation

    Authors: Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

    Abstract: Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces varianc…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: ICML 2021

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021

  10. arXiv:2106.04379  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Markov State Abstractions for Deep Reinforcement Learning

    Authors: Cameron Allen, Neev Parikh, Omer Gottesman, George Konidaris

    Abstract: A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation, and such representations are not guaranteed to preserve the Markov property. We introduce a novel set of conditions and prove that they are suff…

    Submitted 14 March, 2024; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Fixed typo (see Errata). Code available at https://github.com/camall3n/markov-state-abstractions

  11. arXiv:2007.00973  [pdf, other]

    cs.LG stat.ML

    Learning to search efficiently for causally near-optimal treatments

    Authors: Samuel Håkansson, Viktor Lindblom, Omer Gottesman, Fredrik D. Johansson

    Abstract: Finding an effective medical treatment often requires a search by trial and error. Making this search more efficient by minimizing the number of unnecessary trials could lower both costs and patient suffering. We formalize this problem as learning a policy for finding a near-optimal treatment in a minimum number of trials using a causal inference framework. We give a model-based dynamic programmin…

    Submitted 17 February, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

  12. arXiv:2002.03478  [pdf, other]

    cs.LG stat.ML

    Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

    Authors: Omer Gottesman, Joseph Futoma, Yao Liu, Sonali Parbhoo, Leo Anthony Celi, Emma Brunskill, Finale Doshi-Velez

    Abstract: Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method t…

    Submitted 11 August, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: ICML final version

  13. arXiv:1905.10424  [pdf, ps, other]

    stat.ML cs.LG

    A general method for regularizing tensor decomposition methods via pseudo-data

    Authors: Omer Gottesman, Weiwei Pan, Finale Doshi-Velez

    Abstract: Tensor decomposition methods allow us to learn the parameters of latent variable models through decomposition of low-order moments of data. A significant limitation of these algorithms is that there exists no general method to regularize them, and in the past regularization has mostly been performed using bespoke modifications to the algorithms, tailored for the particular form of the desired regu…

    Submitted 24 May, 2019; originally announced May 2019.

  14. arXiv:1905.05787  [pdf, ps, other]

    cs.LG stat.ML

    Combining Parametric and Nonparametric Models for Off-Policy Evaluation

    Authors: Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez

    Abstract: We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at e…

    Submitted 15 May, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Journal ref: PMLR 97:2366-2375, 2019

  15. arXiv:1901.04670  [pdf, other]

    cs.LG cs.AI stat.ML

    Improving Sepsis Treatment Strategies by Combining Deep and Kernel-Based Reinforcement Learning

    Authors: Xuefeng Peng, Yi Ding, David Wihl, Omer Gottesman, Matthieu Komorowski, Li-wei H. Lehman, Andrew Ross, Aldo Faisal, Finale Doshi-Velez

    Abstract: Sepsis is the leading cause of mortality in the ICU. It is challenging to manage because individual patients respond differently to treatment. Thus, tailoring treatment to the individual patient is essential for the best outcomes. In this paper, we take steps toward this goal by applying a mixture-of-experts framework to personalize sepsis treatment. The mixture model selectively alternates betwee…

    Submitted 15 January, 2019; originally announced January 2019.

    Comments: AMIA 2018 Annual Symposium

  16. arXiv:1807.01066  [pdf, other]

    cs.LG stat.ML

    Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters

    Authors: Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

    Abstract: In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of empirical studies, we demonstrate how accurate OPE is strongly dependent on the calibration of estimated behaviour policy models: how precisely the behaviour policy is estimated from data. We show how powerful parametric mod…

    Submitted 10 July, 2018; v1 submitted 3 July, 2018; originally announced July 2018.

    Comments: Accepted to workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action at ICML 2018

  17. arXiv:1805.12298  [pdf, other]

    cs.LG stat.ML

    Evaluating Reinforcement Learning Algorithms in Observational Health Settings

    Authors: Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng, Jiayu Yao, Isaac Lage, Christopher Mosch, Li-wei H. Lehman, Matthieu Komorowski, Aldo Faisal, Leo Anthony Celi, David Sontag, Finale Doshi-Velez

    Abstract: Much attention has been devoted recently to the development of machine learning algorithms with the goal of improving treatment policies in healthcare. Reinforcement learning (RL) is a sub-field within machine learning that is concerned with learning how to make sequences of decisions so as to optimize long-term effects. Already, RL algorithms have been proposed to identify decision-making strateg…

    Submitted 30 May, 2018; originally announced May 2018.

  18. arXiv:1805.09044  [pdf, other]

    cs.LG cs.AI stat.ML

    Representation Balancing MDPs for Off-Policy Policy Evaluation

    Authors: Yao Liu, Omer Gottesman, Aniruddh Raghu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill

    Abstract: We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and average policy value accurately. We draw inspiration from recent work in causal reasoning, and propose a new finite sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop a l…

    Submitted 17 April, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: Appeared at NeurIPS 2018; updated style file