Showing 1–20 of 20 results for author: McIlraith, S A

  1. arXiv:2406.00120  [pdf, other]

    cs.LG cs.AI cs.FL

    Reward Machines for Deep RL in Noisy and Uncertain Environments

    Authors: Andrew C. Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith

    Abstract: Reward Machines provide an automata-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing complex reward function structure, they enable counterfactual learning updates that have resulted in impressive sample efficiency gains. While Reward Machines have been employed in both tabular and deep RL settings, they have typ…

    Submitted 17 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

    ACM Class: I.2.0; I.2.6; I.2.4; F.4.3

  2. arXiv:2312.11675  [pdf, other]

    cs.AI

    PRP Rebooted: Advancing the State of the Art in FOND Planning

    Authors: Christian Muise, Sheila A. McIlraith, J. Christopher Beck

    Abstract: Fully Observable Non-Deterministic (FOND) planning is a variant of classical symbolic planning in which actions are nondeterministic, with an action's outcome known only upon execution. It is a popular planning paradigm with applications ranging from robot planning to dialogue-agent design and reactive synthesis. Over the last 20 years, a number of approaches to FOND planning have emerged. In this…

    Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 13 pages, 4 figures, AAAI conference paper. Update: Fixed abstract and typos

    ACM Class: I.2.8

  3. arXiv:2312.04772  [pdf, other]

    cs.AI cs.CY cs.LG

    Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making

    Authors: Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, Sheila A. McIlraith

    Abstract: Fair decision making has largely been studied with respect to a single decision. Here we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of decisions. We observe that fairness often depends on the history of the sequential decision-making process, and in this sense that it is inherently non-Markovian. We fu…

    Submitted 19 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  4. arXiv:2301.02952  [pdf, other]

    cs.LG cs.AI

    Learning Symbolic Representations for Reinforcement Learning of Non-Markovian Behavior

    Authors: Phillip J. K. Christoffersen, Andrew C. Li, Rodrigo Toro Icarte, Sheila A. McIlraith

    Abstract: Many real-world reinforcement learning (RL) problems necessitate learning complex, temporally extended behavior that may only receive reward signal when the behavior is completed. If the reward-worthy behavior is known, it can be specified in terms of a non-Markovian reward function - a function that depends on aspects of the state-action history, rather than just the current state and action. Suc…

    Submitted 7 January, 2023; originally announced January 2023.

    Comments: 7 pages, 2 figures, presented at KR2ML workshop at NeurIPS 2020

  5. arXiv:2211.10902  [pdf, other]

    cs.LG cs.AI cs.FL

    Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

    Authors: Andrew C. Li, Zizhao Chen, Pashootan Vaezipoor, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

    Abstract: Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping of environment state to a symbolic (here, Rewa…

    Submitted 23 November, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: NeurIPS Deep Reinforcement Learning Workshop 2022

  6. arXiv:2211.04591  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Follow Instructions in Text-Based Games

    Authors: Mathieu Tuli, Andrew C. Li, Pashootan Vaezipoor, Toryn Q. Klassen, Scott Sanner, Sheila A. McIlraith

    Abstract: Text-based games present a unique class of sequential decision making problem in which agents interact with a partially observable, simulated environment via actions and observations conveyed through natural language. Such observations typically include instructions that, in a reinforcement learning (RL) setting, can directly or indirectly guide a player towards completing reward-worthy tasks. In…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022

  7. arXiv:2206.01812  [pdf, other]

    cs.LG cs.AI cs.RO

    Challenges to Solving Combinatorially Hard Long-Horizon Deep RL Tasks

    Authors: Andrew C. Li, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith

    Abstract: Deep reinforcement learning has shown promise in discrete domains requiring complex reasoning, including games such as Chess, Go, and Hanabi. However, this type of reasoning is less often observed in long-horizon, continuous domains with high-dimensional observations, where instead RL research has predominantly focused on problems with simple high-level structure (e.g. opening a drawer or moving a…

    Submitted 3 June, 2022; originally announced June 2022.

  8. arXiv:2112.09477  [pdf, other]

    cs.LG cs.AI

    Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

    Authors: Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Sheila A. McIlraith

    Abstract: Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment -- where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function…

    Submitted 17 December, 2021; originally announced December 2021.

  9. arXiv:2106.02617  [pdf, other]

    cs.AI cs.LG

    Be Considerate: Objectives, Side Effects, and Deciding How to Act

    Authors: Parand Alizadeh Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith

    Abstract: Recent work in AI safety has highlighted that in sequential decision making, objectives are often underspecified or incomplete. This gives discretion to the acting agent to realize the stated objective in ways that may result in undesirable outcomes. We contend that to learn to act safely, a reinforcement learning (RL) agent should include contemplation of the impact of its actions on the wellbein…

    Submitted 4 June, 2021; originally announced June 2021.

  10. arXiv:2106.00133  [pdf, other]

    cs.AI

    AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning

    Authors: Maayan Shvo, Zhiming Hu, Rodrigo Toro Icarte, Iqbal Mohomed, Allan Jepson, Sheila A. McIlraith

    Abstract: Human beings, even small children, quickly become adept at figuring out how to use applications on their mobile devices. Learning to use a new app is often achieved via trial-and-error, accelerated by transfer of knowledge from past experiences with like apps. The prospect of building a smarter smartphone - one that can learn how to achieve tasks using mobile apps - is tantalizing. In this paper w…

    Submitted 6 June, 2021; v1 submitted 31 May, 2021; originally announced June 2021.

  11. arXiv:2012.02419  [pdf, other]

    cs.LG cs.AI

    Planning from Pixels using Inverse Dynamics Models

    Authors: Keiran Paster, Sheila A. McIlraith, Jimmy Ba

    Abstract: Learning task-agnostic dynamics models in high-dimensional observation spaces can be challenging for model-based RL agents. We propose a novel way to learn latent world models by learning to predict sequences of future actions conditioned on task completion. These task-conditioned models adaptively focus modeling capacity on task-relevant dynamics, while simultaneously serving as an effective heur…

    Submitted 4 December, 2020; originally announced December 2020.

    Comments: 9 pages, 4 figures

  12. Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

    Authors: Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Sheila A. McIlraith

    Abstract: Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible -- to show the reward function's code to the RL…

    Submitted 17 January, 2022; v1 submitted 5 October, 2020; originally announced October 2020.

    Journal ref: Journal of Artificial Intelligence Research 73 (2022) 173-208

  13. arXiv:2010.02819  [pdf, other]

    cs.LG cs.AI

    Interpretable Sequence Classification via Discrete Optimization

    Authors: Maayan Shvo, Andrew C. Li, Rodrigo Toro Icarte, Sheila A. McIlraith

    Abstract: Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neura…

    Submitted 6 October, 2020; originally announced October 2020.

  14. arXiv:2010.01753  [pdf, other]

    cs.LG cs.AI

    The act of remembering: a study in partially observable reinforcement learning

    Authors: Rodrigo Toro Icarte, Richard Valenzano, Toryn Q. Klassen, Phillip Christoffersen, Amir-massoud Farahmand, Sheila A. McIlraith

    Abstract: Reinforcement Learning (RL) agents typically learn memoryless policies---policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observ…

    Submitted 4 October, 2020; originally announced October 2020.

  15. arXiv:2005.02963  [pdf, ps, other]

    cs.AI

    Towards the Role of Theory of Mind in Explanation

    Authors: Maayan Shvo, Toryn Q. Klassen, Sheila A. McIlraith

    Abstract: Theory of Mind is commonly defined as the ability to attribute mental states (e.g., beliefs, goals) to oneself, and to others. A large body of previous work - from the social sciences to artificial intelligence - has observed that Theory of Mind capabilities are central to providing an explanation to another agent or when explaining that agent's behaviour. In this paper, we build and expand upon p…

    Submitted 6 May, 2020; originally announced May 2020.

  16. arXiv:1912.13430  [pdf, other]

    cs.AI cs.FL cs.GT cs.LO cs.NE

    Towards Neural-Guided Program Synthesis for Linear Temporal Logic Specifications

    Authors: Alberto Camacho, Sheila A. McIlraith

    Abstract: Synthesizing a program that realizes a logical specification is a classical problem in computer science. We examine a particular type of program synthesis, where the objective is to synthesize a strategy that reacts to a potentially adversarial environment while ensuring that all executions satisfy a Linear Temporal Logic (LTL) specification. Unfortunately, exact methods to solve so-called LTL syn…

    Submitted 31 December, 2019; originally announced December 2019.

  17. arXiv:1906.06436  [pdf, ps, other]

    cs.AI

    Towards Empathetic Planning

    Authors: Maayan Shvo, Sheila A. McIlraith

    Abstract: Critical to successful human interaction is a capacity for empathy - the ability to understand and share the thoughts and feelings of another. As Artificial Intelligence (AI) systems are increasingly required to interact with humans in a myriad of settings, it is important to enable AI to wield empathy as a tool to benefit those it interacts with. In this paper, we work towards this goal by bringi…

    Submitted 14 June, 2019; originally announced June 2019.

  18. arXiv:1808.10831  [pdf, ps, other]

    cs.LO cs.AI

    Finite LTL Synthesis with Environment Assumptions and Quality Measures

    Authors: Alberto Camacho, Meghyn Bienvenu, Sheila A. McIlraith

    Abstract: In this paper, we investigate the problem of synthesizing strategies for linear temporal logic (LTL) specifications that are interpreted over finite traces -- a problem that is central to the automated construction of controllers, robot programs, and business processes. We study a natural variant of the finite LTL synthesis problem in which strategy guarantees are predicated on specified environme…

    Submitted 31 August, 2018; originally announced August 2018.

    Comments: 14 pages. To appear in the Proceedings of the 16th International Conference on Principles of Knowledge Representation and Reasoning (KR 2018) without the appendix proofs. The body of this paper is the same as the KR 2018 paper except that a minor typographic error has been corrected, as noted in this paper

  19. arXiv:1609.04371   

    cs.LO cs.AI

    Finite LTL Synthesis is EXPTIME-complete

    Authors: Jorge A. Baier, Alberto Camacho, Christian Muise, Sheila A. McIlraith

    Abstract: LTL synthesis -- the construction of a function to satisfy a logical specification formulated in Linear Temporal Logic -- is a 2EXPTIME-complete problem with relevant applications in controller synthesis and a myriad of artificial intelligence applications. In this research note we consider De Giacomo and Vardi's variant of the synthesis problem for LTL formulas interpreted over finite rather than…

    Submitted 17 November, 2016; v1 submitted 14 September, 2016; originally announced September 2016.

    Comments: We withdraw this paper because of an error in the proof

  20. arXiv:0909.0682  [pdf, ps, other]

    cs.AI

    On Planning with Preferences in HTN

    Authors: Shirin Sohrabi, Sheila A. McIlraith

    Abstract: In this paper, we address the problem of generating preferred plans by combining the procedural control knowledge specified by Hierarchical Task Networks (HTNs) with rich qualitative user preferences. The outcome of our work is a language for specifying user preferences, tailored to HTN planning, together with a provably optimal preference-based planner, HTNPLAN, that is implemented as an extensi…

    Submitted 3 September, 2009; originally announced September 2009.

    Comments: This paper appears in Twelfth International Workshop on Non-Monotonic Reasoning (NMR08). An earlier version of this paper appears in Fourth Multidisciplinary Workshop on Advances in Preference Handling (M-Pref08) at AAAI-08