Showing 1–28 of 28 results for author: Hanna, J P

Searching in archive cs.
  1. arXiv:2406.17168  [pdf, other]

    cs.LG cs.AI cs.RO

    Reinforcement Learning via Auxiliary Task Distillation

    Authors: Abhinav Narayan Harish, Larry Heck, Josiah P. Hanna, Zsolt Kira, Andrew Szot

    Abstract: We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently carrying out multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.05064  [pdf, other]

    cs.LG

    Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

    Authors: Subhojyoti Mukherjee, Josiah P. Hanna, Qiaomin Xie, Robert Nowak

    Abstract: In this paper, we study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure and the algorithm exploits the shared structure to minimize the cumulative regret for an unseen but related test task. We use a transformer as a decision-making algorithm to learn this shared structure so as to general…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.02165  [pdf, other]

    cs.LG

    SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

    Authors: Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak

    Abstract: In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain. Policy evaluation requires data and we are interested in the question of what behavior policy should collect the data for the most accu…

    Submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2405.07838  [pdf, other]

    cs.LG cs.AI

    Adaptive Exploration for Data-Efficient General Value Function Evaluations

    Authors: Arushi Jain, Josiah P. Hanna, Doina Precup

    Abstract: General Value Functions (GVFs) (Sutton et al., 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 20 pages, 9 figures, Under Review

  5. arXiv:2311.08290  [pdf, other]

    cs.LG

    On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

    Authors: Nicholas E. Corrado, Josiah P. Hanna

    Abstract: On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data-inefficient on-policy learning. Recent work in the polic…

    Submitted 14 November, 2023; originally announced November 2023.

  6. arXiv:2311.00327  [pdf, other]

    cs.LG

    Multi-task Representation Learning for Pure Exploration in Bilinear Bandits

    Authors: Subhojyoti Mukherjee, Qiaomin Xie, Josiah P. Hanna, Robert Nowak

    Abstract: We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature vectors of the arms. In the multi-task bilinear bandit problem, we aim to find optimal actions for multiple tasks that share a common l…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  7. arXiv:2310.18409  [pdf, other]

    cs.LG

    State-Action Similarity-Based Representations for Off-Policy Evaluation

    Authors: Brahma S. Pavse, Josiah P. Hanna

    Abstract: In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically successful algorithms for OPE has been the fitted Q-evaluation (FQE) algorithm that uses temporal difference updates to learn an action-value function, which is…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted to Neural Information Processing Systems (NeurIPS) 2023

  8. arXiv:2310.18247  [pdf, other]

    cs.LG cs.RO

    Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

    Authors: Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna

    Abstract: In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data. While offline RL has been successful in learning real-world robot control policies, it typically requires large amounts of expert-quality data to learn effective policies that generalize to out-of-distribution states. Unfortunately, such data is often difficult and ex…

    Submitted 8 August, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: RLC 2024

  9. arXiv:2310.17786  [pdf, other]

    cs.LG

    Understanding when Dynamics-Invariant Data Augmentations Benefit Model-Free Reinforcement Learning Updates

    Authors: Nicholas E. Corrado, Josiah P. Hanna

    Abstract: Recently, data augmentation (DA) has emerged as a method for leveraging domain knowledge to inexpensively generate additional data in reinforcement learning (RL) tasks, often yielding substantial improvements in data efficiency. While prior work has demonstrated the utility of incorporating augmented data directly into model-free RL updates, it is not well-understood when a particular DA strategy…

    Submitted 16 March, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  10. arXiv:2306.01896  [pdf, other]

    cs.LG

    Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

    Authors: Brahma S. Pavse, Matthew Zurek, Yudong Chen, Qiaomin Xie, Josiah P. Hanna

    Abstract: In many reinforcement learning (RL) applications, we want policies that reach desired states and then keep the controlled system within an acceptable region around the desired states over an indefinite period of time. This latter objective is called stability and is especially important when the state space is unbounded, such that the states can be arbitrarily far from each other and the agent can…

    Submitted 26 May, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2024

  11. arXiv:2305.14133  [pdf, other]

    cs.LG

    Conditional Mutual Information for Disentangled Representations in Reinforcement Learning

    Authors: Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht

    Abstract: Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world…

    Submitted 12 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Conference on Neural Information Processing Systems (NeurIPS), 2023

  12. arXiv:2212.08302  [pdf, other]

    cs.LG cs.AI

    Safe Evaluation For Offline Learning: Are We Ready To Deploy?

    Authors: Hager Radi, Josiah P. Hanna, Peter Stone, Matthew E. Taylor

    Abstract: The world currently offers an abundance of data in multiple domains, from which we can learn reinforcement learning (RL) policies without further interaction with the environment. Learning RL agents offline from such data is possible, but deploying them while learning might be dangerous in domains where safety is critical. Therefore, it is essential to find a way to estimate how a newly-learned age…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2021 Workshop on Deployable Decision Making in Embodied Systems [Spotlight]

  13. arXiv:2212.07486  [pdf, other]

    cs.LG cs.AI

    Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction

    Authors: Brahma S. Pavse, Josiah P. Hanna

    Abstract: We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, $π_e$, using a fixed dataset, $\mathcal{D}$, collected by one or more policies that may be different from $π_e$. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023

  14. arXiv:2209.09446  [pdf, other]

    cs.LG cs.AI

    A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

    Authors: Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, Josiah P. Hanna

    Abstract: In various control task domains, existing controllers provide a baseline level of performance that -- though possibly suboptimal -- should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below a baseline level during trai…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

  15. arXiv:2207.05480  [pdf, other]

    cs.LG

    Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

    Authors: Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht

    Abstract: Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image. The changed pixels can lead to drastic changes in the agent's latent representatio…

    Submitted 27 February, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: International Conference on Learning Representations (ICLR), 2023

  16. arXiv:2205.14323  [pdf, other]

    cs.DB cs.LG

    Multi-agent Databases via Independent Learning

    Authors: Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella

    Abstract: Machine learning is rapidly being adopted in database research to improve the effectiveness of numerous tasks, including but not limited to query optimization, workload scheduling, and physical design. Currently, the research focus has been on replacing a single database component responsible for one task by its learning-based counterpart. However, query performance is not simply determined by the per…

    Submitted 5 August, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

    Journal ref: AIDB@VLDB 2022 Proceedings of 4th International Workshop on Applied AI for Database Systems and Applications

  17. arXiv:2203.04510  [pdf, other]

    cs.LG

    ReVar: Strengthening Policy Evaluation via Reduced Variance Sampling

    Authors: Subhojyoti Mukherjee, Josiah P. Hanna, Robert Nowak

    Abstract: This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain in an environment formalized as an MDP. We develop theory for optimal data collection within the class of tree-structured MDPs by first deriving an oracle data collection s…

    Submitted 17 June, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted for the 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022)

  18. arXiv:2111.14552  [pdf, other]

    cs.LG

    Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

    Authors: Rujie Zhong, Duohan Zhang, Lukas Schäfer, Stefano V. Albrecht, Josiah P. Hanna

    Abstract: Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to mat…

    Submitted 10 October, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Published at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  19. arXiv:2108.02530  [pdf, other]

    cs.RO

    Interpretable Goal Recognition in the Presence of Occluded Factors for Autonomous Vehicles

    Authors: Josiah P. Hanna, Arrasy Rahman, Elliot Fosong, Francisco Eiras, Mihai Dobre, John Redford, Subramanian Ramamoorthy, Stefano V. Albrecht

    Abstract: Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rat…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  20. arXiv:2107.08966  [pdf, other]

    cs.LG cs.AI

    Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration

    Authors: Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht

    Abstract: Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to lev…

    Submitted 9 February, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: Published at the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2022

  21. arXiv:2008.01281  [pdf, other]

    cs.RO

    Stochastic Grounded Action Transformation for Robot Learning in Simulation

    Authors: Siddharth Desai, Haresh Karnan, Josiah P. Hanna, Garrett Warnell, Peter Stone

    Abstract: Robot control policies learned in simulation do not often transfer well to the real world. Many existing solutions to this sim-to-real problem, such as the Grounded Action Transformation (GAT) algorithm, seek to correct for or ground these differences by matching the simulator to the real world. However, the efficacy of these approaches is limited if they do not explicitly account for stochasticit…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted at the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020)

  22. arXiv:2008.01279  [pdf, other]

    cs.RO

    Reinforced Grounded Action Transformation for Sim-to-Real Transfer

    Authors: Haresh Karnan, Siddharth Desai, Josiah P. Hanna, Garrett Warnell, Peter Stone

    Abstract: Robots can learn to do complex tasks in simulation, but often, learned behaviors fail to transfer well to the real world due to simulator imperfections (the reality gap). Some existing solutions to this sim-to-real problem, such as Grounded Action Transformation (GAT), use a small amount of real-world experience to minimize the reality gap by grounding the simulator. While very effective in certai…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: Accepted at International Conference on Intelligent Robots and Systems (IROS) 2020

  23. arXiv:2007.09327  [pdf, ps, other]

    cs.CR cs.LG cs.MA

    Towards Quantum-Secure Authentication and Key Agreement via Abstract Multi-Agent Interaction

    Authors: Ibrahim H. Ahmed, Josiah P. Hanna, Elliot Fosong, Stefano V. Albrecht

    Abstract: Current methods for authentication and key agreement based on public-key cryptography are vulnerable to quantum computing. We propose a novel approach based on artificial intelligence research in which communicating parties are viewed as autonomous agents which interact repeatedly using their private decision models. Authentication and key agreement are decided based on the agents' observed behavi…

    Submitted 9 July, 2021; v1 submitted 18 July, 2020; originally announced July 2020.

    Comments: Published at the 19th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS 2021)

  24. arXiv:1912.11023  [pdf, other]

    cs.LG stat.ML

    Learning an Interpretable Traffic Signal Control Policy

    Authors: James Ault, Josiah P. Hanna, Guni Sharon

    Abstract: Signalized intersections are managed by controllers that assign right of way (green, yellow, and red lights) to non-conflicting directions. Optimizing the actuation policy of such controllers is expected to alleviate traffic congestion and its adverse impact. Given such a safety-critical domain, the affiliated actuation policy is required to be interpretable in a way that can be understood and reg…

    Submitted 26 February, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

  25. arXiv:1906.07372  [pdf, other]

    cs.LG cs.RO stat.ML

    RIDM: Reinforced Inverse Dynamics Modeling for Learning from a Single Observed Demonstration

    Authors: Brahma S. Pavse, Faraz Torabi, Josiah P. Hanna, Garrett Warnell, Peter Stone

    Abstract: Augmenting reinforcement learning with imitation learning is often hailed as a method by which to improve upon learning from scratch. However, most existing methods for integrating these two techniques are subject to several strong assumptions, chief among them that information about demonstrator actions is available. In this paper, we investigate the extent to which this assumption is necessary…

    Submitted 21 July, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: IEEE Robotics and Automation Letters, presented at International Conference on Intelligent Robots and Systems (IROS 2020)

  26. arXiv:1806.01347  [pdf, other]

    cs.LG cs.AI stat.ML

    Importance Sampling Policy Evaluation with an Estimated Behavior Policy

    Authors: Josiah P. Hanna, Scott Niekum, Peter Stone

    Abstract: We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by a different behavior policy. Importance sampling is a technique for off-policy evaluation that re-weights off-policy returns to account for differences in the likelihood of the returns between the two policies. In t…

    Submitted 9 May, 2019; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: Accepted to ICML 2019

  27. arXiv:1706.03469  [pdf, other]

    cs.AI

    Data-Efficient Policy Evaluation Through Behavior Policy Search

    Authors: Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum

    Abstract: We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show that the data collected from deploying a different policy, commonly called the behavior policy, can be used to produce unbiased estimates with lower mean squared error than this standard technique. We d…

    Submitted 12 June, 2017; originally announced June 2017.

    Comments: Accepted to ICML 2017; Extended version; 15 pages

  28. arXiv:1606.06126  [pdf, other]

    cs.AI cs.LG stat.ML

    Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

    Authors: Josiah P. Hanna, Peter Stone, Scott Niekum

    Abstract: For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existin…

    Submitted 24 September, 2018; v1 submitted 20 June, 2016; originally announced June 2016.

    Comments: Published in proceedings of the 16th International Conference on Autonomous Agents and Multi-agent Systems