-
Hyperbolic Deep Reinforcement Learning
Authors:
Edoardo Cetin,
Benjamin Chamberlain,
Michael Bronstein,
Jonathan J Hunt
Abstract:
We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators. Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool.
Submitted 4 October, 2022;
originally announced October 2022.
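The abstract above does not say which model of hyperbolic space is used. As a minimal sketch, assuming the Poincaré ball with curvature -1 (a common choice in the hyperbolic deep learning literature), a Euclidean latent vector can be mapped into the ball with the exponential map at the origin, and distances can then be measured with the closed-form Poincaré distance. The function names `expmap0` and `poincare_distance` are illustrative and not taken from the paper.

```python
import numpy as np

def expmap0(v, eps=1e-9):
    """Exponential map at the origin of the unit Poincare ball (curvature -1).

    Maps a Euclidean (tangent) vector v to a point strictly inside the unit ball.
    """
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.tanh(norm) * v / np.maximum(norm, eps)

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_dist = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_dist / denom)

# A Euclidean feature vector with norm 0.5 (hypothetical encoder output).
v = np.array([0.3, -0.4])
x = expmap0(v)
origin = np.zeros(2)
d = poincare_distance(origin, x)  # equals 2 * ||v|| for this map
```

In this model the distance from the origin grows without bound as a point approaches the boundary of the ball, which is why tree-like (hierarchical) structure can be embedded with low distortion.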
-
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
Authors:
Zhendong Wang,
Jonathan J Hunt,
Mingyuan Zhou
Abstract:
Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collected static dataset, is an important paradigm of RL. Standard RL methods often perform poorly in this regime due to the function approximation errors on out-of-distribution actions. While a variety of regularization methods have been proposed to mitigate this issue, they are often constrained by policy classes with limited expressiveness that can lead to highly suboptimal solutions. In this paper, we propose representing the policy as a diffusion model, a recent class of highly expressive deep generative models. We introduce Diffusion Q-learning (Diffusion-QL), which utilizes a conditional diffusion model to represent the policy. In our approach, we learn an action-value function and we add a term maximizing action-values into the training loss of the conditional diffusion model, which results in a loss that seeks optimal actions that are near the behavior policy. We show that the expressiveness of the diffusion model-based policy and the coupling of behavior cloning and policy improvement under the diffusion model both contribute to the outstanding performance of Diffusion-QL. We illustrate the superiority of our method compared to prior works in a simple 2D bandit example with a multimodal behavior policy. We then show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
Submitted 25 August, 2023; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Should I send this notification? Optimizing push notifications decision making by modeling the future
Authors:
Conor O'Brien,
Huasen Wu,
Shaodan Zhai,
Dalin Guo,
Wenzhe Shi,
Jonathan J Hunt
Abstract:
Most recommender systems are myopic; that is, they optimize based on the immediate response of the user. This may be misaligned with the true objective, such as creating long-term user satisfaction. In this work we focus on mobile push notifications, where the long-term effects of recommender system decisions can be particularly strong. For example, sending too many or irrelevant notifications may annoy a user and cause them to disable notifications. However, a myopic system will always choose to send a notification since negative effects occur in the future. This is typically mitigated using heuristics. However, heuristics can be hard to reason about or improve, require retuning each time the system is changed, and may be suboptimal. To counter these drawbacks, there is significant interest in recommender systems that optimize directly for long-term value (LTV). Here, we describe a method for maximising LTV by using model-based reinforcement learning (RL) to make decisions about whether to send push notifications. We model the effects of sending a notification on the user's future behavior. Much of the prior work applying RL to maximise LTV in recommender systems has focused on session-based optimization, while the time horizon for notification decision making in this work extends over several days. We test this approach in an A/B test on a major social network. We show that by optimizing decisions about push notifications we are able to send fewer notifications and obtain a higher open rate than the baseline system, while generating the same level of user engagement on the platform as the existing heuristic-based system.
Submitted 17 February, 2022;
originally announced February 2022.
-
Challenges and approaches to privacy preserving post-click conversion prediction
Authors:
Conor O'Brien,
Arvind Thiagarajan,
Sourav Das,
Rafael Barreto,
Chetan Verma,
Tim Hsu,
James Neufield,
Jonathan J Hunt
Abstract:
Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e., the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but, increasingly, regulatory and technical constraints are requiring privacy-preserving approaches. For example, major platforms are moving to restrict tracking individual user events across multiple applications, and governments around the world have shown steadily more interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. We show using offline experiments on real-world data that it outperforms a model relying on opt-in data alone, and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area.
Submitted 29 January, 2022;
originally announced January 2022.
-
Learning to Rank For Push Notifications Using Pairwise Expected Regret
Authors:
Yuguang Yue,
Yuanpu Xie,
Huasen Wu,
Haofeng Jia,
Shaodan Zhai,
Wenzhe Shi,
Jonathan J Hunt
Abstract:
Listwise ranking losses have been widely studied in recommender systems. However, new paradigms of content consumption present new challenges for ranking methods. In this work we contribute an analysis of learning to rank for personalized mobile push notifications and discuss the unique challenges this presents compared to traditional ranking problems. To address these challenges, we introduce a novel ranking loss based on weighting the pairwise loss between candidates by the expected regret incurred for misordering the pair. We demonstrate that the proposed method can outperform prior methods both in a simulated environment and in a production experiment on a major social network.
Submitted 19 January, 2022;
originally announced January 2022.
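The abstract above describes weighting the pairwise loss between candidates by the expected regret of misordering the pair, but does not give the formula. The following is a minimal sketch of one plausible form, assuming a logistic pairwise loss and using the gap in a per-candidate expected utility (for example, a predicted open probability) as a stand-in for expected regret. The function name and the `utilities` input are hypothetical and not taken from the paper.

```python
import numpy as np

def pairwise_regret_loss(scores, utilities):
    """Logistic pairwise loss, weighted by the utility gap of each pair.

    scores:    model scores, one per candidate (higher ranks first)
    utilities: assumed expected utility of each candidate
    """
    scores = np.asarray(scores, dtype=float)
    utilities = np.asarray(utilities, dtype=float)
    score_diff = scores[:, None] - scores[None, :]        # s_i - s_j
    utility_gap = utilities[:, None] - utilities[None, :]  # u_i - u_j
    # Only pairs where candidate i should outrank candidate j carry weight;
    # the weight is the utility lost if the pair were misordered.
    weights = np.maximum(utility_gap, 0.0)
    pair_loss = np.logaddexp(0.0, -score_diff)  # log(1 + exp(-(s_i - s_j)))
    total_weight = weights.sum()
    if total_weight == 0.0:
        return 0.0
    return float((weights * pair_loss).sum() / total_weight)

utilities = [0.9, 0.5, 0.1]
loss_correct_order = pairwise_regret_loss([2.0, 1.0, 0.0], utilities)
loss_reversed_order = pairwise_regret_loss([0.0, 1.0, 2.0], utilities)
```

Under this weighting, swapping two candidates with nearly equal utility costs little, while swapping the best and worst candidates is penalized most.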
-
Adaptive Modeling Powers Fast Multi-parameter Fitting of CARS Spectra
Authors:
Gregory J. Hunt,
Cody R. Ground,
Andrew D. Cutler
Abstract:
Coherent anti-Stokes Raman Spectroscopy (CARS) is a laser-based measurement technique widely applied across many science and engineering disciplines to perform non-intrusive gas diagnostics. CARS is often used to study combustion, where the measured spectra can be used to simultaneously recover multiple flow parameters from the reacting gas such as temperature and relative species mole fractions. This is typically done by using numerical optimization to find the flow parameters for which a theoretical model of the CARS spectra best matches the actual measurements. The most commonly used theoretical model is the CARSFT spectrum calculator. Unfortunately, this CARSFT spectrum generator is computationally expensive and using it to recover multiple flow parameters can be prohibitively time-consuming, especially when experiments have hundreds or thousands of measurements distributed over time or space. To overcome these issues, several methods have been developed to approximate CARSFT using a library of pre-computed theoretical spectra. In this work we present a new approach that leverages ideas from the machine learning literature to build an adaptively smoothed kernel-based approximator. In application to a simulated dual-pump CARS experiment probing a $H_2$/air flame, we show that the approach can use a small number of library spectra to quickly and accurately recover temperature and four gas species' mole fractions. The method's flexibility allows fine-tuned navigation of the trade-off between speed and accuracy, and makes the approach suitable for a wide range of problems and flow regimes.
Submitted 26 October, 2021;
originally announced November 2021.
-
Enabling Blockchain Scalability and Interoperability with Mobile Computing through LayerOne.X
Authors:
Kevin Coutinho,
Ponnie Clark,
Ferdinand Azis,
Norman Lip,
Josh Hunt
Abstract:
Interoperability and scalability are currently the bottlenecks preventing mass adoption of blockchain technology. The main goal of this project is the development of an interoperable and scalable network that promotes a truly decentralised, permissionless and secure blockchain and that enables micro validation. This paper introduces Layer-One.X, a truly decentralised ledger which utilises para-sharding, Directed Acyclic Graphs, a Proof of Participation consensus mechanism, mobile computing, flash contracts and nucleus scripting. The conceptual framework, including tokenomics, is also explained along with a number of use cases. The framework addresses the growing need for higher transactions per second, enabling micro-payments and value transfer through tokenisation.
Submitted 30 September, 2021;
originally announced October 2021.
-
The 2021 RecSys Challenge Dataset: Fairness is not optional
Authors:
Luca Belli,
Alykhan Tejani,
Frank Portman,
Alexandre Lung-Yut-Fong,
Ben Chamberlain,
Yuanpu Xie,
Kristian Lum,
Jonathan Hunt,
Michael Bronstein,
Vito Walter Anelli,
Saikishore Kalloori,
Bruce Ferwerda,
Wenzhe Shi
Abstract:
After the success of the RecSys 2020 Challenge, we describe a novel and bigger dataset that was released in conjunction with the ACM RecSys Challenge 2021. This year's dataset is not only bigger (~1B data points, a 5-fold increase), but for the first time it takes into consideration fairness aspects of the challenge. Unlike many static datasets, a lot of effort went into making sure that the dataset was synced with the Twitter platform: if a user deleted their content, the same content would be promptly removed from the dataset too. In this paper, we introduce the dataset and challenge, highlighting some of the issues that arise when creating recommender systems at Twitter scale.
Submitted 21 September, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction
Authors:
Conor O'Brien,
Kin Sum Liu,
James Neufeld,
Rafael Barreto,
Jonathan J Hunt
Abstract:
Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions. For example, a system may predict whether a user will click on an advertisement and whether they will then purchase the advertised product. The conceptual similarity between these tasks has promoted the use of multi-task learning: a class of algorithms that aim to bring positive inductive transfer from related tasks. Here, we empirically evaluate multi-task learning approaches with neural networks for an online advertising task. Specifically, we consider approximating the probability of post-click conversion events (installs) (CVR) for mobile app advertising on a large-scale advertising platform, using the related click events (CTR) as an auxiliary task. We use an ablation approach to systematically study recent approaches that incorporate both multitask learning and "entire space modeling", which trains the CVR model on all logged examples rather than learning a conditional likelihood of conversion given a click. Based on these results we show that several different approaches result in similar levels of positive transfer from the data-abundant CTR task to the CVR task and offer some insight into how the multi-task design choices address the two primary problems affecting the CVR task: data sparsity and data bias. Our findings add to the growing body of evidence suggesting that standard multi-task learning is a sensible approach to modelling related events in real-world large-scale applications and suggest the specific multitask approach can be guided by ease of implementation in an existing system.
Submitted 18 August, 2021;
originally announced August 2021.
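The abstract above mentions "entire space modeling" without stating a loss. As a minimal sketch, assuming the ESMM-style factorization pCTCVR = pCTR × pCVR, the CVR head can be trained on every logged impression through the product, rather than only on clicked examples. The function names are illustrative, and the variants studied in the paper may differ.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-12):
    """Element-wise binary cross-entropy between probabilities p and labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def entire_space_loss(p_ctr, p_cvr, clicked, converted):
    """Loss over ALL impressions (assumed ESMM-style factorization).

    p_ctr: predicted P(click | impression)
    p_cvr: predicted P(conversion | click, impression)
    The CVR head receives gradient only through p_ctr * p_cvr, so it never
    needs a label restricted to the clicked subset.
    """
    p_ctr, p_cvr = np.asarray(p_ctr, float), np.asarray(p_cvr, float)
    clicked, converted = np.asarray(clicked, float), np.asarray(converted, float)
    p_ctcvr = p_ctr * p_cvr                 # P(click and conversion | impression)
    click_and_convert = clicked * converted
    loss = (binary_cross_entropy(p_ctr, clicked)
            + binary_cross_entropy(p_ctcvr, click_and_convert))
    return float(loss.mean())

clicked = [1, 1, 0, 0]
converted = [1, 0, 0, 0]
good_loss = entire_space_loss([0.9, 0.9, 0.1, 0.1], [0.9, 0.1, 0.5, 0.5], clicked, converted)
bad_loss = entire_space_loss([0.1, 0.1, 0.9, 0.9], [0.1, 0.9, 0.5, 0.5], clicked, converted)
```

Training on all impressions addresses the two problems the abstract names: more examples reach the CVR head (data sparsity), and the training distribution matches the serving distribution (data bias).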
-
The Option Keyboard: Combining Skills in Reinforcement Learning
Authors:
André Barreto,
Diana Borsa,
Shaobo Hou,
Gheorghe Comanici,
Eser Aygün,
Philippe Hamel,
Daniel Toyama,
Jonathan Hunt,
Shibl Mourad,
David Silver,
Doina Precup
Abstract:
The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.
Submitted 24 June, 2021;
originally announced June 2021.
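The abstract above says that options for any linear combination of known cumulants can be synthesised without further learning. A minimal sketch of how that can work, assuming the "previous results on transfer learning" refer to generalized policy evaluation and improvement (an assumption, since the abstract does not name them): action values are linear in the cumulant weights, so values for a new combination are a weighted sum of stored values, and the agent then acts greedily over all known options. The array layout and the function name are hypothetical, and option termination is ignored here.

```python
import numpy as np

def combine_and_act(q_values, w):
    """Pick an action for the combined cumulant sum_i w[i] * c_i.

    q_values[j, i, a]: value of cumulant i when taking action a in the
                       current state and then following known option j.
    w[i]:              weight of cumulant i in the new combination.
    """
    q_values = np.asarray(q_values, dtype=float)
    w = np.asarray(w, dtype=float)
    # Evaluation step: values are linear in the cumulant weights.
    q_combined = np.einsum("jia,i->ja", q_values, w)
    # Improvement step: best action under the best known option.
    return int(np.argmax(q_combined.max(axis=0)))

# Two known options, two cumulants, three actions (made-up numbers).
q_values = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]],  # option 0
    [[0.0, 0.2, 0.0], [0.0, 1.0, 0.0]],  # option 1
]
action_first = combine_and_act(q_values, [1.0, 0.0])
action_second = combine_and_act(q_values, [0.0, 1.0])
action_mixed = combine_and_act(q_values, [1.0, -1.0])
```

Changing `w` changes the behaviour immediately, which is the sense in which the weights act like keys on a keyboard.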
-
Achieving a quantum smart workforce
Authors:
Clarice D. Aiello,
D. D. Awschalom,
Hannes Bernien,
Tina Brower-Thomas,
Kenneth R. Brown,
Todd A. Brun,
Justin R. Caram,
Eric Chitambar,
Rosa Di Felice,
Michael F. J. Fox,
Stephan Haas,
Alexander W. Holleitner,
Eric R. Hudson,
Jeffrey H. Hunt,
Robert Joynt,
Scott Koziol,
H. J. Lewandowski,
Douglas T. McClure,
Jens Palsberg,
Gina Passante,
Kristen L. Pudenz,
Christopher J. K. Richardson,
Jessica L. Rosenberg,
R. S. Ross,
Mark Saffman
, et al. (7 additional authors not shown)
Abstract:
Interest in building dedicated Quantum Information Science and Engineering (QISE) education programs has greatly expanded in recent years. These programs are inherently convergent, complex, often resource intensive and likely require collaboration with a broad variety of stakeholders. In order to address this combination of challenges, we have captured ideas from many members in the community. This manuscript not only addresses policy makers and funding agencies (both public and private and from the regional to the international level) but also contains needs identified by industry leaders and discusses the difficulties inherent in creating an inclusive QISE curriculum. We report on the status of eighteen post-secondary education programs in QISE and provide guidance for building new programs. Lastly, we encourage the development of a comprehensive strategic plan for quantum education and workforce development as a means to make the most of the ongoing substantial investments being made in QISE.
Submitted 23 October, 2020;
originally announced October 2020.
-
Physically Embedded Planning Problems: New Challenges for Reinforcement Learning
Authors:
Mehdi Mirza,
Andrew Jaegle,
Jonathan J. Hunt,
Arthur Guez,
Saran Tunyasuvunakool,
Alistair Muldal,
Théophane Weber,
Peter Karkus,
Sébastien Racanière,
Lars Buesing,
Timothy Lillicrap,
Nicolas Heess
Abstract:
Recent work in deep reinforcement learning (RL) has produced algorithms capable of mastering challenging games such as Go, chess, or shogi. In these works the RL agent directly observes the natural state of the game and controls that state directly with its actions. However, when humans play such games, they do not just reason about the moves but also interact with their physical environment. They understand the state of the game by looking at the physical board in front of them and modify it by manipulating pieces using touch and fine-grained motor control. Mastering complicated physical systems with abstract goals is a central challenge for artificial intelligence, but it remains out of reach for existing RL algorithms. To encourage progress towards this goal we introduce a set of physically embedded planning problems and make them publicly available. We embed challenging symbolic tasks (Sokoban, tic-tac-toe, and Go) in a physics engine to produce a set of tasks that require perception, reasoning, and motor control over long time horizons. Although existing RL algorithms can tackle the symbolic versions of these tasks, we find that they struggle to master even the simplest of their physically embedded counterparts. As a first step towards characterizing the space of solutions to these tasks, we introduce a strong baseline that uses a pre-trained expert game player to provide hints in the abstract space to an RL agent's policy while training it on the full sensorimotor control task. The resulting agent solves many of the tasks, underlining the need for methods that bridge the gap between abstract planning and embodied control. See illustrating video at https://youtu.be/RwHiHlym_1k.
Submitted 29 October, 2020; v1 submitted 11 September, 2020;
originally announced September 2020.
-
A Modified Epidemiological Model to Understand the Uneven Impact of COVID-19 on Vulnerable Individuals and the Approaches Required to Help them Emerge from Lockdown
Authors:
Dario Ortega Anderez,
Eiman Kanjo,
Ganna Pogrebna,
Shane Johnson,
John Alan Hunt
Abstract:
COVID-19 has shown a relatively low mortality rate in young healthy individuals, with the majority of this group being asymptomatic or having mild symptoms, while the severity of the disease among individuals with underlying health conditions has caused significant mortality rates worldwide. Understanding and modelling these differences in mortality amongst different sectors of society will allow the different levels of risk and vulnerability to be determined, and so enable strategies to exit the lockdown. However, epidemiological models do not account for the variability encountered in the severity of the SARS-CoV-2 disease across different population groups. To overcome this limitation, we propose applying a modified SEIR model, namely SEIR-v, in which the population is separated into two groups according to their vulnerability to SARS-CoV-2. This enables the analysis of the spread of the epidemic when different containment measures are applied to different groups in society according to their vulnerability to the disease. A Monte Carlo simulation indicates that a large number of deaths could be avoided by slightly decreasing the exposure of vulnerable groups to the disease. From this modelling a number of mechanisms can be proposed to limit the exposure of vulnerable individuals to the disease in order to reduce the mortality rate among this group. One option could be the provision of a wristband to vulnerable people and those without a contact-tracing app. By combining very dense contact tracing data from smartphone apps and wristband signals with information about infection status and symptoms, vulnerable people can be protected and kept safer. Widespread utilisation would extend the protection further beyond these high-risk groups.
Submitted 19 June, 2020; v1 submitted 25 May, 2020;
originally announced June 2020.
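The abstract above describes a SEIR model split into a general and a vulnerable group, but gives no equations or parameters. The following is a minimal deterministic sketch of a two-group SEIR model with a death compartment, integrated with a simple Euler step. Every parameter value, the group fractions, and the `vulnerable_exposure` scaling are illustrative assumptions. It is not the paper's SEIR-v formulation, and it replaces the paper's Monte Carlo simulation with a single deterministic run.

```python
def simulate_seir_v(vulnerable_exposure=1.0, days=300, dt=0.1):
    """Two-group SEIR with deaths; all numbers below are illustrative only."""
    beta, sigma, gamma = 0.3, 1 / 5, 1 / 10  # transmission, incubation, removal rates
    fatality = {"general": 0.002, "vulnerable": 0.05}  # assumed fatality fractions
    exposure = {"general": 1.0, "vulnerable": vulnerable_exposure}
    state = {  # population fractions; the two groups sum to 1
        "general": {"S": 0.7999, "E": 0.0, "I": 0.0001, "R": 0.0, "D": 0.0},
        "vulnerable": {"S": 0.2, "E": 0.0, "I": 0.0, "R": 0.0, "D": 0.0},
    }
    for _ in range(int(round(days / dt))):
        infectious = sum(g["I"] for g in state.values())
        living = sum(g[c] for g in state.values() for c in "SEIR")
        force = beta * infectious / living  # shared force of infection
        for name, g in state.items():
            new_exposed = exposure[name] * force * g["S"] * dt
            new_infectious = sigma * g["E"] * dt
            removed = gamma * g["I"] * dt
            deaths = fatality[name] * removed
            g["S"] -= new_exposed
            g["E"] += new_exposed - new_infectious
            g["I"] += new_infectious - removed
            g["R"] += removed - deaths
            g["D"] += deaths
    return state

def total_deaths(state):
    return sum(g["D"] for g in state.values())

baseline = simulate_seir_v(vulnerable_exposure=1.0)
shielded = simulate_seir_v(vulnerable_exposure=0.8)  # 20% less exposure
```

Comparing `total_deaths(baseline)` with `total_deaths(shielded)` shows the qualitative effect the abstract reports; the magnitudes depend entirely on the assumed parameters.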
-
3D Augmented Reality-Assisted CT-Guided Interventions: System Design and Preclinical Trial on an Abdominal Phantom using HoloLens 2
Authors:
Brian J. Park,
Stephen J. Hunt,
Gregory J. Nadolski,
Terence P. Gade
Abstract:
Background: Out-of-plane lesions pose challenges for CT-guided interventions. Augmented reality (AR) headset devices have evolved and are readily capable of providing virtual 3D guidance to improve CT-guided targeting.
Purpose: To describe the design of a three-dimensional (3D) AR-assisted navigation system using HoloLens 2 and evaluate its performance through CT-guided simulations.
Materials and Methods: A prospective trial was performed assessing CT-guided needle targeting on an abdominal phantom with and without AR guidance. A total of 8 operators with varying clinical experience were enrolled and performed a total of 86 needle passes. Procedure efficiency, radiation dose, and complication rates were compared with and without AR guidance. Vector analysis of the first needle pass was also performed.
Results: Average total number of needle passes to reach the target reduced from 7.4 passes without AR to 3.4 passes with AR (54.2% decrease, p=0.011). Average dose-length product (DLP) decreased from 538 mGy-cm without AR to 318 mGy-cm with AR (41.0% decrease, p=0.009). Complication rate of hitting a non-targeted lesion decreased from 11.9% without AR (7/59 needle passes) to 0% with AR (0/27 needle passes). First needle passes were more nearly aligned with the ideal target trajectory with AR versus without AR (4.6° vs 8.0° offset, respectively, p=0.018). Medical students, residents, and attendings all performed at the same level with AR guidance.
Conclusions: 3D AR guidance can provide significant improvements in procedural efficiency and radiation dose savings for targeting challenging, out-of-plane lesions. AR guidance elevated the performance of all operators to the same level irrespective of prior clinical experience.
Submitted 18 May, 2020;
originally announced May 2020.
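The figures in the Results paragraph above can be cross-checked with simple arithmetic. This sketch recomputes the relative decreases from the rounded means quoted in the abstract. The small gaps to the reported 54.2% and 41.0% are presumably because the authors used unrounded means, which is an assumption, since the abstract does not say.

```python
def percent_decrease(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

# Rounded means as quoted in the abstract.
passes_decrease = percent_decrease(7.4, 3.4)   # about 54.1 (reported: 54.2)
dlp_decrease = percent_decrease(538, 318)      # about 40.9 (reported: 41.0)

# Complication rate without AR: 7 of 59 needle passes.
complication_rate_without_ar = 100.0 * 7 / 59  # about 11.9, as reported

# Needle passes in both arms add up to the stated total.
total_needle_passes = 59 + 27                  # 86, as reported
```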
-
Active Learning for Gaussian Process Considering Uncertainties with Application to Shape Control of Composite Fuselage
Authors:
Xiaowei Yue,
Yuchen Wen,
Jeffrey H. Hunt,
Jianjun Shi
Abstract:
In the machine learning domain, active learning is an iterative data selection algorithm for maximizing information acquisition and improving model performance with limited training samples. It is especially useful for industrial applications where training samples are expensive, time-consuming, or difficult to obtain. Existing methods mainly focus on active learning for classification, and a few methods are designed for regression such as linear regression or Gaussian processes. Uncertainties from measurement errors and intrinsic input noise inevitably exist in the experimental data, which further affects the modeling performance. The existing active learning methods do not incorporate these uncertainties for Gaussian processes. In this paper, we propose two new active learning algorithms for the Gaussian process with uncertainties: a variance-based weighted active learning algorithm and a D-optimal weighted active learning algorithm. Through numerical study, we show that the proposed approach can incorporate the impact of uncertainties and achieve better prediction performance. This approach has been applied to improving the predictive modeling for automatic shape control of composite fuselage.
Submitted 22 April, 2020;
originally announced April 2020.
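The abstract above proposes weighted variants of variance-based active learning but does not give the weighting. As a minimal sketch of the unweighted baseline only, the next sample can be chosen as the pool point where the Gaussian process posterior variance is largest. The RBF kernel, the homoscedastic `noise_var` term (a simple stand-in for measurement error), and all function names are assumptions, and the paper's treatment of input and measurement uncertainty is not reproduced.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def posterior_variance(x_train, x_pool, noise_var=1e-2):
    """GP posterior variance of the latent function at each pool point."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_pool)      # shape: (n_train, n_pool)
    solved = np.linalg.solve(K, K_s)       # K^{-1} K_s
    prior_var = np.ones(len(x_pool))       # k(x, x) = 1 for this kernel
    return prior_var - np.sum(K_s * solved, axis=0)

def select_next(x_train, x_pool, noise_var=1e-2):
    """Index of the pool point with the largest posterior variance."""
    return int(np.argmax(posterior_variance(x_train, x_pool, noise_var)))

x_train = np.array([0.0, 1.0])
x_pool = np.array([0.1, 0.5, 4.0])
variances = posterior_variance(x_train, x_pool)
next_index = select_next(x_train, x_pool)  # picks the point far from the data
```

The posterior variance does not depend on the observed outputs, so candidate points can be ranked before any new measurement is taken.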
-
An AI-Augmented Lesion Detection Framework For Liver Metastases With Model Interpretability
Authors:
Xin J. Hunt,
Ralph Abbey,
Ricky Tharrington,
Joost Huiskens,
Nina Wesdorp
Abstract:
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide. Most CRC deaths are the result of progression of metastases. The assessment of metastases is done using the RECIST criterion, which is time-consuming and subjective, as clinicians need to manually measure anatomical tumor sizes. AI has had many successes in image object detection, but often suffers because the models used are not interpretable, leading to issues in trust and implementation in the clinical setting. We propose a framework for an AI-augmented system in which an interactive AI system assists clinicians in the metastasis assessment. We include model interpretability to give explanations of the reasoning of the underlying models.
Submitted 17 July, 2019;
originally announced July 2019.
-
Composing Entropic Policies using Divergence Correction
Authors:
Jonathan J Hunt,
Andre Barreto,
Timothy P Lillicrap,
Nicolas Heess
Abstract:
Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning. Here, we analyze two recent works composing behaviors represented in the form of action-value functions and show that they perform poorly in some situations. As part of this analysis, we extend an important generalization of policy improvement to the maximum entropy framework and introduce an algorithm for the practical implementation of successor features in continuous action spaces. Then we propose a novel approach which addresses the failure cases of prior work and, in principle, recovers the optimal policy during transfer. This method works by explicitly learning the (discounted, future) divergence between base policies. We study this approach in the tabular case and on non-trivial continuous control problems with compositional structure and show that it outperforms or matches existing methods across all tasks considered.
Submitted 5 July, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Multi-Task Learning with Incomplete Data for Healthcare
Authors:
Xin J. Hunt,
Saba Emrani,
Ilknur Kaynar Kabul,
Jorge Silva
Abstract:
Multi-task learning is a type of transfer learning that trains multiple tasks simultaneously and leverages the shared information between related tasks to improve generalization performance. However, missing features in the input matrix are a much more difficult problem that needs to be carefully addressed. Removing records with missing values can significantly reduce the sample size, which is impractical for datasets with a large percentage of missing values. Popular imputation methods often distort the covariance structure of the data, which causes inaccurate inference. In this paper, we propose using plug-in covariance matrix estimators to tackle the challenge of missing features. Specifically, we analyze the plug-in estimators under the framework of robust multi-task learning with LASSO and graph regularization, which captures the relatedness between tasks via graph regularization. We use the Alzheimer's disease progression dataset as an example to show how the proposed framework is effective for prediction and model estimation when missing data is present.
Submitted 6 July, 2018;
originally announced July 2018.
-
Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes
Authors:
Jack W Rae,
Jonathan J Hunt,
Tim Harley,
Ivo Danihelka,
Andrew Senior,
Greg Wayne,
Alex Graves,
Timothy P Lillicrap
Abstract:
Neural networks augmented with external memory have the ability to learn algorithmic solutions to complex tasks. These models appear promising for applications such as language modeling and machine translation. However, they scale poorly in both space and time as the amount of memory grows, limiting their applicability to real-world domains. Here, we present an end-to-end differentiable memory access scheme, which we call Sparse Access Memory (SAM), that retains the representational power of the original approaches whilst training efficiently with very large memories. We show that SAM achieves asymptotic lower bounds in space and time complexity, and find that an implementation runs $1,\!000\times$ faster and with $3,\!000\times$ less physical memory than non-sparse models. SAM learns with comparable data efficiency to existing models on a range of synthetic tasks and one-shot Omniglot character recognition, and can scale to tasks requiring $100,\!000$s of time steps and memories. We also show how our approach can be adapted for models that maintain temporal associations between memories, as with the recently introduced Differentiable Neural Computer.
Submitted 27 October, 2016;
originally announced October 2016.
-
Successor Features for Transfer in Reinforcement Learning
Authors:
André Barreto,
Will Dabney,
Rémi Munos,
Jonathan J. Hunt,
Tom Schaul,
Hado van Hasselt,
David Silver
Abstract:
Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement", a generalization of dynamic programming's policy improvement operation that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows the free exchange of information across tasks. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach on firm theoretical ground and present experiments showing that it successfully promotes transfer in practice, significantly outperforming alternative methods in a sequence of navigation tasks and in the control of a simulated robotic arm.
Submitted 12 April, 2018; v1 submitted 16 June, 2016;
originally announced June 2016.
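
The two ideas named in the abstract above can be illustrated with a small sketch. It assumes, as is standard for successor features, that rewards are linear in some features with weights `w`, so each stored policy's action values at a state are the dot product of its successor features with `w`; generalized policy improvement then acts greedily with respect to the best value over all stored policies. The function name `gpi_action`, the array layout, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gpi_action(psi, w):
    """Pick an action by generalized policy improvement (illustrative sketch).

    psi: array of shape (n_policies, n_actions, d) holding the successor
         features of each stored policy at the current state (hypothetical layout).
    w:   array of shape (d,) with the reward weights of the new task.
    """
    q = psi @ w                       # (n_policies, n_actions) action values
    best_over_policies = q.max(axis=0)  # best value per action across policies
    return int(np.argmax(best_over_policies))

# Toy data: 2 stored policies, 3 actions, 2 reward features.
psi = np.array([
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.0, 0.0], [0.5, 2.0], [0.0, 0.0]],
])
action_task_a = gpi_action(psi, np.array([1.0, 0.0]))
action_task_b = gpi_action(psi, np.array([0.0, 1.0]))
```

Only `w` changes between the two calls, which mirrors the setting in the abstract where the reward function changes between tasks while the dynamics, and hence the stored successor features, stay the same.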
-
Deep Reinforcement Learning in Large Discrete Action Spaces
Authors:
Gabriel Dulac-Arnold,
Richard Evans,
Hado van Hasselt,
Peter Sunehag,
Timothy Lillicrap,
Jonathan Hunt,
Timothy Mann,
Theophane Weber,
Thomas Degris,
Ben Coppin
Abstract:
Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems. Recommender systems, industrial plants and language models are only some of the many real-world tasks involving large numbers of discrete actions for which current methods are difficult or even often impossible to apply. An ability to generalize over the set of actions as well as sub-linear complexity relative to the size of the set are both necessary to handle such tasks. Current approaches are not able to provide both of these, which motivates the work in this paper. Our proposed approach leverages prior information about the actions to embed them in a continuous space upon which it can generalize. Additionally, approximate nearest-neighbor methods allow for logarithmic-time lookup complexity relative to the number of actions, which is necessary for time-wise tractable training. This combined approach allows reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods. We demonstrate our algorithm's abilities on a series of tasks having up to one million actions.
Submitted 4 April, 2016; v1 submitted 23 December, 2015;
originally announced December 2015.
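
A rough sketch of the lookup idea in the abstract above: actions live in a continuous embedding space, a continuous "proto-action" is mapped to its nearest discrete actions, and a value estimate picks among those candidates. This sketch uses an exact brute-force nearest-neighbor search, which is linear in the number of actions, as a stand-in for the approximate nearest-neighbor methods the abstract mentions. The names `select_action`, `embeddings`, and `q_fn`, and the toy values, are hypothetical.

```python
import numpy as np

def select_action(proto, embeddings, q_fn, k=3):
    """Return the index of a discrete action near `proto` with the highest value.

    proto:      array of shape (d,), a point in the action-embedding space.
    embeddings: array of shape (n_actions, d), one embedding per discrete action.
    q_fn:       callable mapping an action index to a scalar value estimate.
    k:          number of nearest candidates to score.
    """
    # Exact k-NN by brute force; an approximate index would replace this step.
    dists = np.linalg.norm(embeddings - proto, axis=1)
    candidates = np.argsort(dists)[:k]
    scores = np.array([q_fn(int(i)) for i in candidates])
    return int(candidates[np.argmax(scores)])

# Toy data: 4 actions embedded in 2 dimensions, with made-up value estimates.
embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
values = [0.2, 0.1, 0.9, 1.0]
proto = np.array([0.9, 0.1])

nearest_only = select_action(proto, embeddings, lambda i: values[i], k=1)
refined = select_action(proto, embeddings, lambda i: values[i], k=2)
```

With `k=1` the result is simply the closest action; a larger `k` lets the value estimate override pure proximity, at the cost of scoring more candidates.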
-
Memory-based control with recurrent neural networks
Authors:
Nicolas Heess,
Jonathan J Hunt,
Timothy P Lillicrap,
David Silver
Abstract:
Partially observed control problems are a challenging aspect of reinforcement learning. We extend two related, model-free algorithms for continuous control -- deterministic policy gradient and stochastic value gradient -- to solve partially observed domains using recurrent neural networks trained with backpropagation through time.
We demonstrate that this approach, coupled with long short-term memory, is able to solve a variety of physical control problems exhibiting an assortment of memory requirements. These include the short-term integration of information from noisy sensors and the identification of system parameters, as well as long-term memory problems that require preserving information over many time steps. We also demonstrate success on a combined exploration and memory problem in the form of a simplified version of the well-known Morris water maze task. Finally, we show that our approach can deal with high-dimensional observations by learning directly from pixels.
We find that recurrent deterministic and stochastic policies are able to learn similarly good solutions to these tasks, including the water maze where the agent must learn effective search strategies.
Submitted 14 December, 2015;
originally announced December 2015.
-
Continuous control with deep reinforcement learning
Authors:
Timothy P. Lillicrap,
Jonathan J. Hunt,
Alexander Pritzel,
Nicolas Heess,
Tom Erez,
Yuval Tassa,
David Silver,
Daan Wierstra
Abstract:
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Submitted 5 July, 2019; v1 submitted 9 September, 2015;
originally announced September 2015.
-
An approach for the automated risk assessment of structural differences between spreadsheets (DiffXL)
Authors:
John Hunt
Abstract:
This paper outlines an approach to manage and quantify the risks associated with changes made to spreadsheets. The methodology focuses on structural differences between spreadsheets and suggests a technique by which a risk analysis can be achieved in an automated environment. The paper offers an example that demonstrates how contiguous ranges of data can be mapped into a generic list of formulae, data and metadata. The example then shows that comparing these generic lists can establish the structural differences between spreadsheets and quantify the level of risk that each change has introduced. Lastly, the benefits, drawbacks and limitations of the technique are discussed in a commercial context.
Submitted 20 August, 2009;
originally announced August 2009.