-
Statistical Context Detection for Deep Lifelong Reinforcement Learning
Authors:
Jeffery Dick,
Saptarshi Nath,
Christos Peridis,
Eseoghene Benjamin,
Soheil Kolouri,
Andrea Soltoggio
Abstract:
Context detection involves labeling segments of an online stream of data as belonging to different tasks. Task labels are used in lifelong learning algorithms to perform consolidation or other procedures that prevent catastrophic forgetting. Inferring task labels from online experiences remains a challenging problem. Most approaches assume finite and low-dimensional observation spaces or a preliminary training phase during which task labels are learned. Moreover, changes in the transition or reward functions can be detected only in combination with a policy, and therefore are more difficult to detect than changes in the input distribution. This paper presents an approach to learning both policies and labels in an online deep reinforcement learning setting. The key idea is to use distance metrics, obtained via optimal transport methods, i.e., the Wasserstein distance, on suitable latent action-reward spaces to measure distances between sets of data points from past and current streams. Such distances can then be used for statistical tests based on an adapted Kolmogorov-Smirnov calculation to assign labels to sequences of experiences. A rollback procedure is introduced to learn multiple policies by ensuring that only the appropriate data is used to train the corresponding policy. The combination of task detection and policy deployment allows for the optimization of lifelong reinforcement learning agents without an oracle that provides task labels. The approach is tested using two benchmarks and the results show promising performance when compared with related context detection algorithms. The results suggest that optimal transport statistical methods provide an explainable and justifiable procedure for online context detection and reward optimization in lifelong reinforcement learning.
Submitted 29 May, 2024;
originally announced May 2024.
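The abstract combines two statistical ingredients: a Wasserstein distance between sets of data points and a Kolmogorov-Smirnov style test. The sketch below shows both for one-dimensional samples in plain Python, together with a hypothetical labeling rule (`assign_label`, with an assumed distance `threshold`) that reuses the closest stored task or opens a new one. It is not the paper's method: the paper works on latent action-reward spaces, uses an adapted KS calculation, and adds a rollback procedure, none of which is reproduced here.

```python
import bisect


def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical 1-D samples.

    For equal-size samples the optimal transport plan matches sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sa + sb:
        fa = bisect.bisect_right(sa, v) / len(sa)  # fraction of a that is <= v
        fb = bisect.bisect_right(sb, v) / len(sb)  # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d


def assign_label(window, task_buffers, threshold):
    """Hypothetical rule: reuse the closest stored task if its W1 distance is
    below `threshold`; otherwise store the window under a new task label."""
    best, best_d = None, float("inf")
    for label, buf in task_buffers.items():
        d = wasserstein_1d(window, buf)
        if d < best_d:
            best, best_d = label, d
    if best is not None and best_d < threshold:
        return best
    new_label = len(task_buffers)
    task_buffers[new_label] = list(window)
    return new_label
```

Because the 1-D case has a closed-form transport plan, no solver is needed here; distances on multi-dimensional latent spaces, as in the paper, would require a proper optimal transport solver.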
-
Integrate Any Omics: Towards genome-wide data integration for patient stratification
Authors:
Shihao Ma,
Andy G. X. Zeng,
Benjamin Haibe-Kains,
Anna Goldenberg,
John E Dick,
Bo Wang
Abstract:
Advances in high-throughput omics profiling have greatly enhanced cancer patient stratification. However, incomplete data in multi-omics integration presents a significant challenge, as traditional methods like sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. To address these issues, we introduce IntegrAO (Integrate Any Omics), an unsupervised framework for integrating incomplete multi-omics data and classifying new samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources and then uses graph neural networks to produce unified patient embeddings. Our systematic evaluation across five cancer cohorts involving six omics modalities demonstrates IntegrAO's robustness to missing data and its accuracy in classifying new samples with partial profiles. An acute myeloid leukemia case study further validates its capability to uncover biological and clinical heterogeneity in incomplete datasets. IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology, offering a holistic approach to patient characterization.
Submitted 15 January, 2024;
originally announced January 2024.
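IntegrAO's first step combines patient graphs that cover only partly overlapping sets of patients. As a rough illustration of that idea only, the sketch below builds a Gaussian-kernel affinity graph per omics modality and averages each patient pair's affinity over the modalities in which both patients were measured. The averaging scheme, the kernel, and the `sigma` bandwidth are assumptions made for illustration; IntegrAO's actual fusion procedure and its graph neural network stage are not shown.

```python
import math
from itertools import combinations


def modality_affinity(features, sigma=1.0):
    """Gaussian-kernel affinities between the patients profiled in one modality.

    features: dict mapping patient id -> feature vector (list of floats).
    """
    aff = {}
    for p, q in combinations(sorted(features), 2):
        d2 = sum((x - y) ** 2 for x, y in zip(features[p], features[q]))
        aff[(p, q)] = math.exp(-d2 / (2 * sigma ** 2))
    return aff


def fuse_partial_graphs(modalities, sigma=1.0):
    """Average each pair's affinity over the modalities in which both patients
    were measured; pairs that never co-occur in a modality get no edge."""
    sums, counts = {}, {}
    for features in modalities:
        for pair, a in modality_affinity(features, sigma).items():
            sums[pair] = sums.get(pair, 0.0) + a
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}
```

In this toy scheme a patient missing one modality still gets edges through the modalities it does have, which is the property the abstract emphasizes, without excluding or imputing samples.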
-
The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning
Authors:
Andrea Soltoggio,
Eseoghene Ben-Iwhiwhu,
Christos Peridis,
Pawel Ladosz,
Jeffery Dick,
Praveen K. Pilly,
Soheil Kolouri
Abstract:
This paper introduces a set of formally defined and transparent problems for reinforcement learning algorithms with the following characteristics: (1) variable degrees of observability (non-Markov observations), (2) distal and sparse rewards, (3) variable and hierarchical reward structure, (4) multiple-task generation, (5) variable problem complexity. The environment provides 1D or 2D categorical observations and takes actions as input. The core structure of the CT-graph is a multi-branch tree graph with arbitrary branching factor, depth, and observation sets that can be varied to increase the dimensions of the problem in a controllable and measurable way. Two main categories of states, decision states and wait states, are devised to create a hierarchy of importance among observations, typical of real-world problems. A large observation set can produce a vast set of histories that impairs memory-augmented agents. Variable reward functions make it easy to create multiple tasks and to test how efficiently an agent adapts in dynamic scenarios where tasks with controllable degrees of similarity are presented. Challenging complexity levels can be easily achieved because the graph grows exponentially with its depth. The problem formulation and accompanying code provide a fast, transparent, and mathematically defined set of configurable tests to compare the performance of reinforcement learning algorithms, in particular in lifelong learning settings.
Submitted 21 January, 2023;
originally announced February 2023.
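The CT-graph's size is controlled by its branching factor and depth, and its states alternate between wait and decision states with a distal reward at a leaf. The hypothetical `ToyTreeGraph` class below is a deliberately simplified sketch of that structure, not the accompanying CT-graph code: observations are omitted, and the action encoding, the goal sampling, and the failure rule in wait states are all assumptions.

```python
import random


class ToyTreeGraph:
    """Hypothetical, simplified tree-graph task in the spirit of the CT-graph
    (not the official code). Each tree level is a wait state followed by a
    decision state; only the goal leaf at the bottom gives a reward."""

    WAIT = 0  # action required in wait states; actions 1..branching choose a branch

    def __init__(self, branching, depth, seed=0):
        self.branching = branching
        self.depth = depth
        rng = random.Random(seed)
        # The goal leaf (one branch index per level) defines the task;
        # a different seed gives a different reward function, i.e. a new task.
        self.goal = [rng.randrange(branching) for _ in range(depth)]

    def num_leaves(self):
        # Candidate goal leaves grow exponentially: branching ** depth.
        return self.branching ** self.depth

    def episode_reward(self, actions):
        """Play one episode from 2 * depth actions: wait, branch, wait, branch, ..."""
        if len(actions) != 2 * self.depth:
            raise ValueError("expected one wait action and one branch action per level")
        path = []
        for level in range(self.depth):
            wait_action = actions[2 * level]
            branch_action = actions[2 * level + 1]
            if wait_action != self.WAIT:
                return 0.0  # in this sketch, a wrong wait-state action ends the episode
            path.append(branch_action - 1)
        return 1.0 if path == self.goal else 0.0  # distal, sparse reward
```

Changing `seed` draws a different goal leaf, which mimics how variable reward functions generate multiple related tasks on the same graph.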
-
Context Meta-Reinforcement Learning via Neuromodulation
Authors:
Eseoghene Ben-Iwhiwhu,
Jeffery Dick,
Nicholas A. Ketz,
Praveen K. Pilly,
Andrea Soltoggio
Abstract:
Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging due to the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component that augments a standard policy network by regulating neuronal activities, in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To prove the generality and benefits of the extension in meta-RL, the neuromodulated network was applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results demonstrate that meta-RL augmented with neuromodulation produces significantly better results and richer dynamic representations in comparison to the baselines.
Submitted 26 April, 2022; v1 submitted 29 October, 2021;
originally announced November 2021.
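The abstract describes neuromodulation as a component that regulates neuronal activity in a standard policy network. One common way to express such regulation is multiplicative gating, sketched below in plain Python: a standard layer computes `tanh(Wx + b)`, a parallel modulatory layer computes `tanh(Mx + c)`, and the two are multiplied element-wise. The function names and the exact gating form are assumptions for illustration; the paper's layer may differ in its activations and wiring.

```python
import math


def dense(x, W, b):
    """Affine map W x + b with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def neuromodulated_layer(x, W, b, M, c):
    """Standard activations tanh(Wx + b) scaled element-wise by a modulatory
    signal tanh(Mx + c) computed from the same input (assumed gating form)."""
    h = [math.tanh(v) for v in dense(x, W, b)]  # standard neuron activity
    m = [math.tanh(v) for v in dense(x, M, c)]  # modulatory signal in (-1, 1)
    return [hi * mi for hi, mi in zip(h, m)]
```

With the modulatory signal near zero a neuron is silenced, and with it near one the neuron passes through unchanged, so the same weights `W` can express different input-dependent representations.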
-
Towards a theory of out-of-distribution learning
Authors:
Jayanta Dey,
Ali Geisa,
Ronak Mehta,
Tyler M. Tomita,
Hayden S. Helm,
Haoyin Xu,
Eric Eaton,
Jeffery Dick,
Carey E. Priebe,
Joshua T. Vogelstein
Abstract:
Learning is a process wherein a learning agent enhances its performance through exposure to experience or data. Throughout this journey, the agent may encounter diverse learning environments. For example, data may be presented to the learner all at once, in multiple batches, or sequentially. Furthermore, the data samples could be either independent and identically distributed (iid) or non-iid. Additionally, there may exist computational and space constraints for the deployment of the learning algorithms. The complexity of a learning task can vary significantly, depending on the learning setup and the constraints imposed upon it. However, the current literature lacks formal definitions for many of the in-distribution and out-of-distribution learning paradigms. Establishing proper and universally agreed-upon definitions for these learning setups is essential for thoroughly exploring the evolution of ideas across different learning scenarios and deriving generalized mathematical bounds for these learners. In this paper, we aim to address this issue by proposing a chronological approach to defining different learning tasks using the probably approximately correct (PAC) learning framework. We will start with in-distribution learning and progress to recently proposed lifelong or continual learning. We employ consistent terminology and notation to demonstrate how each of these learning frameworks represents a specific instance of a broader, more generalized concept of learnability. Our hope is that this work will inspire a universally agreed-upon approach to quantifying different types of learning, fostering greater understanding and progress in the field.
Submitted 7 June, 2024; v1 submitted 29 September, 2021;
originally announced September 2021.
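The abstract builds on the probably approximately correct (PAC) framework. For reference, the standard textbook definition of agnostic PAC learnability, and the classical sample-size bound for a finite hypothesis class in the realizable case, read as follows; the notation is the conventional one and not necessarily the paper's.

```latex
% Agnostic PAC learnability of a hypothesis class \mathcal{H} with loss \ell:
% there exist a sample-size function m_{\mathcal{H}} and an algorithm A such that,
% for every distribution D and every \varepsilon, \delta \in (0, 1),
\Pr_{S \sim D^m}\!\left[ R_D(A(S)) - \inf_{h \in \mathcal{H}} R_D(h) \le \varepsilon \right] \ge 1 - \delta
\quad \text{for all } m \ge m_{\mathcal{H}}(\varepsilon, \delta),
\qquad R_D(h) = \mathbb{E}_{(x, y) \sim D}\big[\ell(h(x), y)\big].

% Example: a finite class in the realizable case is PAC learnable (by ERM) with
m_{\mathcal{H}}(\varepsilon, \delta) \le \left\lceil \frac{\ln |\mathcal{H}| + \ln(1/\delta)}{\varepsilon} \right\rceil .
```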
-
Quality and Complexity Assessment of Learning-Based Image Compression Solutions
Authors:
João Dick,
Brunno Abreu,
Mateus Grellert,
Sergio Bampi
Abstract:
This work presents an analysis of state-of-the-art learning-based image compression techniques. We compare 8 models available in the TensorFlow Compression package in terms of visual quality metrics and processing time, using the KODAK data set. The results are compared with the Better Portable Graphics (BPG) and the JPEG2000 codecs. Results show that JPEG2000 is faster than even the fastest learning-based model, with a speedup of 1.46x in compression and 30x in decompression. However, the learning-based models achieved improvements over JPEG2000 in terms of quality, especially at lower bitrates. Our findings also show that BPG is more efficient in terms of PSNR, but the learning models are better for other quality metrics, and sometimes even faster. The results indicate that learning-based techniques are promising solutions towards a future mainstream compression method.
Submitted 19 July, 2021;
originally announced July 2021.
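Peak signal-to-noise ratio (PSNR) is one of the quality metrics in this comparison. A minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE) with MAX = 255 for 8-bit images, is shown below on flat pixel sequences; real evaluations operate on full images, often per channel or in a chosen color space, which this sketch does not handle.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise
    return 10 * math.log10(max_value ** 2 / mse)
```

Higher values mean a closer reconstruction; because PSNR depends only on the mean squared error, it can rank codecs differently from perceptual metrics, which is consistent with the mixed results the abstract reports.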
-
Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems
Authors:
Eseoghene Ben-Iwhiwhu,
Pawel Ladosz,
Jeffery Dick,
Wen-Hua Chen,
Praveen Pilly,
Andrea Soltoggio
Abstract:
Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. By contrast, evolving online learning mechanisms makes it possible to incorporate learning strategies into an agent that can (i) evolve memory when required and (ii) optimize adaptation speed to specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects such as the detection of cues that reveal implicit rewards, and the ability to evolve location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
Submitted 28 April, 2020; v1 submitted 27 April, 2020;
originally announced April 2020.
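Neuromodulated networks of this kind typically change their weights online with a Hebbian rule gated by a modulatory signal, Δw_ij = η · m · post_i · pre_j. The sketch below implements that generic rule in plain Python, with an assumed clipping bound `w_max`; in an evolutionary setting such as the one described, quantities like η (and the network that produces m) would be shaped by evolution. The exact rule used in the paper is not reproduced here.

```python
def modulated_hebbian_step(w, pre, post, modulation, eta=0.1, w_max=1.0):
    """One neuromodulated Hebbian update: dw_ij = eta * m * post_i * pre_j.

    w: weight matrix as a list of rows (one row per post-synaptic neuron).
    The modulatory signal m decides whether, and with which sign, the current
    pre/post correlation is written into the weights.
    """
    new_w = []
    for i, row in enumerate(w):
        new_row = []
        for j, wij in enumerate(row):
            wij += eta * modulation * post[i] * pre[j]
            new_row.append(max(-w_max, min(w_max, wij)))  # keep weights bounded
        new_w.append(new_row)
    return new_w
```

With m = 0 the network keeps its weights, while a positive or negative m strengthens or weakens co-active connections; this is what lets a signal such as a reward cue decide what gets stored during an episode.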
-
Deep Learning Based Unsupervised and Semi-supervised Classification for Keratoconus
Authors:
Nicole Hallett,
Kai Yi,
Josef Dick,
Christopher Hodge,
Gerard Sutton,
Yu Guang Wang,
Jingjing You
Abstract:
The transparent cornea is the window of the eye, allowing light to enter and helping to focus it within the eye. The cornea is critical, contributing 75% of the refractive power of the eye. Keratoconus (KC) is a progressive and multifactorial corneal degenerative disease affecting 1 in 2000 individuals worldwide. Currently, there is no cure for KC other than corneal transplantation for advanced-stage disease, while corneal cross-linking can only halt KC progression. The ability to accurately identify subtle KC or KC progression is of vital clinical significance. To date, there has been little consensus on a useful model to classify KC patients, which limits the ability to predict disease progression accurately.
In this paper, we utilised machine learning to analyse data from 124 KC patients, including topographical and clinical variables. Both supervised multilayer perceptron and unsupervised variational autoencoder models were used to classify KC patients with reference to the existing Amsler-Krumeich (A-K) classification system. Both methods achieved high accuracy, with the unsupervised method showing better performance. The results showed that the unsupervised method, using a selection of 29 variables, could provide clinicians with a powerful automatic classification tool. These outcomes provide a platform for further analysis of the progression and treatment of keratoconus.
Submitted 30 January, 2020;
originally announced January 2020.
-
Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
Authors:
Pawel Ladosz,
Eseoghene Ben-Iwhiwhu,
Jeffery Dick,
Yang Hu,
Nicholas Ketz,
Soheil Kolouri,
Jeffrey L. Krichmar,
Praveen Pilly,
Andrea Soltoggio
Abstract:
This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult partially observable Markov decision process (POMDP) problems that impair temporal difference (TD)-based RL algorithms such as DQN, as the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces in order to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low-level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved on DQN's results and even outperformed A2C, QRDQN+LSTM and REINFORCE baselines on some POMDPs with confounding stimuli and sparse rewards.
Submitted 14 October, 2021; v1 submitted 21 September, 2019;
originally announced September 2019.
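The key idea in the abstract is that decaying neural traces let a later reward credit an earlier state-action association. A minimal sketch of that mechanism, assuming a single reward-driven output unit and a generic eligibility-trace rule (e ← decay · e + pre · post, Δw = η · r · e), is shown below; it is not the MOHN's exact update, and the `decay` and `eta` values are illustrative.

```python
def trace_hebbian(steps, n, eta=0.5, decay=0.8):
    """Modulated Hebbian learning with decaying eligibility traces.

    steps: list of (active_index or None, reward) pairs, one per time step.
    Each active index marks a state-action pair whose pre*post coactivity
    (assumed to be 1 when it fires) is stored in a decaying trace; a later
    reward converts the traces into weight changes, bridging the delay.
    """
    w = [0.0] * n
    e = [0.0] * n
    for active, reward in steps:
        for j in range(n):
            coactivity = 1.0 if j == active else 0.0
            e[j] = decay * e[j] + coactivity  # trace fades unless refreshed
            w[j] += eta * reward * e[j]       # reward acts as the modulatory signal
    return w
```

Associations that occurred closer to the reward keep a larger trace and therefore receive more credit, which is the behavior the trace is meant to provide when the TD error alone is unreliable.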
-
Multiscale socio-ecological networks in the age of information
Authors:
Maxime Lenormand,
Sandra Luque,
Johannes Langemeyer,
Patrizia Tenerelli,
Grazia Zulian,
Inge Aalders,
Serban Chivulescu,
Pedro Clemente,
Jan Dick,
Jiska van Dijk,
Michiel van Eupen,
Relu C. Giuca,
Leena Kopperoinen,
Eszter Lellei-Kovács,
Michael Leone,
Juraj Lieskovský,
Uta Schirpke,
Alison C. Smith,
Ulrike Tappeiner,
Helen Woods
Abstract:
Interactions between people and ecological systems, through leisure or tourism activities, form a complex socio-ecological spatial network. The analysis of the benefits people derive from their interactions with nature, also referred to as cultural ecosystem services (CES), enables a better understanding of these socio-ecological systems. In the age of information, the increasing availability of large social media databases makes it possible to study complex socio-ecological interactions at an unprecedented spatio-temporal resolution. Within this context, we model and analyze these interactions based on information extracted from geotagged photographs embedded into a multiscale socio-ecological network. We apply this approach to 16 case study sites in Europe using a social media database (Flickr) containing more than 150,000 validated and classified photographs. After evaluating the representativeness of the network, we investigate the impact of visitors' origin on the distribution of socio-ecological interactions at different scales. First, at a global scale, we develop a spatial measure of attractiveness and use it to identify four groups of sites. Then, at a local scale, we explore how the distance traveled by the users to reach a site affects the way they interact with this site in space and time. The approach developed here, integrating social media data into a network-based framework, offers a new way of visualizing and modeling interactions between humans and landscapes. Results provide valuable insights for understanding relationships between social demands for CES and the places of their realization, thus allowing for the development of more efficient conservation and planning strategies.
Submitted 2 November, 2018; v1 submitted 2 May, 2018;
originally announced May 2018.
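The local-scale analysis depends on the distance users travel to reach a site and on a network linking visitors' origins to sites. The sketch below shows two assumed building blocks for such an analysis: a haversine great-circle distance (mean Earth radius 6371 km) and a photo-count weighted origin-to-site edge list. How the paper actually defines origins, scales, and its attractiveness measure is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def origin_site_network(photos):
    """Aggregate geotagged photos into weighted origin -> site edges.

    photos: iterable of (user_origin, site) labels; edge weight = photo count.
    """
    edges = {}
    for origin, site in photos:
        edges[(origin, site)] = edges.get((origin, site), 0) + 1
    return edges
```

Attaching `haversine_km(origin, site)` to each edge would give the travel distance needed to compare how near and distant visitors interact with the same site.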
-
Lattice based integration algorithms: Kronecker sequences and rank-1 lattices
Authors:
Josef Dick,
Friedrich Pillichshammer,
Kosuke Suzuki,
Mario Ullrich,
Takehito Yoshiki
Abstract:
We prove upper bounds on the order of convergence of lattice-based algorithms for numerical integration in function spaces of dominating mixed smoothness on the unit cube with homogeneous boundary condition. More precisely, we study worst-case integration errors for Besov spaces of dominating mixed smoothness $\mathring{\mathbf{B}}^s_{p,\theta}$, which include Sobolev spaces of dominating mixed smoothness $\mathring{\mathbf{H}}^s_{p}$ as special cases. The considered algorithms are quasi-Monte Carlo rules with underlying nodes from $T_N(\mathbb{Z}^d) \cap [0,1)^d$, where $T_N$ is a real invertible generator matrix of size $d \times d$. For such rules the worst-case error can be bounded in terms of the Zaremba index of the lattice $\mathbb{X}_N=T_N(\mathbb{Z}^d)$. We apply this result to Kronecker lattices and to rank-1 lattice point sets, which both lead to optimal error bounds up to $\log N$-factors for arbitrary smoothness $s$. The advantage of Kronecker lattices and classical lattice point sets is that the run-time of algorithms generating these point sets is very short.
Submitted 30 August, 2016;
originally announced August 2016.
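A rank-1 lattice rule is an equal-weight quasi-Monte Carlo rule whose nodes are x_k = {k z / N}, k = 0, ..., N-1, for an integer generating vector z, where {·} is the componentwise fractional part; these are among the point sets the abstract covers. The sketch below applies the classical two-dimensional Fibonacci lattice (N = 89, z = (1, 55)) to a test function that vanishes on the boundary of the unit square, matching the homogeneous boundary condition. The test function and parameters are illustrative choices, not taken from the paper.

```python
import math


def rank1_lattice_rule(f, n, z):
    """QMC rule with rank-1 lattice nodes x_k = frac(k * z / n), k = 0..n-1.

    Returns the equal-weight average of f over the n nodes in [0, 1)^d.
    """
    total = 0.0
    for k in range(n):
        x = [(k * zj % n) / n for zj in z]  # exact fractional part via integer mod
        total += f(x)
    return total / n


def f(x):
    """Smooth test integrand that vanishes on the boundary of [0, 1]^2."""
    return math.sin(math.pi * x[0]) * math.sin(math.pi * x[1])


# Fibonacci lattice: n = 89 (a Fibonacci number), generating vector z = (1, 55)
approx = rank1_lattice_rule(f, 89, (1, 55))
exact = (2 / math.pi) ** 2  # integral of sin(pi x) sin(pi y) over the unit square
```

Generating the nodes takes a single integer multiplication and modulo per coordinate, which illustrates the short run-time the abstract points to.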