Search | arXiv e-print repository

Pearl: A Production-ready Reinforcement Learning Agent

Authors: Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Liyuan Wang, Ruiyang Xu, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu

Abstract: Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety const… ▽ More Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety constraints are met. Despite considerable progress made by the RL research community in addressing these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a Production-ready RL agent software package explicitly designed to embrace these challenges in a modular fashion. In addition to presenting preliminary benchmark results, this paper highlights Pearl's industry adoptions to demonstrate its readiness for production usage. Pearl is open sourced on Github at github.com/facebookresearch/pearl and its official website is located at pearlagent.github.io. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2306.00867 [pdf, other]

IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

Authors: Rohan Chitnis, Yingchen Xu, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau

Abstract: Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model… ▽ More Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model of the environment can alleviate this issue. In this paper, we make two key contributions: 1) we introduce an offline model-based RL algorithm, IQL-TD-MPC, that extends the state-of-the-art Temporal Difference Learning for Model Predictive Control (TD-MPC) with Implicit Q-Learning (IQL); 2) we propose to use IQL-TD-MPC as a Manager in a hierarchical setting with any off-the-shelf offline RL algorithm as a Worker. More specifically, we pre-train a temporally abstract IQL-TD-MPC Manager to predict "intent embeddings", which roughly correspond to subgoals, via planning. We empirically show that augmenting state representations with intent embeddings generated by an IQL-TD-MPC manager significantly improves off-the-shelf offline RL agents' performance on some of the most challenging D4RL benchmark tasks. For instance, the offline RL algorithms AWAC, TD3-BC, DT, and CQL all get zero or near-zero normalized evaluation scores on the medium and large antmaze tasks, while our modification gives an average score over 40. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Journal ref: Short version published at ICRA 2024 (https://tinyurl.com/icra24-iqltdmpc)

arXiv:2110.03165 [pdf, other]

Offline RL With Resource Constrained Online Deployment

Authors: Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng, Young Hun Jung, Abhishek Gupta, Urun Dogan

Abstract: Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (ava… ▽ More Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (available for training) can contain fully processed features (using powerful language models, image models, complex sensors, etc.) which are not available when actions are actually taken online. This disconnect leads to an interesting and unexplored problem in offline RL: Is it possible to use a richly processed offline dataset to train a policy which has access to fewer features in the online environment? In this work, we introduce and formalize this novel resource-constrained problem setting. We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features. We address this performance gap with a policy transfer algorithm which first trains a teacher agent using the offline dataset where features are fully available, and then transfers this knowledge to a student agent that only uses the resource-constrained features. To better capture the challenge of this setting, we propose a data collection procedure: Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent improvement over the baseline (TD3+BC without transfer). The code for the experiments is available at https://github.com/JayanthRR/RC-OfflineRL. △ Less

Submitted 7 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Added experiments on discrete control and real world datasets along with more analyses on continuous control tasks

arXiv:2105.01289 [pdf, other]

Representation Learning for Clustering via Building Consensus

Authors: Aniket Anand Deshmukh, Jayanth Reddy Regatti, Eren Manavoglu, Urun Dogan

Abstract: In this paper, we focus on unsupervised representation learning for clustering of images. Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must be close in the representation space (exemplar consistency), and/or similar images must have similar cluster assignment… ▽ More In this paper, we focus on unsupervised representation learning for clustering of images. Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must be close in the representation space (exemplar consistency), and/or similar images must have similar cluster assignments (population consistency). We define an additional notion of consistency, consensus consistency, which ensures that representations are learned to induce similar partitions for variations in the representation space, different clustering algorithms or different initializations of a single clustering algorithm. We define a clustering loss by executing variations in the representation space and seamlessly integrate all three consistencies (consensus, exemplar and population) into an end-to-end learning framework. The proposed algorithm, consensus clustering using unsupervised representation learning (ConCURL), improves upon the clustering performance of state-of-the-art methods on four out of five image datasets. Furthermore, we extend the evaluation procedure for clustering to reflect the challenges encountered in real-world clustering tasks, such as maintaining clustering performance in cases with distribution shifts. We also perform a detailed ablation study for a deeper understanding of the proposed algorithm. The code and the trained models are available at https://github.com/JayanthRR/ConCURL_NCE. △ Less

Submitted 25 April, 2022; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: Paper is accepted at Springer Machine Learning Journal 2022. The code and the trained models are available at https://github.com/JayanthRR/ConCURL_NCE

arXiv:2010.01245 [pdf, other]

Consensus Clustering With Unsupervised Representation Learning

Authors: Jayanth Reddy Regatti, Aniket Anand Deshmukh, Eren Manavoglu, Urun Dogan

Abstract: Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must either be closer in the representation space, or have a similar cluster assignment. Bootstrap Your Own Latent (BYOL) is one such representation learning algorithm that has achieved state-of-the-art results in… ▽ More Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must either be closer in the representation space, or have a similar cluster assignment. Bootstrap Your Own Latent (BYOL) is one such representation learning algorithm that has achieved state-of-the-art results in self-supervised image classification on ImageNet under the linear evaluation protocol. However, the utility of the learnt features of BYOL to perform clustering is not explored. In this work, we study the clustering ability of BYOL and observe that features learnt using BYOL may not be optimal for clustering. We propose a novel consensus clustering based loss function, and train BYOL with the proposed loss in an end-to-end way that improves the clustering ability and outperforms similar clustering based methods on some popular computer vision datasets. △ Less

Submitted 8 July, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

Comments: Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)

arXiv:2008.07443 [pdf, other]

Zero Shot Domain Generalization

Authors: Udit Maniyar, Joseph K J, Aniket Anand Deshmukh, Urun Dogan, Vineeth N Balasubramanian

Abstract: Standard supervised learning setting assumes that training data and test data come from the same distribution (domain). Domain generalization (DG) methods try to learn a model that when trained on data from multiple domains, would generalize to a new unseen domain. We extend DG to an even more challenging setting, where the label space of the unseen domain could also change. We introduce this prob… ▽ More Standard supervised learning setting assumes that training data and test data come from the same distribution (domain). Domain generalization (DG) methods try to learn a model that when trained on data from multiple domains, would generalize to a new unseen domain. We extend DG to an even more challenging setting, where the label space of the unseen domain could also change. We introduce this problem as Zero-Shot Domain Generalization (to the best of our knowledge, the first such effort), where the model generalizes across new domains and also across new classes in those domains. We propose a simple strategy which effectively exploits semantic information of classes, to adapt existing DG methods to meet the demands of Zero-Shot Domain Generalization. We evaluate the proposed methods on CIFAR-10, CIFAR-100, F-MNIST and PACS datasets, establishing a strong baseline to foster interest in this new research direction. △ Less

Submitted 17 August, 2020; originally announced August 2020.

Comments: Accepted to BMVC 2020

arXiv:2003.08485 [pdf, other]

Self-Supervised Contextual Bandits in Computer Vision

Authors: Aniket Anand Deshmukh, Abhimanu Kumar, Levi Boyles, Denis Charles, Eren Manavoglu, Urun Dogan

Abstract: Contextual bandits are a common problem faced by machine learning practitioners in domains as diverse as hypothesis testing to product recommendations. There have been a lot of approaches in exploiting rich data representations for contextual bandit problems with varying degree of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. I… ▽ More Contextual bandits are a common problem faced by machine learning practitioners in domains as diverse as hypothesis testing to product recommendations. There have been a lot of approaches in exploiting rich data representations for contextual bandit problems with varying degree of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g. clustering, classification, embedding generation etc.) and the secondary task is defined by the self-supervision objective (e.g. rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don't have the advantage of getting implicit labels due to lack of data in the initial phase of learning. We provide a novel approach to tackle this issue by combining a contextual bandit objective with a self supervision objective. By augmenting contextual bandit learning with self-supervision we get a better cumulative reward. Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We provide cases where the proposed scheme doesn't perform optimally and give alternative methods for better learning in these cases. △ Less

Submitted 18 March, 2020; originally announced March 2020.

arXiv:2002.07384 [pdf, other]

Data Transformation Insights in Self-supervision with Clustering Tasks

Authors: Abhimanu Kumar, Aniket Anand Deshmukh, Urun Dogan, Denis Charles, Eren Manavoglu

Abstract: Self-supervision is key to extending use of deep learning for label scarce domains. For most of self-supervised approaches data transformations play an important role. However, up until now the impact of transformations have not been studied. Furthermore, different transformations may have different impact on the system. We provide novel insights into the use of data transformation in self-supervi… ▽ More Self-supervision is key to extending use of deep learning for label scarce domains. For most of self-supervised approaches data transformations play an important role. However, up until now the impact of transformations have not been studied. Furthermore, different transformations may have different impact on the system. We provide novel insights into the use of data transformation in self-supervised tasks, specially pertaining to clustering. We show theoretically and empirically that certain set of transformations are helpful in convergence of self-supervised clustering. We also show the cases when the transformations are not helpful or in some cases even harmful. We show faster convergence rate with valid transformations for convex as well as certain family of non-convex objectives along with the proof of convergence to the original set of optima. We have synthetic as well as real world data experiments. Empirically our results conform with the theoretical insights provided. △ Less

Submitted 18 February, 2020; originally announced February 2020.

arXiv:1911.06902 [pdf, other]

Label-similarity Curriculum Learning

Authors: Urun Dogan, Aniket Anand Deshmukh, Marcin Machura, Christian Igel

Abstract: Curriculum learning can improve neural network training by guiding the optimization to desirable optima. We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation. The idea is to use a probability distribution over classes as target label, where the class probabilities reflect the similarity to the true class. Gradua… ▽ More Curriculum learning can improve neural network training by guiding the optimization to desirable optima. We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation. The idea is to use a probability distribution over classes as target label, where the class probabilities reflect the similarity to the true class. Gradually, this label representation is shifted towards the standard one-hot-encoding. That is, in the beginning minor mistakes are corrected less than large mistakes, resembling a teaching process in which broad concepts are explained first before subtle differences are taught. The class similarity can be based on prior knowledge. For the special case of the labels being natural words, we propose a generic way to automatically compute the similarities. The natural words are embedded into Euclidean space using a standard word embedding. The probability of each class is then a function of the cosine similarity between the vector representations of the class and the true label. The proposed label-similarity curriculum learning (LCL) approach was empirically evaluated using several popular deep learning architectures for image classification tasks applied to five datasets including ImageNet, CIFAR100, and AWA2. In all scenarios, LCL was able to improve the classification accuracy on the test data compared to standard training. △ Less

Submitted 22 July, 2020; v1 submitted 15 November, 2019; originally announced November 2019.

Comments: Accepted as a conference paper at ECCV 2020

arXiv:1905.10392 [pdf, other]

A Generalization Error Bound for Multi-class Domain Generalization

Authors: Aniket Anand Deshmukh, Yunwen Lei, Srinagesh Sharma, Urun Dogan, James W. Cutler, Clayton Scott

Abstract: Domain generalization is the problem of assigning labels to an unlabeled data set, given several similar data sets for which labels have been provided. Despite considerable interest in this problem over the last decade, there has been no theoretical analysis in the setting of multi-class classification. In this work, we study a kernel-based learning algorithm and establish a generalization error b… ▽ More Domain generalization is the problem of assigning labels to an unlabeled data set, given several similar data sets for which labels have been provided. Despite considerable interest in this problem over the last decade, there has been no theoretical analysis in the setting of multi-class classification. In this work, we study a kernel-based learning algorithm and establish a generalization error bound that scales logarithmically in the number of classes, matching state-of-the-art bounds for multi-class classification in the conventional learning setting. We also demonstrate empirically that the proposed algorithm achieves significant performance gains compared to a pooling strategy. △ Less

Submitted 24 May, 2019; originally announced May 2019.

arXiv:1706.09814 [pdf, other]

Data-dependent Generalization Bounds for Multi-class Classification

Authors: Yunwen Lei, Urun Dogan, Ding-Xuan Zhou, Marius Kloft

Abstract: In this paper, we study data-dependent generalization error bounds exhibiting a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as regularizer. Key to our analysis are new structural results for multi-class Gaussian c… ▽ More In this paper, we study data-dependent generalization error bounds exhibiting a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical $\ell_\infty$-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the $\ell_2$- and $\ell_\infty$-norm, respectively. We establish data-dependent error bounds in terms of complexities of a linear function class defined on a finite set induced by training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines, exhibiting a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class SVM by Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency which significantly improves the previous square-root dependency. Experimental results are reported to verify the effectiveness of our theoretical findings. △ Less

Submitted 29 December, 2017; v1 submitted 29 June, 2017; originally announced June 2017.

arXiv:1705.08618 [pdf, other]

Multi-Task Learning for Contextual Bandits

Authors: Aniket Anand Deshmukh, Urun Dogan, Clayton Scott

Abstract: Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setti… ▽ More Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setting, the goal is to leverage similarities in contexts for different arms so as to improve the agent's ability to predict rewards from contexts. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. We also describe an effective scheme for estimating task similarity from data, and demonstrate our algorithm's performance on several data sets. △ Less

Submitted 24 May, 2017; originally announced May 2017.

arXiv:1611.08480 [pdf, ps, other]

doi 10.1371/journal.pone.0178161

Distributed Optimization of Multi-Class SVMs

Authors: Maximilian Alber, Julian Zimmert, Urun Dogan, Marius Kloft

Abstract: Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straight forward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program of size quadratically in the number of classes. We develop dis… ▽ More Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straight forward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program of size quadratically in the number of classes. We develop distributed algorithms for two all-in-one SVM formulations (Lee et al. and Weston and Watkins) that parallelize the computation evenly over the number of classes. This allows us to compare these models to one-vs.-rest SVMs on unprecedented scale. The results indicate superior accuracy on text classification data. △ Less

Submitted 8 December, 2016; v1 submitted 25 November, 2016; originally announced November 2016.

arXiv:1506.04364 [pdf, other]

Localized Multiple Kernel Learning---A Convex Approach

Authors: Yunwen Lei, Alexander Binder, Ürün Dogan, Marius Kloft

Abstract: We propose a localized approach to multiple kernel learning that can be formulated as a convex optimization problem over a given cluster structure. For which we obtain generalization error guarantees and derive an optimization algorithm based on the Fenchel dual representation. Experiments on real-world datasets from the application domains of computational biology and computer vision show that co… ▽ More We propose a localized approach to multiple kernel learning that can be formulated as a convex optimization problem over a given cluster structure. For which we obtain generalization error guarantees and derive an optimization algorithm based on the Fenchel dual representation. Experiments on real-world datasets from the application domains of computational biology and computer vision show that convex localized multiple kernel learning can achieve higher prediction accuracies than its global and non-convex local counterparts. △ Less

Submitted 12 October, 2016; v1 submitted 14 June, 2015; originally announced June 2015.

Comments: to appear in ACML 2016

arXiv:1506.04359 [pdf, ps, other]

Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms

Authors: Yunwen Lei, Ürün Dogan, Alexander Binder, Marius Kloft

Abstract: This paper studies the generalization performance of multi-class classification algorithms, for which we obtain, for the first time, a data-dependent generalization error bound with a logarithmic dependence on the class size, substantially improving the state-of-the-art linear dependence in the existing data-dependent generalization analysis. The theoretical analysis motivates us to introduce a ne… ▽ More This paper studies the generalization performance of multi-class classification algorithms, for which we obtain, for the first time, a data-dependent generalization error bound with a logarithmic dependence on the class size, substantially improving the state-of-the-art linear dependence in the existing data-dependent generalization analysis. The theoretical analysis motivates us to introduce a new multi-class classification machine based on $\ell_p$-norm regularization, where the parameter $p$ controls the complexity of the corresponding bounds. We derive an efficient optimization algorithm based on Fenchel duality theory. Benchmarks on several real-world datasets show that the proposed algorithm can achieve significant accuracy gains over the state of the art. △ Less

Submitted 14 June, 2015; originally announced June 2015.

arXiv:1401.3737 [pdf, ps, other]

Coordinate Descent with Online Adaptation of Coordinate Frequencies

Authors: Tobias Glasmachers, Ürün Dogan

Abstract: Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression. We consider general CD with non-uniform selection of coordinates. Instead of fixing selection frequencies… ▽ More Coordinate descent (CD) algorithms have become the method of choice for solving a number of optimization problems in machine learning. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression. We consider general CD with non-uniform selection of coordinates. Instead of fixing selection frequencies beforehand we propose an online adaptation mechanism for this important parameter, called the adaptive coordinate frequencies (ACF) method. This mechanism removes the need to estimate optimal coordinate frequencies beforehand, and it automatically reacts to changing requirements during an optimization run. We demonstrate the usefulness of our ACF-CD approach for a variety of optimization problems arising in machine learning contexts. Our algorithm offers significant speed-ups over state-of-the-art training methods. △ Less

Submitted 15 January, 2014; originally announced January 2014.

arXiv:1302.5608 [pdf, ps, other]

Accelerated Linear SVM Training with Adaptive Variable Selection Frequencies

Authors: Tobias Glasmachers, Ürün Dogan

Abstract: Support vector machine (SVM) training is an active research area since the dawn of the method. In recent years there has been increasing interest in specialized solvers for the important case of linear models. The algorithm presented by Hsieh et al., probably best known under the name of the "liblinear" implementation, marks a major breakthrough. The method is analog to established dual decomposit… ▽ More Support vector machine (SVM) training is an active research area since the dawn of the method. In recent years there has been increasing interest in specialized solvers for the important case of linear models. The algorithm presented by Hsieh et al., probably best known under the name of the "liblinear" implementation, marks a major breakthrough. The method is analog to established dual decomposition algorithms for training of non-linear SVMs, but with greatly reduced computational complexity per update step. This comes at the cost of not keeping track of the gradient of the objective any more, which excludes the application of highly developed working set selection algorithms. We present an algorithmic improvement to this method. We replace uniform working set selection with an online adaptation of selection frequencies. The adaptation criterion is inspired by modern second order working set selection methods. The same mechanism replaces the shrinking heuristic. This novel technique speeds up training in some cases by more than an order of magnitude. △ Less

Submitted 22 February, 2013; originally announced February 2013.

Showing 1–17 of 17 results for author: Dogan, U