-
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Authors:
Alexander Nikulin,
Ilya Zisman,
Alexey Zemtsov,
Viacheslav Sinii,
Vladislav Kurenkov,
Sergey Kolesnikov
Abstract:
Following the success of the in-context learning paradigm in large-scale language and computer vision models, the recently emerging field of in-context reinforcement learning is experiencing rapid growth. However, its development has been held back by the lack of challenging benchmarks, as all the experiments have been carried out in simple environments and on small-scale datasets. We present XLand-100B, a large-scale dataset for in-context reinforcement learning based on the XLand-MiniGrid environment, as a first step to alleviate this problem. It contains complete learning histories for nearly 30,000 different tasks, covering 100B transitions and 2.5B episodes. It took 50,000 GPU hours to collect the dataset, which is beyond the reach of most academic labs. Along with the dataset, we provide the utilities to reproduce or expand it even further. With this substantial effort, we aim to democratize research in the rapidly growing field of in-context reinforcement learning and provide a solid foundation for further scaling. The code is open-source and available under Apache 2.0 licence at https://github.com/dunno-lab/xland-minigrid-datasets.
Submitted 13 June, 2024;
originally announced June 2024.
-
In-Context Reinforcement Learning for Variable Action Spaces
Authors:
Viacheslav Sinii,
Alexander Nikulin,
Vladislav Kurenkov,
Ilya Zisman,
Sergey Kolesnikov
Abstract:
Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. The introduction of a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue by proposing the Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content and order. By experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD exhibits significant capability to generalize to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations. Implementation is available at: https://github.com/corl-team/headless-ad.
Submitted 1 July, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Emergence of In-Context Reinforcement Learning from Noise Distillation
Authors:
Ilya Zisman,
Vladislav Kurenkov,
Alexander Nikulin,
Viacheslav Sinii,
Sergey Kolesnikov
Abstract:
Recently, extensive studies in Reinforcement Learning have been carried out on the ability of transformers to adapt in-context to various environments and tasks. Current in-context RL methods are limited by their strict requirements for data, which needs to be generated by RL agents or labeled with actions from an optimal policy. In order to address this prevalent problem, we propose AD$^\varepsilon$, a new data acquisition approach that enables in-context Reinforcement Learning from noise-induced curriculum. We show that it is viable to construct a synthetic noise injection curriculum which helps to obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for generation using optimal policies, with in-context RL still able to outperform the best suboptimal policy in a learning dataset by a 2x margin.
Submitted 12 June, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
Authors:
Alexander Nikulin,
Vladislav Kurenkov,
Ilya Zisman,
Artem Agarkov,
Viacheslav Sinii,
Sergey Kolesnikov
Abstract:
Inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research. Written in JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. Along with the environments, XLand-MiniGrid provides pre-sampled benchmarks with millions of unique tasks of varying difficulty and easy-to-use baselines that allow users to quickly start training adaptive agents. In addition, we have conducted a preliminary analysis of scaling and generalization, showing that our baselines are capable of reaching millions of steps per second during training and validating that the proposed benchmarks are challenging.
Submitted 10 June, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Unveiling Empirical Pathologies of Laplace Approximation for Uncertainty Estimation
Authors:
Maksim Zhdanov,
Stanislav Dereka,
Sergey Kolesnikov
Abstract:
In this paper, we critically evaluate Bayesian methods for uncertainty estimation in deep learning, focusing on the widely applied Laplace approximation and its variants. Our findings reveal that the conventional method of fitting the Hessian matrix negatively impacts out-of-distribution (OOD) detection efficiency. We propose a different point of view, asserting that focusing solely on optimizing prior precision can yield more accurate uncertainty estimates in OOD detection while preserving adequate calibration metrics. Moreover, we demonstrate that this property is not connected to the training stage of a model but rather to its intrinsic properties. Through extensive experimental evaluation, we establish the superiority of our simplified approach over traditional methods in the out-of-distribution domain.
Submitted 16 December, 2023;
originally announced December 2023.
-
Wild-Tab: A Benchmark For Out-Of-Distribution Generalization In Tabular Regression
Authors:
Sergey Kolesnikov
Abstract:
Out-of-Distribution (OOD) generalization, a cornerstone for building robust machine learning models capable of handling data diverging from the training set's distribution, is an ongoing challenge in deep learning. While significant progress has been observed in computer vision and natural language processing, its exploration in tabular data, ubiquitous in many industrial applications, remains nascent. To bridge this gap, we present Wild-Tab, a large-scale benchmark tailored for OOD generalization in tabular regression tasks. The benchmark incorporates 3 industrial datasets sourced from fields like weather prediction and power consumption estimation, providing a challenging testbed for evaluating OOD performance under real-world conditions. Our extensive experiments, evaluating 10 distinct OOD generalization methods on Wild-Tab, reveal nuanced insights. We observe that many of these methods often struggle to maintain high-performance levels on unseen data, with OOD performance showing a marked drop compared to in-distribution performance. At the same time, Empirical Risk Minimization (ERM), despite its simplicity, delivers robust performance across all evaluations, rivaling the results of state-of-the-art methods. Looking forward, we hope that the release of Wild-Tab will facilitate further research on OOD generalization and aid in the deployment of machine learning models in various real-world contexts where handling distribution shifts is a crucial requirement.
Submitted 4 December, 2023;
originally announced December 2023.
-
Time-Aware Item Weighting for the Next Basket Recommendations
Authors:
Aleksey Romanov,
Oleg Lashinin,
Marina Ananyeva,
Sergey Kolesnikov
Abstract:
In this paper we study the next basket recommendation problem. Recent methods use different approaches to achieve better performance. However, many of them do not use information about the time of prediction and time intervals between baskets. To fill this gap, we propose a novel method, Time-Aware Item-based Weighting (TAIW), which takes timestamps and intervals into account. We provide experiments on three real-world datasets, and TAIW outperforms well-tuned state-of-the-art baselines for next-basket recommendations. In addition, we show the results of an ablation study and a case study of a few items.
Submitted 30 July, 2023;
originally announced July 2023.
-
RecBaselines2023: a new dataset for choosing baselines for recommender models
Authors:
Veronika Ivanova,
Oleg Lashinin,
Marina Ananyeva,
Sergey Kolesnikov
Abstract:
The number of proposed recommender algorithms continues to grow. Authors propose new approaches and compare them with existing models, called baselines. Because so many recommender models exist, it is difficult to decide which algorithms to choose as baselines in a paper. To solve this problem, we have collected and published a dataset containing information about the recommender models used in 903 papers, both as baselines and as proposed approaches. This dataset can be seen as a typical dataset with interactions between papers and previously proposed models. In addition, we provide a descriptive analysis of the dataset and highlight possible challenges to be investigated with the data. Furthermore, we have conducted extensive experiments using a well-established methodology to build a good recommender algorithm on the dataset. Our experiments show that selecting the best baselines for new recommender approaches can be framed as a recommendation problem and successfully solved by existing state-of-the-art collaborative filtering models. Finally, we discuss limitations and future work.
Submitted 25 June, 2023;
originally announced June 2023.
-
Katakomba: Tools and Benchmarks for Data-Driven NetHack
Authors:
Vladislav Kurenkov,
Alexander Nikulin,
Denis Tarasov,
Sergey Kolesnikov
Abstract:
NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: resource-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.
Submitted 26 October, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Diversifying Deep Ensembles: A Saliency Map Approach for Enhanced OOD Detection, Calibration, and Accuracy
Authors:
Stanislav Dereka,
Ivan Karpukhin,
Maksim Zhdanov,
Sergey Kolesnikov
Abstract:
Deep ensembles are capable of achieving state-of-the-art results in classification and out-of-distribution (OOD) detection. However, their effectiveness is limited due to the homogeneity of learned patterns within ensembles. To overcome this issue, our study introduces Saliency Diversified Deep Ensemble (SDDE), a novel approach that promotes diversity among ensemble members by leveraging saliency maps. Through incorporating saliency map diversification, our method outperforms conventional ensemble techniques and improves calibration in multiple classification and OOD detection tasks. In particular, the proposed method achieves state-of-the-art OOD detection quality, calibration, and accuracy on multiple benchmarks, including CIFAR10/100 and large-scale ImageNet datasets.
Submitted 14 June, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Authors:
Denis Tarasov,
Vladislav Kurenkov,
Alexander Nikulin,
Sergey Kolesnikov
Abstract:
Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.
Submitted 24 October, 2023; v1 submitted 16 May, 2023;
originally announced May 2023.
-
Anti-Exploration by Random Network Distillation
Authors:
Alexander Nikulin,
Vladislav Kurenkov,
Denis Tarasov,
Sergey Kolesnikov
Abstract:
Despite the success of Random Network Distillation (RND) in various domains, it was shown to be insufficiently discriminative to serve as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
Submitted 17 May, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
Authors:
Dmitriy Akimov,
Vladislav Kurenkov,
Alexander Nikulin,
Denis Tarasov,
Sergey Kolesnikov
Abstract:
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism - i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model - controller in the latent space - is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets.
Submitted 30 January, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Authors:
Alexander Nikulin,
Vladislav Kurenkov,
Denis Tarasov,
Dmitry Akimov,
Sergey Kolesnikov
Abstract:
Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 3-4x on average.
Submitted 30 January, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
CORL: Research-oriented Deep Offline Reinforcement Learning Library
Authors:
Denis Tarasov,
Alexander Nikulin,
Dmitry Akimov,
Vladislav Kurenkov,
Sergey Kolesnikov
Abstract:
CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking them on commonly employed D4RL datasets, providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.
Submitted 26 October, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Deep Image Retrieval is not Robust to Label Noise
Authors:
Stanislav Dereka,
Ivan Karpukhin,
Sergey Kolesnikov
Abstract:
Large-scale datasets are essential for the success of deep learning in image retrieval. However, manual assessment errors and semi-supervised annotation techniques can lead to label noise even in popular datasets. As previous works primarily studied annotation quality in image classification tasks, it is still unclear how label noise affects deep learning approaches to image retrieval. In this work, we show that image retrieval methods are less robust to label noise than image classification ones. Furthermore, we, for the first time, investigate different types of label noise specific to image retrieval tasks and study their effect on model performance.
Submitted 23 May, 2022;
originally announced May 2022.
-
EXACT: How to Train Your Accuracy
Authors:
Ivan Karpukhin,
Stanislav Dereka,
Sergey Kolesnikov
Abstract:
Classification tasks are usually evaluated in terms of accuracy. However, accuracy is discontinuous and cannot be directly optimized using gradient ascent. Popular methods minimize cross-entropy, hinge loss, or other surrogate losses, which can lead to suboptimal results. In this paper, we propose a new optimization framework by introducing stochasticity to a model's output and optimizing expected accuracy, i.e. accuracy of the stochastic model. Extensive experiments on linear models and deep image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.
Submitted 24 July, 2024; v1 submitted 19 May, 2022;
originally announced May 2022.
-
CVTT: Cross-Validation Through Time
Authors:
Mikhail Andronov,
Sergey Kolesnikov
Abstract:
The evaluation of recommender systems from a practical perspective is a topic of ongoing discourse within the research community. While many current evaluation methods reduce performance to a single value metric as an easy way to compare models, this relies on the assumption that the methods' performance remains constant over time. In this study, we examine this assumption and propose the Cross-Validation Through Time (CVTT) technique as a more comprehensive evaluation method, focusing on model performance over time. By utilizing the proposed technique, we conduct an in-depth analysis of the performance of popular RecSys algorithms. Our findings indicate that (1) the performance of the recommenders varies over time for all reviewed datasets, (2) using simple evaluation approaches can lead to a substantial decrease in performance in real-world evaluation scenarios, and (3) excessive data usage can lead to suboptimal results.
Submitted 10 February, 2023; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Probabilistic Embeddings Revisited
Authors:
Ivan Karpukhin,
Stanislav Dereka,
Sergey Kolesnikov
Abstract:
In recent years, deep metric learning and its probabilistic extensions claimed state-of-the-art results in the face verification task. Despite improvements in face verification, probabilistic methods received little attention in the research community and practical applications. In this paper, we, for the first time, perform an in-depth analysis of known probabilistic methods in verification and retrieval tasks. We study different design choices and propose a simple extension, achieving new state-of-the-art results among probabilistic methods. Finally, we study confidence prediction and show that it correlates with data quality, but contains little information about prediction error probability. We thus provide a new confidence evaluation benchmark and establish a baseline for future confidence prediction research. PyTorch implementation is publicly released.
Submitted 10 November, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Next Period Recommendation Reality Check
Authors:
Sergey Kolesnikov,
Oleg Lashinin,
Michail Pechatov,
Alexander Kosov
Abstract:
Over the past decade, tremendous progress has been made in Recommender Systems (RecSys) for well-known tasks such as next-item and next-basket prediction. On the other hand, the recently proposed next-period recommendation (NPR) task is not covered as much. Current works about NPR are mostly based around distinct problem formulations, methods, and proprietary datasets, making solutions difficult to reproduce. In this article, we aim to fill the gap in RecSys methods evaluation on the NPR task using publicly available datasets and (1) introduce the TTRS, a large-scale financial transactions dataset suitable for RecSys methods evaluation; (2) benchmark popular RecSys approaches on several datasets for the NPR task. When performing our analysis, we found a strong repetitive consumption pattern in several real-world datasets. With this setup, our results suggest that the repetitive nature of data is still hard to generalize for the evaluated RecSys methods, and novel item prediction performance is still questionable.
Submitted 20 December, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
Authors:
Vladislav Kurenkov,
Sergey Kolesnikov
Abstract:
In this work, we argue for the importance of an online evaluation budget for a reliable comparison of deep offline RL algorithms. First, we show that the online evaluation budget is problem-dependent: some problems allow for a smaller budget, others for a larger one. Second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following the points above, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets. To facilitate this, we propose to use a reporting tool from the NLP field, Expected Validation Performance. This technique makes it possible to reliably estimate expected maximum performance under different budgets while not requiring any additional computation beyond hyperparameter search. By employing this tool, we also show that Behavioral Cloning is often preferable to offline RL algorithms when working within a limited budget.
Submitted 5 June, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
LRWR: Large-Scale Benchmark for Lip Reading in Russian language
Authors:
Evgeniy Egorov,
Vasily Kostyumov,
Mikhail Konyk,
Sergey Kolesnikov
Abstract:
Lipreading, also known as visual speech recognition, aims to identify the speech content from videos by analyzing the visual deformations of lips and nearby areas. One of the significant obstacles for research in this field is the lack of proper datasets for a wide variety of languages: so far, these methods have been focused only on English or Chinese. In this paper, we introduce a naturally distributed large-scale benchmark for lipreading in the Russian language, named LRWR, which contains 235 classes and 135 speakers. We provide a detailed description of the dataset collection pipeline and dataset statistics. We also present a comprehensive comparison of the current popular lipreading methods on LRWR and conduct a detailed analysis of their performance. The results demonstrate the differences between the benchmarked languages and provide several promising directions for fine-tuning lipreading models. Thanks to our findings, we also achieved new state-of-the-art results on the LRW benchmark.
Submitted 14 September, 2021;
originally announced September 2021.
-
Sample Efficient Ensemble Learning with Catalyst.RL
Authors:
Sergey Kolesnikov,
Valentin Khrulkov
Abstract:
We present Catalyst.RL, an open-source PyTorch framework for reproducible and sample-efficient reinforcement learning (RL) research. The main features of Catalyst.RL include large-scale asynchronous distributed training and efficient implementations of various RL algorithms and auxiliary tricks, such as n-step returns, value distributions, hyperbolic reinforcement learning, etc. To demonstrate the effectiveness of Catalyst.RL, we applied it to a physics-based reinforcement learning challenge, "NeurIPS 2019: Learn to Move -- Walk Around", with the objective of building a locomotion controller for a human musculoskeletal model. The environment is computationally expensive, has a high-dimensional continuous action space, and is stochastic. Our team took 2nd place, capitalizing on the ability of Catalyst.RL to train high-quality and sample-efficient RL agents in only a few hours of training time. The implementation, along with the experiments, is open-sourced so that results can be reproduced and novel ideas tried out.
Submitted 7 April, 2020; v1 submitted 29 March, 2020;
originally announced March 2020.
-
Catalyst.RL: A Distributed Framework for Reproducible RL Research
Authors:
Sergey Kolesnikov,
Oleksii Hrinchuk
Abstract:
Despite the recent progress in the field of deep reinforcement learning (RL), and arguably because of it, a large body of work remains to be done in reproducing and carefully comparing different RL algorithms. We present catalyst.RL, an open-source framework for RL research with a focus on reproducibility and flexibility. The main features of our library include large-scale asynchronous distributed training, easy-to-use configuration files with the complete list of hyperparameters for the particular experiments, and efficient implementations of various RL algorithms and auxiliary tricks, such as frame stacking, n-step returns, value distributions, etc. To demonstrate the usefulness of our framework, we evaluate it on a range of continuous control benchmarks, as well as on the task of developing a controller that enables a physiologically-based human model with a prosthetic leg to walk and run. The latter task was introduced at the NeurIPS 2018 AI for Prosthetics Challenge, where our team took 3rd place, capitalizing on the ability of catalyst.RL to train high-quality and sample-efficient RL agents.
Submitted 28 February, 2019;
originally announced March 2019.
-
Artificial Intelligence for Prosthetics - challenge solutions
Authors:
Łukasz Kidziński,
Carmichael Ong,
Sharada Prasanna Mohanty,
Jennifer Hicks,
Sean F. Carroll,
Bo Zhou,
Hongsheng Zeng,
Fan Wang,
Rongzhong Lian,
Hao Tian,
Wojciech Jaśkowski,
Garrett Andersen,
Odd Rune Lykkebø,
Nihat Engin Toklu,
Pranav Shyam,
Rupesh Kumar Srivastava,
Sergey Kolesnikov,
Oleksii Hrinchuk,
Anton Pechenko,
Mattias Ljungström,
Zhen Wang,
Xu Hu,
Zehong Hu,
Minghui Qiu,
Jun Huang
, et al. (25 additional authors not shown)
Abstract:
In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.
Submitted 6 February, 2019;
originally announced February 2019.
-
Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments
Authors:
Łukasz Kidziński,
Sharada Prasanna Mohanty,
Carmichael Ong,
Zhewei Huang,
Shuchang Zhou,
Anton Pechenko,
Adam Stelmaszczyk,
Piotr Jarosik,
Mikhail Pavlov,
Sergey Kolesnikov,
Sergey Plis,
Zhibo Chen,
Zhizheng Zhang,
Jiale Chen,
Jun Shi,
Zhuobin Zheng,
Chun Yuan,
Zhihui Lin,
Henryk Michalewski,
Piotr Miłoś,
Błażej Osiński,
Andrew Melnik,
Malte Schilling,
Helge Ritter,
Sean Carroll
, et al. (4 additional authors not shown)
Abstract:
In the NIPS 2017 Learning to Run challenge, participants were tasked with building a controller for a musculoskeletal model to make it run as fast as possible through an obstacle course. Top participants were invited to describe their algorithms. In this work, we present eight solutions that used deep reinforcement learning approaches, based on algorithms such as Deep Deterministic Policy Gradient, Proximal Policy Optimization, and Trust Region Policy Optimization. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each of the eight teams implemented different modifications of the known algorithms.
Submitted 1 April, 2018;
originally announced April 2018.
-
On the Relation of External and Internal Feature Interactions: A Case Study
Authors:
Sergiy Kolesnikov,
Norbert Siegmund,
Christian Kästner,
Sven Apel
Abstract:
Detecting feature interactions is imperative for accurately predicting performance of highly-configurable systems. State-of-the-art performance prediction techniques rely on supervised machine learning for detecting feature interactions, which, in turn, relies on time consuming performance measurements to obtain training data. By providing information about potentially interacting features, we can reduce the number of required performance measurements and make the overall performance prediction process more time efficient. We expect that the information about potentially interacting features can be obtained by statically analyzing the source code of a highly-configurable system, which is computationally cheaper than performing multiple performance measurements. To this end, we conducted a qualitative case study in which we explored the relation between control-flow feature interactions (detected through static program analysis) and performance feature interactions (detected by performance prediction techniques using performance measurements). We found that a relation exists, which can potentially be exploited to predict performance interactions.
Submitted 22 January, 2018; v1 submitted 20 December, 2017;
originally announced December 2017.
-
Run, skeleton, run: skeletal model in a physics-based simulation
Authors:
Mikhail Pavlov,
Sergey Kolesnikov,
Sergey M. Plis
Abstract:
In this paper, we present our approach to solving the physics-based reinforcement learning challenge "Learning to Run", whose objective is to train a physiologically-based human model to navigate a complex obstacle course as quickly as possible. The environment is computationally expensive, has a high-dimensional continuous action space, and is stochastic. We benchmark state-of-the-art policy-gradient methods and test several improvements, such as layer normalization, parameter noise, and action and state reflecting, to stabilize training and improve its sample efficiency. We found that the Deep Deterministic Policy Gradient method is the most efficient method for this environment and that the improvements we introduced help to stabilize training. Learned models are able to generalize to new physical scenarios, e.g. different obstacle courses.
Submitted 28 January, 2018; v1 submitted 18 November, 2017;
originally announced November 2017.