Search | arXiv e-print repository

Principal-Agent Reinforcement Learning

Authors: Dima Ivanov, Paul Dütting, Inbal Talgam-Cohen, Tonghan Wang, David C. Parkes

Abstract: Contracts are the economic framework which allows a principal to delegate a task to an agent -- despite misaligned interests, and even without directly observing the agent's actions. In many modern reinforcement learning settings, self-interested agents learn to perform a multi-stage task delegated to them by a principal. We explore the significant potential of utilizing contracts to incentivize the agents. We model the delegated task as an MDP, and study a stochastic game between the principal and agent where the principal learns what contracts to use, and the agent learns an MDP policy in response. We present a learning-based algorithm for optimizing the principal's contracts, which provably converges to the subgame-perfect equilibrium of the principal-agent game. A deep RL implementation allows us to apply our method to very large MDPs with unknown transition dynamics. We extend our approach to multiple agents, and demonstrate its relevance to resolving a canonical sequential social dilemma with minimal intervention to agent rewards.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2406.02077 [pdf, other]

Multi-target stain normalization for histology slides

Authors: Desislav Ivanov, Carlo Alberto Barbano, Marco Grangetto

Abstract: Traditional staining normalization approaches, e.g. Macenko, typically rely on the choice of a single representative reference image, which may not adequately account for the diverse staining patterns of datasets collected in practical scenarios. In this study, we introduce a novel approach that leverages multiple reference images to enhance robustness against stain variation. Our method is parameter-free and can be adopted in existing computational pathology pipelines with no significant changes. We evaluate the effectiveness of our method through experiments using a deep-learning pipeline for automatic nuclei segmentation on colorectal images. Our results show that by leveraging multiple reference images, better results can be achieved when generalizing to external data, where the staining can widely differ from the training set.

Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

MSC Class: 68U10 ACM Class: I.4.0

arXiv:2405.07748 [pdf, other]

Neural Network Compression for Reinforcement Learning Tasks

Authors: Dmitry A. Ivanov, Denis A. Larionov, Oleg V. Maslennikov, Vladimir V. Voevodin

Abstract: In real applications of Reinforcement Learning (RL), such as robotics, low-latency and energy-efficient inference is highly desirable. The use of sparsity and pruning for optimizing Neural Network inference, and particularly to improve energy and latency efficiency, is a standard technique. In this work, we perform a systematic investigation of applying these optimization techniques for different RL algorithms in different RL environments, yielding up to a 400-fold reduction in the size of neural networks.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 14 pages, 6 figures

arXiv:2404.14300 [pdf, ps, other]

Linear Search for an Escaping Target with Unknown Speed

Authors: Jared Coleman, Dmitry Ivanov, Evangelos Kranakis, Danny Krizanc, Oscar Morales-Ponce

Abstract: We consider linear search for an escaping target whose speed and initial position are unknown to the searcher. A searcher (an autonomous mobile agent) is initially placed at the origin of the real line and can move with maximum speed $1$ in either direction along the line. An oblivious mobile target that is moving away from the origin with an unknown constant speed $v<1$ is initially placed by an adversary on the infinite line at distance $d$ from the origin in an unknown direction. We consider two cases, depending on whether $d$ is known or unknown. The main contribution of this paper is to prove a new lower bound and give algorithms leading to new upper bounds for search in these settings. This results in an optimal (up to lower order terms in the exponent) competitive ratio in the case where $d$ is known and improved upper and lower bounds for the case where $d$ is unknown. Our results solve an open problem proposed in [Coleman et al., Proc. OPODIS 2022].

Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2403.03751 [pdf, other]

Trigram-Based Persistent IDE Indices with Quick Startup

Authors: Zakhar Iakovlev, Alexey Chulkov, Nikita Golikov, Vyacheslav Lukianov, Nikita Zinoviev, Dmitry Ivanov, Vitaly Aksenov

Abstract: One common way to speed up the find operation within a set of text files involves a trigram index. This structure is merely a map from a trigram (sequence consisting of three characters) to a set of files which contain it. When searching for a pattern, potential file locations are identified by intersecting the sets related to the trigrams in the pattern. Then, the search proceeds only in these files. However, in a code repository, the trigram index evolves across different versions. Upon checking out a new version, this index is typically built from scratch, which is a time-consuming task, while we want our index to have almost zero-time startup. Thus, we explore the persistent version of a trigram index for full-text and keyword pattern search. Our approach just uses the current version of the trigram index and applies only the changes between versions during checkout, significantly enhancing performance. Furthermore, we extend our data structure to accommodate CamelHump search for class and function names.

Submitted 6 March, 2024; originally announced March 2024.
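
The basic (non-persistent) trigram index described in the abstract above can be sketched in a few lines of Python. This is only an illustration of the map-and-intersect idea, not the authors' implementation: it does not cover the persistent, versioned index or the CamelHump extension, and the names `trigrams`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file names whose content contains it."""
    index = defaultdict(set)
    for name, content in files.items():
        for tri in trigrams(content):
            index[tri].add(name)
    return index


def search(index, files, pattern):
    """Return the sorted names of files that contain pattern."""
    # Start from all files and narrow down by intersecting posting sets.
    # A pattern shorter than three characters has no trigrams, so every
    # file stays a candidate and is checked directly.
    candidates = set(files)
    for tri in trigrams(pattern):
        candidates &= index.get(tri, set())
    # Sharing all trigrams does not guarantee a match, so verify each
    # candidate with a real substring search.
    return sorted(name for name in candidates if pattern in files[name])


# Toy in-memory "repository" (assumed data, for illustration only).
files = {
    "a.py": "def parse_input(x): pass",
    "b.py": "def render(y): pass",
    "c.py": "input parser",
}
index = build_index(files)
print(search(index, files, "parse"))
```

The final verification step matters because the intersection only yields candidate files: a file can contain every trigram of the pattern without containing the pattern itself.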

arXiv:2401.06514 [pdf, other]

Personalized Reinforcement Learning with a Budget of Policies

Authors: Dmitry Ivanov, Omer Ben-Porat

Abstract: Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users. While this approach has seen success in areas like recommender systems, its expansion into high-stakes fields such as healthcare and autonomous driving is hindered by the extensive regulatory approval processes involved. To address this challenge, we propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with the regulatory constraints. In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies. Our objective is twofold: efficiently match each user to an appropriate representative policy and simultaneously optimize these policies to maximize overall social welfare. We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs. These algorithms draw inspiration from the principles of classic K-means clustering and are underpinned by robust theoretical foundations. Our empirical investigations, conducted across a variety of simulated environments, showcase the algorithms' ability to facilitate meaningful personalization even under constrained policy budgets. Furthermore, they demonstrate scalability, efficiently adapting to larger policy budgets.

Submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI 2024. Code: https://github.com/dimonenka/RL_policy_budget

arXiv:2311.07126 [pdf, other]

How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective

Authors: Ivan Kraljevski, Yong Chul Ju, Dmitrij Ivanov, Constanze Tschöpe, Matthias Wolff

Abstract: Artificial intelligence has experienced a technological breakthrough in science, industry, and everyday life in recent decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational resources that resulted in exponential data growth. However, because of the insufficient amount of data in some cases, employing machine learning in solving complex tasks is not straightforward or even possible. As a result, machine learning with small data is of rising importance in data science and in applications in several fields. The authors focus on interpreting the general term "small data" and its role in engineering and industrial applications. They give a brief overview of the most important industrial applications of machine learning and small data. Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism is introduced. Five critical challenges of machine learning with small data in industrial applications are presented: unlabeled data, imbalanced data, missing data, insufficient data, and rare events. Based on those definitions, an overview of the considerations in domain representation and data acquisition is given along with a taxonomy of machine learning approaches in the context of small data.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2307.02318 [pdf, other]

Deep Contract Design via Discontinuous Networks

Authors: Tonghan Wang, Paul Dütting, Dmitry Ivanov, Inbal Talgam-Cohen, David C. Parkes

Abstract: Contract design involves a principal who establishes contractual agreements about payments for outcomes that arise from the actions of an agent. In this paper, we initiate the study of deep learning for the automated design of optimal contracts. We introduce a novel representation: the Discontinuous ReLU (DeLU) network, which models the principal's utility as a discontinuous piecewise affine function of the design of a contract where each piece corresponds to the agent taking a particular action. DeLU networks implicitly learn closed-form expressions for the incentive compatibility constraints of the agent and the utility maximization objective of the principal, and support parallel inference on each piece through linear programming or interior-point methods that solve for optimal contracts. We provide empirical results that demonstrate success in approximating the principal's utility function with a small number of training samples and scaling to find approximately optimal contracts on problems with a large number of actions and outcomes.

Submitted 27 October, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Journal ref: NeurIPS 2023

arXiv:2306.08419 [pdf, other]

doi 10.5555/3545946.3598618

Mediated Multi-Agent Reinforcement Learning

Authors: Dmitry Ivanov, Ilya Zisman, Kirill Chernyshev

Abstract: The majority of Multi-Agent Reinforcement Learning (MARL) literature equates the cooperation of self-interested agents in mixed environments to the problem of social welfare maximization, allowing agents to arbitrarily share rewards and private information. This results in agents that forgo their individual goals in favour of social good, which can potentially be exploited by selfish defectors. We argue that cooperation also requires agents' identities and boundaries to be respected by making sure that the emergent behaviour is an equilibrium, i.e., a convention that no agent can deviate from and receive higher individual payoffs. Inspired by advances in mechanism design, we propose to solve the problem of cooperation, defined as finding socially beneficial equilibrium, by using mediators. A mediator is a benevolent entity that may act on behalf of agents, but only for the agents that agree to it. We show how a mediator can be trained alongside agents with policy gradient to maximize social welfare subject to constraints that encourage agents to cooperate through the mediator. Our experiments in matrix and iterative games highlight the potential power of applying mediators in MARL.

Submitted 14 June, 2023; originally announced June 2023.

Journal ref: AAMAS '23, Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (May 2023) Pages 49-57

arXiv:2305.10872 [pdf, other]

Benchmark Framework with Skewed Workloads

Authors: Vitaly Aksenov, Dmitry Ivanov, Ravil Galiev

Abstract: In this work, we present a new benchmarking suite with new real-life inspired skewed workloads to test the performance of concurrent index data structures. We started this project to prepare workloads specifically for self-adjusting data structures, i.e., they handle more frequent requests faster, and, thus, should perform better than their standard counterparts. We looked over the commonly used suites to test performance of concurrent indices trying to find an inspiration: Synchrobench, Setbench, YCSB, and TPC - and we found several issues with them. The major problem is that they are not flexible: it is difficult to introduce new workloads, it is difficult to set the duration of the experiments, and it is difficult to change the parameters. We decided to solve this issue by presenting a new suite based on Synchrobench. Finally, we highlight the problem of measuring performance of data structures. We show that the relative performance of data structures highly depends on the workload: it is not clear which data structure is best. For that, we take three state-of-the-art concurrent binary search trees and run them on the workloads from our benchmarking suite. As a result, we get six experiments with all possible relative performance of the chosen data structures.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2210.11568 [pdf, ps, other]

Polynomial computational complexity of matrix elements of finite-rank-generated single-particle operators in products of finite bosonic states

Authors: Dmitri A. Ivanov

Abstract: It is known that computing the permanent of the matrix $1+A$, where $A$ is a finite-rank matrix, requires a number of operations polynomial in the matrix size. Motivated by the boson-sampling proposal of restricted quantum computation, I extend this result to a generalization of the matrix permanent: an expectation value in a product of a large number of identical bosonic states with a bounded number of bosons. This result complements earlier studies on the computational complexity in boson sampling and related setups. The proposed technique based on the Gaussian averaging is equally applicable to bosonic and fermionic systems. This also allows us to improve an earlier polynomial complexity estimate for the fermionic version of the same problem.

Submitted 29 May, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 4 pages, introduction and conclusion expanded, minor style corrections

arXiv:2205.13037 [pdf, other]

Neuromorphic Artificial Intelligence Systems

Authors: Dmitry Ivanov, Aleksandr Chezhegov, Andrey Grunin, Mikhail Kiselev, Denis Larionov

Abstract: Modern AI systems, based on von Neumann architecture and classical neural networks, have a number of fundamental limitations in comparison with the brain. This article discusses such limitations and the ways they can be mitigated. Next, it presents an overview of currently available neuromorphic AI projects in which these limitations are overcome by bringing some brain features into the functioning and organization of computing systems (TrueNorth, Loihi, Tianjic, SpiNNaker, BrainScaleS, NeuronFlow, DYNAP, Akida). Also, the article presents the principle of classifying neuromorphic AI systems by the brain features they use (neural networks, parallelism and asynchrony, impulse nature of information transfer, local learning, sparsity, analog and in-memory computing). In addition to new architectural approaches used in neuromorphic devices based on existing silicon microelectronics technologies, the article also discusses the prospects of using a new memristor element base. Examples of recent advances in the use of memristors in neuromorphic applications are also given.

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2203.10905 [pdf, other]

Self-Imitation Learning from Demonstrations

Authors: Georgiy Pshikhachev, Dmitry Ivanov, Vladimir Egorov, Aleksei Shpilman

Abstract: Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent's exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, which are rarely optimal in realistic scenarios. Modern LfD algorithms require meticulous tuning of hyperparameters that control the influence of demonstrations and, as we show in the paper, struggle with learning from suboptimal demonstrations. To address these issues, we extend Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent's past good experience, to the LfD setup by initializing its replay buffer with demonstrations. We denote our algorithm as SIL from Demonstrations (SILfD). We empirically show that SILfD can learn from demonstrations that are noisy or far from optimal and can automatically adjust the influence of demonstrations throughout the training without additional hyperparameters or handcrafted schedules. We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments, especially when demonstrations are highly suboptimal.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.07206 [pdf, other]

Improving State-of-the-Art in One-Class Classification by Leveraging Unlabeled Data

Authors: Farid Bagirov, Dmitry Ivanov, Aleksei Shpilman

Abstract: When dealing with binary classification of data with only one labeled class, data scientists employ two main approaches, namely One-Class (OC) classification and Positive Unlabeled (PU) learning. The former only learns from labeled positive data, whereas the latter also utilizes unlabeled data to improve the overall performance. Since PU learning utilizes more data, we might be prone to think that when unlabeled data is available, the go-to algorithms should always come from the PU group. However, we find that this is not always the case if unlabeled data is unreliable, i.e. contains limited or biased latent negative data. We perform an extensive experimental study of a wide list of state-of-the-art OC and PU algorithms in various scenarios as far as unlabeled data reliability is concerned. Furthermore, we propose PU modifications of state-of-the-art OC algorithms that are robust to unreliable unlabeled data, as well as a guideline to similarly modify other OC algorithms. Our main practical recommendation is to use state-of-the-art PU algorithms when unlabeled data is reliable and to use the proposed modifications of state-of-the-art OC algorithms otherwise. Additionally, we outline procedures to distinguish the cases of reliable and unreliable unlabeled data using statistical tests.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2202.13110 [pdf, other]

Optimal-er Auctions through Attention

Authors: Dmitry Ivanov, Iskander Safiulin, Igor Filippov, Ksenia Balabaeva

Abstract: RegretNet is a recent breakthrough in the automated design of revenue-maximizing auctions. It combines the flexibility of deep learning with the regret-based approach to relax the Incentive Compatibility (IC) constraint (that participants prefer to bid truthfully) in order to approximate optimal auctions. We propose two independent improvements of RegretNet. The first is a neural architecture denoted as RegretFormer that is based on attention layers. The second is a loss function that requires explicit specification of an acceptable IC violation denoted as regret budget. We investigate both modifications in an extensive experimental study that includes settings with constant and inconstant number of items and participants, as well as novel validation procedures tailored to regret-based approaches. We find that RegretFormer consistently outperforms RegretNet in revenue (i.e. is optimal-er) and that our loss function both simplifies hyperparameter tuning and allows unambiguous control of the revenue-regret trade-off by selecting the regret budget.

Submitted 31 October, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

Comments: NeurIPS 2022

arXiv:2201.02571 [pdf, other]

Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations

Authors: Dmitry Ivanov, Mikhail Kiselev, Denis Larionov

Abstract: This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning (RL) tasks. This method combines two ideas: neural network pruning and taking into account input data correlations; it makes it possible to update neuron states only when changes in them exceed a certain threshold. It significantly reduces the number of multiplications when running neural networks. We tested different RL tasks and achieved 20-150x reduction in the number of multiplications. There were no substantial performance losses; sometimes the performance even improved.

Submitted 7 April, 2022; v1 submitted 7 January, 2022; originally announced January 2022.
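
The two ideas named in the abstract above, pruning and threshold-gated updates, can be illustrated with a toy linear layer in Python. This is a hedged sketch under stated assumptions rather than the paper's method: the class name `DeltaLinear`, the choice to gate on changes in the layer's inputs, and the multiplication counter are all hypothetical and chosen only to make the savings visible.

```python
class DeltaLinear:
    """Toy linear layer that recomputes only for inputs that changed enough.

    Assumption: an input is propagated only when it differs from the last
    propagated value by more than `threshold`; zero (pruned) weights are
    skipped entirely.
    """

    def __init__(self, weights, threshold):
        self.weights = weights                       # weights[j][i]: input i -> output j
        self.threshold = threshold
        self.last_inputs = [0.0] * len(weights[0])   # last propagated input values
        self.outputs = [0.0] * len(weights)          # cached weighted sums
        self.multiplications = 0                     # multiplications performed so far

    def forward(self, inputs):
        for i, x in enumerate(inputs):
            delta = x - self.last_inputs[i]
            if abs(delta) <= self.threshold:
                continue  # change too small: reuse the cached contribution
            for j, row in enumerate(self.weights):
                if row[i] == 0.0:
                    continue  # pruned connection: no multiplication needed
                self.outputs[j] += row[i] * delta
                self.multiplications += 1
            self.last_inputs[i] = x
        return list(self.outputs)


# One pruned weight (0.0) and a threshold of 0.1 (illustrative values).
layer = DeltaLinear(weights=[[1.0, 0.0], [3.0, 4.0]], threshold=0.1)
first = layer.forward([1.0, 1.0])    # both inputs are new, so both propagate
second = layer.forward([1.05, 2.0])  # input 0 barely changed and is skipped
print(first, second, layer.multiplications)
```

A dense layer would spend four multiplications on each call here; the sketch spends fewer because it skips the pruned weight and the input whose change stays under the threshold, at the cost of a small approximation error in the cached sums.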

arXiv:2112.08882 [pdf, other]

doi 10.2205/2022ES000829

BitTorrent is Apt for Geophysical Data Collection and Distribution

Authors: K. I. Kholodkov, I. M. Aleshin, S. D. Ivanov

Abstract: This article covers a novel idea of how to collect and handle geophysical data with a peer-to-peer network in near real-time. The text covers a brief introduction to the cause, the technology, and the particular case of collecting data from GNSS stations. We describe the proof-of-concept implementation that has been tested. The test was conducted with an experimental GNSS station and a data aggregation facility. In the test, original raw GNSS signal measurements were transferred to the data aggregation center and subsequently to the consumer. Our implementation utilized BitTorrent to communicate and transfer data. The solution could be used to establish the majority of data aggregation centers' activities to provide a fast, reliable, and transparent real-time data handling experience to the scientific community.

Submitted 16 December, 2021; originally announced December 2021.

Comments: 13 pages, 2 figures

ACM Class: J.2; C.2.2

arXiv:2103.16511 [pdf, other]

Flatland Competition 2020: MAPF and MARL for Efficient Train Coordination on a Grid World

Authors: Florian Laurent, Manuel Schneider, Christian Scheller, Jeremy Watson, Jiaoyang Li, Zhe Chen, Yi Zheng, Shao-Hung Chan, Konstantin Makhnev, Oleg Svidchenko, Vladimir Egorov, Dmitry Ivanov, Aleksei Shpilman, Evgenija Spirovska, Oliver Tanevski, Aleksandar Nikov, Ramon Grunder, David Galevski, Jakov Mitrovski, Guillaume Sartoretti, Zhiyao Luo, Mehul Damani, Nilabha Bhattacharya, Shivam Agarwal, Adrian Egli, et al. (2 additional authors not shown)

Abstract: The Flatland competition aimed at finding novel approaches to solve the vehicle re-scheduling problem (VRSP). The VRSP is concerned with scheduling trips in traffic networks and the re-scheduling of vehicles when disruptions occur, for example the breakdown of a vehicle. While solving the VRSP in various settings has been an active area in operations research (OR) for decades, the ever-growing complexity of modern railway networks makes dynamic real-time scheduling of traffic virtually impossible. Recently, multi-agent reinforcement learning (MARL) has successfully tackled challenging tasks where many agents need to be coordinated, such as multiplayer video games. However, the coordination of hundreds of agents in a real-life setting like a railway network remains challenging and the Flatland environment used for the competition models these real-world properties in a simplified manner. Submissions had to bring as many trains (agents) to their target stations in as little time as possible. While the best submissions were in the OR category, participants found many promising MARL approaches. Using both centralized and decentralized learning based approaches, top submissions used graph representations of the environment to construct tree-based observations. Further, different coordination mechanisms were implemented, such as communication and prioritization between agents. This paper presents the competition setup, four outstanding solutions to the competition, and a cross-comparison between them.

Submitted 30 March, 2021; originally announced March 2021.

Comments: 28 pages, 8 figures

arXiv:2102.12307 [pdf, other]

Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments

Authors: Dmitry Ivanov, Vladimir Egorov, Aleksei Shpilman

Abstract: Recent reinforcement learning studies extensively explore the interplay between cooperative and competitive behaviour in mixed environments. Unlike cooperative environments where agents strive towards a common goal, mixed environments are notorious for the conflicts of selfish and social interests. As a consequence, purely rational agents often struggle to achieve and maintain cooperation. A prevalent approach to induce cooperative behaviour is to assign additional rewards based on other agents' well-being. However, this approach suffers from the issue of multi-agent credit assignment, which can hinder performance. This issue is efficiently alleviated in cooperative setting with such state-of-the-art algorithms as QMIX and COMA. Still, when applied to mixed environments, these algorithms may result in unfair allocation of rewards. We propose BAROCCO, an extension of these algorithms capable to balance individual and social incentives. The mechanism behind BAROCCO is to train two distinct but interwoven components that jointly affect each agent's decisions. Our meta-algorithm is compatible with both Q-learning and Actor-Critic frameworks. We experimentally confirm the advantages over the existing methods and explore the behavioural aspects of BAROCCO in two mixed multi-agent setups.

Submitted 24 February, 2021; originally announced February 2021.

Comments: Short version of this paper is accepted to AAMAS 2021

arXiv:1904.06069 [pdf, other]

doi 10.1103/PhysRevA.101.012303

Complexity of full counting statistics of free quantum particles in product states

Authors: Dmitri A. Ivanov, Leonid Gurvits

Abstract: We study the computational complexity of quantum-mechanical expectation values of single-particle operators in bosonic and fermionic multi-particle product states. Such expectation values appear, in particular, in full-counting-statistics problems. Depending on the initial multi-particle product state, the expectation values may be either easy to compute (the required number of operations scales polynomially with the particle number) or hard to compute (at least as hard as a permanent of a matrix). However, if we only consider full counting statistics in a finite number of final single-particle states, then the full-counting-statistics generating function becomes easy to compute in all the analyzed cases. We prove the latter statement for the general case of the fermionic product state and for the single-boson product state (the same as used in the boson-sampling proposal). This result may be relevant for using multi-particle product states as a resource for quantum computing.

Submitted 21 February, 2020; v1 submitted 12 April, 2019; originally announced April 2019.

Comments: 8 pages, published version

Journal ref: Phys. Rev. A 101, 012303 (2020)

arXiv:1902.06965 [pdf, other]

DEDPUL: Difference-of-Estimated-Densities-based Positive-Unlabeled Learning

Authors: Dmitry Ivanov

Abstract: Positive-Unlabeled (PU) learning is an analog to supervised binary classification for the case when only the positive sample is clean, while the negative sample is contaminated with latent instances of positive class and hence can be considered as an unlabeled mixture. The objectives are to classify the unlabeled sample and train an unbiased PN classifier, which generally requires to identify the mixing proportions of positives and negatives first. Recently, unbiased risk estimation framework has achieved state-of-the-art performance in PU learning. This approach, however, exhibits two major bottlenecks. First, the mixing proportions are assumed to be identified, i.e. known in the domain or estimated with additional methods. Second, the approach relies on the classifier being a neural network. In this paper, we propose DEDPUL, a method that solves PU Learning without the aforementioned issues. The mechanism behind DEDPUL is to apply a computationally cheap post-processing procedure to the predictions of any classifier trained to distinguish positive and unlabeled data. Instead of assuming the proportions to be identified, DEDPUL estimates them alongside with classifying unlabeled sample. Experiments show that DEDPUL outperforms the current state-of-the-art in both proportion estimation and PU Classification.

Submitted 7 June, 2020; v1 submitted 19 February, 2019; originally announced February 2019.

Comments: Implementation of DEDPUL and experimental data are available at https://github.com/dimonenka/DEDPUL

arXiv:1603.02724 [pdf, other]

doi 10.1103/PhysRevA.96.012322

Computational complexity of exterior products and multi-particle amplitudes of non-interacting fermions in entangled states

Authors: Dmitri A. Ivanov

Abstract: Noninteracting bosons were proposed to be used for a demonstration of quantum-computing supremacy in a boson-sampling setup. A similar demonstration with fermions would require that the fermions are initially prepared in an entangled state. I suggest that pairwise entanglement of fermions would be sufficient for this purpose. Namely, it is shown that computing multi-particle scattering amplitudes for fermions entangled pairwise in groups of four single-particle states is #P hard. In linear algebra, such amplitudes are expressed as exterior products of two-forms of rank two. In particular, a permanent of a NxN matrix may be expressed as an exterior product of N^2 two-forms of rank two in dimension 2N^2, which establishes the #P-hardness of the latter.

Submitted 6 August, 2017; v1 submitted 8 March, 2016; originally announced March 2016.

Comments: 5 pages, version accepted for publication

Journal ref: Phys. Rev. A 96, 012322 (2017)

arXiv:1008.0063 [pdf]

Evolutionary Approach to Test Generation for Functional BIST

Authors: Y. A. Skobtsov, D. E. Ivanov, V. Y. Skobtsov, R. Ubar, J. Raik

Abstract: In the paper, an evolutionary approach to test generation for functional BIST is considered. The aim of the proposed scheme is to minimize the test data volume by allowing the device's microprogram to test its logic, providing an observation structure to the system, and generating appropriate test data for the given architecture. Two methods of deriving a deterministic test set at functional level are suggested. The first method is based on the classical genetic algorithm with binary and arithmetic crossover and mutation operators. The second one uses genetic programming, where test is represented as a sequence of microoperations. In the latter case, we apply two-point crossover based on exchanging test subsequences and mutation implemented as random replacement of microoperations or operands. Experimental data of the program realization showing the efficiency of the proposed methods are presented.

Submitted 31 July, 2010; originally announced August 2010.

Comments: 10 European Test Symposium. Informal Digest of Papers

Showing 1–23 of 23 results for author: Ivanov, D