
Variational Offline Multi-agent Skill Discovery

Authors: Jiayu Chen, Bhargav Ganguly, Tian Lan, Vaneet Aggarwal

Abstract: Skills are effective temporal abstractions established for sequential decision-making tasks; they enable efficient hierarchical learning for long-horizon tasks and facilitate multi-task learning through their transferability. Despite extensive research, gaps remain in multi-agent scenarios, particularly in automatically extracting subgroup coordination patterns in a multi-agent task. To address this, we propose two novel auto-encoder schemes, VO-MASD-3D and VO-MASD-Hier, which simultaneously capture subgroup- and temporal-level abstractions to form multi-agent skills and are the first to solve the aforementioned challenge. An essential algorithmic component of these schemes is a dynamic grouping function that can automatically detect latent subgroups based on agent interactions in a task. Notably, our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing methods for applying skills in multi-agent reinforcement learning (MARL). Moreover, skills discovered using our method can effectively reduce the learning difficulty in MARL scenarios with delayed and sparse reward signals.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2402.13777 [pdf, other]

Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions

Authors: Jiayu Chen, Bhargav Ganguly, Yang Xu, Yongsheng Mei, Tian Lan, Vaneet Aggarwal

Abstract: Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating texts, images, and videos using models trained from offline data. Similarly, data-driven decision-making and robotic control also require learning a generator function from offline data to serve as the strategy or policy. Applying deep generative models to offline policy learning therefore exhibits great potential, and numerous studies have explored this direction. However, this field still lacks a comprehensive review, and as a result its different branches have developed relatively independently. In this paper, we provide the first systematic review of the applications of deep generative models for offline policy learning. In particular, we cover five mainstream deep generative models, namely Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are the two main branches of offline policy learning and are widely adopted techniques for sequential decision-making. Notably, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related works based on how the DGM is used, and trace the development of algorithms in that field.
Following the main content, we provide in-depth discussions of deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for research progress in deep generative models for offline policy learning and aims to inspire improved DGM-based offline RL or IL algorithms. For convenience, we maintain a paper list at https://github.com/LucasCJYSDL/DGMs-for-Offline-Policy-Learning.

Submitted 25 May, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: We restructured the paper and added more discussion

arXiv:2310.11684 [pdf, other]

Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes

Authors: Bhargav Ganguly, Yang Xu, Vaneet Aggarwal

Abstract: This paper investigates the potential of quantum acceleration in addressing infinite-horizon Markov Decision Processes (MDPs) to improve average-reward outcomes. We introduce an innovative quantum framework for the agent's interaction with an unknown MDP, extending the conventional interaction paradigm. Our approach involves the design of an optimism-driven tabular Reinforcement Learning algorithm that harnesses quantum signals acquired by the agent through efficient quantum mean estimation techniques. Through thorough theoretical analysis, we demonstrate that the quantum advantage in mean estimation leads to exponential improvements in regret guarantees for infinite-horizon Reinforcement Learning. Specifically, the proposed quantum algorithm achieves a regret bound of $\tilde{\mathcal{O}}(1)$, a significant improvement over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical counterparts.

Submitted 28 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2302.08617 [pdf, other]

Quantum Computing Provides Exponential Regret Improvement in Episodic Reinforcement Learning

Authors: Bhargav Ganguly, Yulian Wu, Di Wang, Vaneet Aggarwal

Abstract: In this paper, we investigate the problem of episodic reinforcement learning with quantum oracles for state evolution. To this end, we propose an Upper Confidence Bound (UCB) based quantum algorithmic framework to facilitate learning of a finite-horizon MDP. Our quantum algorithm achieves an exponential improvement in regret compared to its classical counterparts, achieving a regret of $\tilde{\mathcal{O}}(1)$ compared to $\tilde{\mathcal{O}}(\sqrt{K})$, where $K$ is the number of training episodes and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic terms. To achieve this advantage, we exploit an efficient quantum mean estimation technique that provides a quadratic improvement in the number of i.i.d. samples needed to estimate the mean of sub-Gaussian random variables compared to classical mean estimation. This improvement is key to the significant regret improvement in quantum reinforcement learning. We provide proof-of-concept experiments on various RL environments that demonstrate performance gains of the proposed algorithmic framework.

Submitted 16 February, 2023; originally announced February 2023.
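A heuristic illustration of why quadratically faster mean estimation can turn $\sqrt{K}$ regret into logarithmic regret (intuition under simplifying assumptions, not the paper's proof): assume the confidence width after $n$ samples scales like $n^{-1/2}$ classically and like $n^{-1}$ with quantum mean estimation, up to logarithmic factors, and assume the number of samples available in episode $k$ grows roughly linearly, $n_k \approx k$. Summing the per-episode widths over $K$ episodes then gives

```latex
\sum_{k=1}^{K} \frac{1}{\sqrt{k}} \;\le\; 2\sqrt{K}
\qquad\text{versus}\qquad
\sum_{k=1}^{K} \frac{1}{k} \;\le\; 1 + \ln K
```

so the accumulated optimism bonus, which a UCB-style regret analysis typically bounds the regret by, grows like $\sqrt{K}$ in the classical case but only logarithmically in the quantum case, which is hidden inside $\tilde{\mathcal{O}}(1)$.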

arXiv:2211.12578 [pdf, other]

Online Federated Learning via Non-Stationary Detection and Adaptation amidst Concept Drift

Authors: Bhargav Ganguly, Vaneet Aggarwal

Abstract: Federated Learning (FL) is an emerging domain in the broader context of artificial intelligence research. Methodologies pertaining to FL assume distributed model training, consisting of a collection of clients and a server, with the main goal of achieving an optimal global model under restrictions on data sharing due to privacy concerns. It is worth highlighting that most of the existing FL literature assumes stationary data generation processes; such an assumption is unrealistic in real-world conditions where concept drift occurs due to, for instance, seasonal or periodic observations or faults in sensor measurements. In this paper, we introduce a multiscale algorithmic framework that combines the theoretical guarantees of the FedAvg and FedOMD algorithms in near-stationary settings with a non-stationary detection and adaptation technique to improve FL generalization performance in the presence of concept drift. The framework achieves $\tilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3} + \sqrt{T}\})$ dynamic regret over $T$ rounds with an underlying general convex loss function, where $L$ is the number of times non-stationary drifts occurred and $\Delta$ is the cumulative magnitude of drift experienced within $T$ rounds.

Submitted 6 May, 2023; v1 submitted 22 November, 2022; originally announced November 2022.
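The framework above builds on FedAvg. As a minimal sketch of the standard FedAvg aggregation step alone (the drift detection, adaptation, and FedOMD components are not shown, and the function name `fedavg_aggregate` is hypothetical), the server replaces the global model with the average of the client models weighted by each client's number of samples, $w \leftarrow \sum_k (n_k/n)\, w_k$:

```python
from typing import List, Sequence


def fedavg_aggregate(client_models: Sequence[Sequence[float]],
                     client_sizes: Sequence[int]) -> List[float]:
    """Data-size-weighted average of flattened client parameter vectors (FedAvg server step)."""
    if not client_models or len(client_models) != len(client_sizes):
        raise ValueError("need at least one client and one size per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_models[0])
    global_model = [0.0] * dim
    for params, n_k in zip(client_models, client_sizes):
        if len(params) != dim:
            raise ValueError("all client models must have the same dimension")
        weight = n_k / total  # n_k / n in the usual FedAvg notation
        for i, p in enumerate(params):
            global_model[i] += weight * p
    return global_model


# Usage: two clients holding 1 and 3 samples respectively.
print(fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # prints [2.5, 3.5]
```

Weighting by sample count makes the aggregate equal to the gradient-descent direction on the pooled data when each client takes a single local step; with many local steps on drifting, non-identical client data this equivalence no longer holds, which is part of what the non-stationary setting has to contend with.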

arXiv:2203.13950 [pdf, other]

doi 10.1109/TNET.2023.3262482

Multi-Edge Server-Assisted Dynamic Federated Learning with an Optimized Floating Aggregation Point

Authors: Bhargav Ganguly, Seyyedali Hosseinalipour, Kwang Taik Kim, Christopher G. Brinton, Vaneet Aggarwal, David J. Love, Mung Chiang

Abstract: We propose cooperative edge-assisted dynamic federated learning (CE-FL). CE-FL introduces a distributed machine learning (ML) architecture in which data collection is carried out at the end devices, while model training is conducted cooperatively at the end devices and the edge servers, enabled via data offloading from the end devices to the edge servers through base stations. CE-FL also introduces a floating aggregation point, where the local models generated at the devices and the servers are aggregated at an edge server that varies from one model training round to another to cope with network evolution in terms of data distribution and users' mobility. CE-FL considers the heterogeneity of network elements in terms of communication/computation models and their proximity to one another. CE-FL further presumes a dynamic environment with online variation of data at the network devices, which causes a drift in ML model performance. We model the processes involved in CE-FL and conduct an analytical convergence analysis of its ML model training. We then formulate network-aware CE-FL, which aims to adaptively optimize all the network elements by tuning their contribution to the learning process; this turns out to be a non-convex mixed-integer problem. Motivated by the large scale of the system, we propose a distributed optimization solver that breaks down the computation of the solution across the network elements. We finally demonstrate the effectiveness of our framework with data collected from a real-world testbed.

Submitted 22 October, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

Journal ref: Published in IEEE/ACM Transactions on Networking, 2023

arXiv:2102.10740 [pdf, other]

Communication Efficient Parallel Reinforcement Learning

Authors: Mridul Agarwal, Bhargav Ganguly, Vaneet Aggarwal

Abstract: We consider the problem where $M$ agents interact with $M$ identical and independent environments with $S$ states and $A$ actions using reinforcement learning for $T$ rounds. The agents share their data with a central server to minimize their regret. We aim to find an algorithm that allows the agents to minimize the regret with infrequent communication rounds. We provide \NAM\, which runs at each agent, and prove that the total cumulative regret of the $M$ agents is upper bounded by $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, number of states $S$, and number of actions $A$. The agents synchronize once their visits to any state-action pair exceed a certain threshold. Using this, we obtain a bound of $O(MSA\log(MT))$ on the total number of communication rounds. Finally, we evaluate the algorithm on multiple environments and demonstrate that the proposed algorithm performs on par with an always-communicating version of the UCRL2 algorithm while using significantly less communication.

Submitted 21 February, 2021; originally announced February 2021.
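The abstract does not state the synchronization threshold, so the following is a hypothetical sketch only; the rule is an assumption, not necessarily the paper's. Suppose an agent triggers a synchronization when its local visits to a state-action pair since the last sync reach $\max(1, N/M)$, where $N$ is the globally synchronized count for that pair. Each trigger then grows $N$ by at least 1 while $N < M$, and by a factor of at least $1 + 1/M$ once $N \ge M$; since $N \le MT$, each pair triggers at most $M + (M+1)\ln T$ syncs, which has the same $O(MSA\log(MT))$ shape as the stated bound. The function name `simulate_threshold_sync` is hypothetical, and uniformly random visits stand in for the agents' policies:

```python
import math
import random
from typing import List, Tuple


def simulate_threshold_sync(num_agents: int, num_pairs: int, rounds: int,
                            seed: int = 0) -> Tuple[int, List[int]]:
    """Count synchronizations under a hypothetical visit-count threshold rule."""
    rng = random.Random(seed)
    synced = [0] * num_pairs  # N(p): counts shared by everyone after the last sync
    local = [[0] * num_pairs for _ in range(num_agents)]  # visits since last sync
    syncs = 0
    for _ in range(rounds):
        trigger = False
        for m in range(num_agents):
            p = rng.randrange(num_pairs)  # stand-in for the agent's chosen (s, a)
            local[m][p] += 1
            # Hypothetical rule: local growth reached max(1, N(p) / M).
            if local[m][p] >= max(1.0, synced[p] / num_agents):
                trigger = True
        if trigger:
            syncs += 1
            for p in range(num_pairs):
                synced[p] += sum(local[m][p] for m in range(num_agents))
            local = [[0] * num_pairs for _ in range(num_agents)]
    return syncs, synced


M, SA, T = 4, 6, 5000  # agents, state-action pairs (S * A), rounds
syncs, _ = simulate_threshold_sync(M, SA, T)
bound = SA * (M + (M + 1) * math.log(T))
print(f"{syncs} syncs in {T} rounds (always-communicate: {T}); bound {bound:.0f}")
```

The counting argument does not depend on how the pairs are visited, so the bound holds for any visit sequence, not just the random one simulated here.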

arXiv:1812.11316 [pdf]

Microcontroller Based Robotic Arm Development for Library Management System

Authors: Bodhisatwa Barma, Samrat Ghosh, Abhrodip Chaudhury, Biswarup Ganguly

Abstract: With the advancement of robotics, automation in various industries and processes has become widespread. This project introduces a library automation system that automatically retrieves queued books, arranges returned books on the racks, and automatically updates the library database. The proposed system is based on the Arduino microcontroller and Python programming. Microcontroller-based robotic arms are used to fetch books from, or return books to, the different shelves in the library. The library database is updated after each action is completed. The uniqueness of the proposed system lies in the fact that it can be applied to any existing library and can handle individual books rather than books in bulk. The system aims to bring new dimensions to the concept of library automation.

Submitted 29 December, 2018; originally announced December 2018.

Comments: Accepted at the 2nd International Conference on Computational Advancement in Communication circuit and System (ICCACCS-2018). Best Paper Award in the "Poster Presentation" category

arXiv:1202.0862 [pdf, other]

e-Valuate: A Two-player Game on Arithmetic Expressions -- An Update

Authors: Sarang Aravamuthan, Biswajit Ganguly

Abstract: e-Valuate is a game on arithmetic expressions. The players have contrasting roles of maximizing and minimizing the given expression. The maximizer proposes values and the minimizer substitutes them for variables of his choice. When the expression is fully instantiated, its value is compared with a certain minimax value that would result if the players played their optimal strategies. The winner is declared based on this comparison. We use a game tree to represent the state of the game and show how the minimax value can be computed efficiently using backward induction and alpha-beta pruning. The efficacy of alpha-beta pruning depends on the order in which the nodes are evaluated. Further improvements can be obtained by using transposition tables to prevent reevaluation of the same nodes. We propose a heuristic for node ordering. We show how the use of the heuristic and transposition tables leads to improved performance by comparing the number of nodes pruned by each method. We describe some domain-specific variants of this game. The first is a graph-theoretic formulation wherein two players share a set of elements of a graph by coloring a related set, with each player looking to maximize his share. The set being shared could be the set of vertices, edges, or faces (for a planar graph). An application of this is the sharing of regions enclosed by a planar graph, where each player's aim is to maximize the area of his share.
Another variant is a tiling game where the players alternately place dominoes on an $8 \times 8$ checkerboard to construct a maximal partial tiling. We show that the number of dominoes $x$ in such a tiling satisfies $22 \le x \le 32$ by proving that any maximal partial tiling requires at least $22$ dominoes.

Submitted 12 September, 2014; v1 submitted 3 February, 2012; originally announced February 2012.

Comments: 18 pages, 3 figures

MSC Class: 91A05; 91A43 (Primary); 91A46 (Secondary)
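The abstract does not spell out the exact move rules, so the following is a minimal sketch under assumed, simplified rules (the class `EValuateSketch`, the value pool, and the turn structure are hypothetical): the maximizer proposes a value from a shared pool, the minimizer substitutes it for one unassigned variable, and the fully instantiated expression is the payoff. It compares plain backward induction with fail-soft alpha-beta pruning plus a transposition table that stores each searched position's value together with whether it is exact, a lower bound, or an upper bound, so positions reached through different move orders share one entry. Children are generated in ascending order; the paper's node-ordering heuristic is not reproduced.

```python
import math
from typing import Callable, Dict, Iterator, Optional, Tuple

# State: (remaining pool, sorted (variable, value) assignments, pending value).
# A pending value of None means it is the maximizer's turn to propose.
State = Tuple[Tuple[int, ...], Tuple[Tuple[str, int], ...], Optional[int]]
EXACT, LOWER, UPPER = "exact", "lower", "upper"


class EValuateSketch:
    """Hypothetical simplified e-Valuate rules (an assumption, not the paper's definition)."""

    def __init__(self, variables, pool, expr: Callable[[Dict[str, int]], float]):
        if len(pool) < len(variables):
            raise ValueError("pool must contain at least one value per variable")
        self.variables = tuple(sorted(variables))
        self.root: State = (tuple(sorted(pool)), (), None)
        self.expr = expr

    def is_terminal(self, s: State) -> bool:
        return len(s[1]) == len(self.variables)

    def is_max(self, s: State) -> bool:
        return s[2] is None

    def evaluate(self, s: State) -> float:
        return self.expr(dict(s[1]))

    def children(self, s: State) -> Iterator[State]:
        pool, assigned, pending = s
        if pending is None:  # maximizer proposes one distinct value from the pool
            for v in sorted(set(pool)):
                rest = list(pool)
                rest.remove(v)
                yield (tuple(rest), assigned, v)
        else:  # minimizer substitutes the pending value for an unassigned variable
            used = {name for name, _ in assigned}
            for name in self.variables:
                if name not in used:
                    yield (pool, tuple(sorted(assigned + ((name, pending),))), None)


def minimax(game: EValuateSketch, s: State, stats: Dict[str, int]) -> float:
    """Plain backward induction over the full game tree."""
    stats["nodes"] += 1
    if game.is_terminal(s):
        return game.evaluate(s)
    values = [minimax(game, c, stats) for c in game.children(s)]
    return max(values) if game.is_max(s) else min(values)


def alphabeta(game: EValuateSketch, s: State, alpha: float, beta: float,
              table: Dict[State, Tuple[str, float]], stats: Dict[str, int]) -> float:
    """Fail-soft alpha-beta with a transposition table of (flag, value) entries."""
    stats["nodes"] += 1
    if s in table:
        flag, value = table[s]
        if flag == EXACT or (flag == LOWER and value >= beta) or (flag == UPPER and value <= alpha):
            return value  # the stored bound already decides this window
    if game.is_terminal(s):
        value = game.evaluate(s)
        table[s] = (EXACT, value)
        return value
    a, b = alpha, beta
    if game.is_max(s):
        value = -math.inf
        for c in game.children(s):
            value = max(value, alphabeta(game, c, a, b, table, stats))
            a = max(a, value)
            if a >= b:
                break  # cutoff: the minimizer will avoid this branch
    else:
        value = math.inf
        for c in game.children(s):
            value = min(value, alphabeta(game, c, a, b, table, stats))
            b = min(b, value)
            if a >= b:
                break  # cutoff: the maximizer will avoid this branch
    # Record how the result relates to the window it was searched with.
    flag = UPPER if value <= alpha else LOWER if value >= beta else EXACT
    table[s] = (flag, value)
    return value


game = EValuateSketch(["x", "y", "z"], [1, 2, 3, 4], lambda a: a["x"] * a["y"] - a["z"])
full, pruned = {"nodes": 0}, {"nodes": 0}
v_full = minimax(game, game.root, full)
v_pruned = alphabeta(game, game.root, -math.inf, math.inf, {}, pruned)
print(v_full, v_pruned, full["nodes"], pruned["nodes"])
```

Both searches return the same minimax value; the node counters give the kind of comparison the abstract describes, and a node-ordering heuristic would plug in by changing the order in which `children` yields moves.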

Showing 1–9 of 9 results for author: Ganguly, B