Multi-Task Multi-Agent Reinforcement Learning With Interaction and Task Representations

Chao Li; Shaokang Dong; Shangdong Yang; Yujing Hu; Tianyu Ding; Wenbin Li; Yang Gao

doi:10.1109/TNNLS.2024.3475216

IEEE Trans Neural Netw Learn Syst. 2024 Oct 30:PP. doi: 10.1109/TNNLS.2024.3475216. Online ahead of print.

PMID: 39475745

Abstract

Multi-task multi-agent reinforcement learning (MT-MARL) can leverage useful knowledge across multiple related tasks to improve performance on any single task. While recent studies have tentatively achieved this by learning independent policies on a shared representation space, we argue that further advances can be realized by explicitly characterizing agent interactions within these multi-agent tasks and by identifying task relations for selective reuse. To this end, this article proposes Representing Interactions and Tasks (RIT), a novel MT-MARL algorithm that characterizes both intra-task agent interactions and inter-task relations. Specifically, to characterize agent interactions, RIT introduces interactive value decomposition, which explicitly incorporates the dependencies among agents into policy learning. Theoretical analysis shows that the learned utility value of each agent approximates its Shapley value and thus represents agent interactions. Moreover, we learn task representations from per-agent local trajectories; these representations are used to assess task similarities and thereby identify task relations. As a result, RIT facilitates the effective transfer of interaction knowledge across similar multi-agent tasks. Structurally, RIT develops a universal policy structure for scalable multi-task policy learning. We evaluate RIT against multiple state-of-the-art baselines in various cooperative tasks, and its strong performance under both multi-task and zero-shot settings demonstrates its effectiveness.
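The abstract states that each agent's learned utility approximates its Shapley value. As a reference point for what that quantity is, the sketch below computes exact Shapley values for a small, hypothetical three-agent cooperative game by enumerating every coalition. It is not the RIT algorithm, which per the abstract learns utilities through interactive value decomposition rather than enumerating coalitions. The game, the agent names, and the functions `shapley_values` and `team_value` are illustrative assumptions.

```python
from itertools import combinations
from math import factorial, isclose
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    agents: Sequence[str], v: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (O(n * 2^n))."""
    n = len(agents)
    phi: Dict[str, float] = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(n):  # size of a coalition S that excludes agent i
            # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of agent i when it joins S
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


# Hypothetical team-reward game: agents "a" and "b" earn 10 only together
# (an interaction), while agent "c" independently adds 4.
def team_value(s: FrozenSet[str]) -> float:
    return (10.0 if {"a", "b"} <= s else 0.0) + (4.0 if "c" in s else 0.0)


agents = ["a", "b", "c"]
phi = shapley_values(agents, team_value)
print(phi)

# Efficiency property: the values sum to v(all agents) - v(empty coalition).
assert isclose(sum(phi.values()), team_value(frozenset(agents)) - team_value(frozenset()))
```

In this toy game, "a" and "b" each receive 5, splitting the reward that only their joint participation produces, while "c" receives its independent contribution of 4. Because exact enumeration grows as O(n·2^n), computing Shapley values directly becomes impractical for larger teams, which is why learned approximations of them are attractive.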