When applying Reinforcement Learning (RL) to real-world visual tasks, two major challenges must be addressed: sample inefficiency and limited generalization. To address these challenges, previous works focus primarily on learning semantic information from the visual state to improve sample efficiency, but they do not explicitly learn other valuable aspects, such as spatial information. Moreover, they improve generalization by learning representations that are invariant to alterations of task-irrelevant variables, without considering task-relevant variables. To enhance both the sample efficiency and the generalization of the base RL algorithm in visual tasks, we propose an auxiliary task called Recovering Permuted Sequential Features (RPSF). Our method enhances generalization by learning the spatial structure of the agent, which mitigates the effects of changes in both task-relevant and task-irrelevant variables. It also explicitly learns both semantic and spatial information from the visual state by disordering and subsequently recovering a sequence of features, generating more holistic representations and thereby improving sample efficiency. Extensive experiments demonstrate that our method significantly improves the sample efficiency and generalization of the base RL algorithm and outperforms various state-of-the-art baselines across diverse tasks in unseen environments. Furthermore, our method is compatible with both CNN and Transformer architectures.
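To make the core idea of the auxiliary task concrete, the following is a minimal sketch of its data side: a sequence of feature vectors is permuted, and the original position of each element becomes the prediction target that a recovery head would be trained on. The function names and the NumPy setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def make_rpsf_targets(features, rng):
    """Permute a sequence of feature vectors (shape [N, D]) and return
    the shuffled sequence plus the labels needed to recover the order.

    labels[i] is the original index of the vector now sitting at slot i,
    so a recovery head would be trained to predict these indices.
    """
    n = features.shape[0]
    perm = rng.permutation(n)       # slot i now holds features[perm[i]]
    shuffled = features[perm]
    labels = perm
    return shuffled, labels

def recover(shuffled, predicted_labels):
    """Reorder a shuffled sequence given the predicted original indices.

    argsort of the predicted labels is the inverse permutation: it maps
    each original position back to the slot currently holding it.
    """
    return shuffled[np.argsort(predicted_labels)]
```

With perfect label predictions, `recover` reconstructs the original feature sequence exactly; in training, the cross-entropy between the head's predictions and `labels` would serve as the auxiliary loss alongside the RL objective.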
Keywords: Generalization; Reinforcement learning; Representation learning; Sample efficiency.
Copyright © 2024 Elsevier Ltd. All rights reserved.