Pathfinding in stochastic environments: learning vs planning

PeerJ Comput Sci. 2022 Aug 18;8:e1056. doi: 10.7717/peerj-cs.1056. eCollection 2022.

Abstract

Among the main challenges associated with navigating a mobile robot in complex environments are partial observability and stochasticity. This work proposes a stochastic formulation of the pathfinding problem in which obstacles of arbitrary shape may appear and disappear at random moments in time. Moreover, we consider the case when the environment is only partially observable to the agent. We study and evaluate two orthogonal approaches to reaching the goal under such conditions: planning and learning. Within planning, the agent constantly re-plans and updates its path with a search-based planner, based on the history of its observations. Within learning, the agent asynchronously learns to optimize a policy function using recurrent neural networks (we propose an original, efficient, and scalable approach). We carry out an extensive empirical evaluation of both approaches, which shows that the learning-based approach scales better with the growing number of unpredictably appearing/disappearing obstacles, while the planning-based one is preferable when the environment is close to deterministic (i.e., external disturbances are rare). Code is available at https://github.com/Tviskaron/pathfinding-in-stochastic-envs.
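To make the planning-based baseline concrete, below is a minimal, hypothetical Python sketch of the re-planning loop the abstract describes: the agent keeps a belief map built from its observation history, re-plans with A* at every step, and obstacles toggle on/off at random. It is not the authors' implementation (that is in the linked repository); the grid size, observation radius, and obstacle toggle probability are illustrative assumptions.

```python
# Hypothetical sketch of planning with constant re-planning under
# stochastic obstacles and partial observability (not the paper's code).
import heapq
import random

FREE, BLOCKED = 0, 1

def astar(grid, start, goal):
    """4-connected A* with Manhattan heuristic; returns a path or None."""
    n, m = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, None)]
    came_from, g_best = {}, {start: 0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:
            continue
        came_from[cur] = parent
        if cur == goal:
            path = [cur]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        x, y = cur
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < m and grid[nx][ny] == FREE:
                ng = g + 1
                if ng < g_best.get((nx, ny), float("inf")):
                    g_best[(nx, ny)] = ng
                    heapq.heappush(open_set, (ng + h((nx, ny)), ng, (nx, ny), cur))
    return None

def step(env, belief, pos, goal, radius=2, p_toggle=0.05):
    """One interaction step: obstacles toggle, agent observes locally, re-plans."""
    n, m = len(env), len(env[0])
    for i in range(n):                       # stochastic obstacle dynamics
        for j in range(m):
            if (i, j) not in (pos, goal) and random.random() < p_toggle:
                env[i][j] ^= 1
    for i in range(max(0, pos[0] - radius), min(n, pos[0] + radius + 1)):
        for j in range(max(0, pos[1] - radius), min(m, pos[1] + radius + 1)):
            belief[i][j] = env[i][j]         # partial observation updates the belief map
    path = astar(belief, pos, goal)          # re-plan on the current belief
    return path[1] if path and len(path) > 1 else pos

if __name__ == "__main__":
    random.seed(0)
    n = 16
    env = [[FREE] * n for _ in range(n)]
    belief = [[FREE] * n for _ in range(n)]  # optimistic belief: unseen cells assumed free
    pos, goal = (0, 0), (n - 1, n - 1)
    for t in range(200):
        pos = step(env, belief, pos, goal)
        if pos == goal:
            print(f"reached goal at step {t}")
            break
```

The learning-based alternative studied in the paper replaces the explicit re-planning call with a recurrent policy network trained asynchronously; see the repository for that implementation.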

Keywords: Asynchronous learning; Path finding; Policy optimization; Reinforcement learning; Stochastic A*.

Grants and funding

This work was supported by the Ministry of Science and Higher Education of the Russian Federation under Project 075-15-2020-799. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.