Route planning is essential to mobile robot navigation. In recent years, deep reinforcement learning (DRL) has been applied to learn optimal planning policies in stochastic environments without prior knowledge. However, existing works focus on policies that maximize the expected return, whose performance can vary greatly when the level of stochasticity in the environment is high. In this work, we propose a distributional reinforcement learning framework that learns return distributions which explicitly reflect environmental stochasticity. Policies based on the second-order stochastic dominance (SSD) relation make route decisions adjustable according to the user's preference for performance robustness. We evaluate the proposed method in a simulated road-network environment; experimental results show that, when robustness is preferred, it plans the shortest routes that minimize stochasticity in travel time, whereas other state-of-the-art DRL methods are agnostic to environmental stochasticity.
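To make the SSD relation concrete, the sketch below checks second-order stochastic dominance between two empirical return distributions. This is a minimal illustration, not the paper's implementation: the function name `ssd_dominates` and the sample data are assumptions. It uses the standard characterization that, for returns, X dominates Y in the SSD sense iff the expected shortfall E[(t - X)_+] is no larger than E[(t - Y)_+] at every threshold t; since both expectations are piecewise linear in t with kinks only at sample points, checking the pooled sample points suffices.

```python
import numpy as np

def ssd_dominates(x, y, atol=1e-12):
    """Return True if returns X second-order stochastically dominate Y.

    Standard criterion: X >=_SSD Y  iff  E[(t - X)_+] <= E[(t - Y)_+]
    for all thresholds t. Both sides are piecewise linear in t with
    kinks only at sample points, so the pooled samples suffice as t.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ts = np.union1d(x, y)  # candidate thresholds: all pooled sample values
    # Expected shortfall below each threshold t, via broadcasting.
    short_x = np.maximum(ts[:, None] - x[None, :], 0.0).mean(axis=1)
    short_y = np.maximum(ts[:, None] - y[None, :], 0.0).mean(axis=1)
    return bool(np.all(short_x <= short_y + atol))

# Hypothetical routes with equal mean return (-10) but different spread:
# a risk-averse planner prefers route A, the less variable one.
route_a = np.array([-10.0, -10.0, -10.0, -10.0])
route_b = np.array([-5.0, -15.0, -5.0, -15.0])
print(ssd_dominates(route_a, route_b))  # True
print(ssd_dominates(route_b, route_a))  # False
```

With equal expected returns, expected-return maximization cannot distinguish the two routes, while the SSD relation selects the one with lower stochasticity, which is the kind of robustness-preferring behavior the abstract describes.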