Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs to robot control commands. However, most existing methods ignore the local minimum problem in navigation and therefore cannot handle complex unknown environments. In this paper, we propose Adaptive Forward Simulation Time (AFST), the first DRL-based navigation method modeled as a Semi-Markov Decision Process (SMDP) with a continuous action space, to overcome this problem. Specifically, we improve the distributed proximal policy optimization (DPPO) algorithm for the specified SMDP problem by modifying its Generalized Advantage Estimation (GAE) to better estimate the policy gradient in SMDPs. We evaluate our approach both in simulation and in the real world.
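The abstract does not spell out the modified estimator, but a common way to adapt GAE to SMDPs is to discount each transition by the variable duration of its macro action. The following is a minimal sketch under that assumption; the function name `smdp_gae` and its arguments are illustrative and not taken from the paper.

```python
import numpy as np

def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """Hypothetical duration-aware GAE for an SMDP trajectory (sketch).

    rewards[t]   : reward accumulated over the t-th macro action
    values[t]    : V(s_t) for t = 0..T (one extra bootstrap value at the end)
    durations[t] : simulation time tau_t consumed by the t-th macro action
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # Discount the bootstrap value by gamma**tau_t instead of a fixed gamma
        delta = rewards[t] + gamma ** durations[t] * values[t + 1] - values[t]
        # Propagate the advantage with the same duration-dependent factor
        gae = delta + (gamma * lam) ** durations[t] * gae
        advantages[t] = gae
    return advantages
```

With fixed unit durations this reduces to standard GAE, which is why it is a natural candidate for variable-duration actions such as an adaptive forward simulation time.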