We present a novel approach to improve the performance of deep reinforcement learning (DRL) based outdoor robot navigation systems. Most existing DRL methods rely on carefully designed dense reward functions to learn efficient behavior in an environment. We circumvent this issue by working only with sparse rewards (which are easy to design), and propose a novel adaptive Heavy-Tailed Reinforce algorithm for Outdoor Navigation called HTRON. Our main idea is to utilize heavy-tailed policy parametrizations, which implicitly induce exploration in sparse reward settings. We evaluate the performance of HTRON against the Reinforce, PPO, and TRPO algorithms in three different outdoor scenarios: goal-reaching, obstacle avoidance, and uneven terrain navigation. Compared to the navigation policies obtained by the other methods, we observe on average a 34.41% increase in success rate, a 15.15% decrease in the average number of time steps taken to reach the goal, and a 24.9% decrease in elevation cost. Further, we demonstrate that our algorithm can be transferred directly to a Clearpath Husky robot to perform outdoor terrain navigation in real-world scenarios.
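To illustrate why heavy-tailed policy parametrizations implicitly induce exploration, the following minimal sketch compares action samples from a light-tailed Gaussian policy and a heavy-tailed Cauchy policy with the same location and scale. This is an illustrative assumption of the general principle, not the authors' HTRON implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Light-tailed policy: a Gaussian concentrates actions near its mean,
# so under sparse rewards it rarely tries distant actions.
gaussian_actions = rng.normal(loc=0.0, scale=1.0, size=n)

# Heavy-tailed policy: a standard Cauchy has the same location and scale,
# but its heavy tails occasionally produce large deviations, which act
# as implicit exploration in sparse-reward settings.
cauchy_actions = rng.standard_cauchy(size=n)

# Fraction of actions more than 4 scale units from the mean:
gauss_far = float(np.mean(np.abs(gaussian_actions) > 4.0))
cauchy_far = float(np.mean(np.abs(cauchy_actions) > 4.0))
print(f"Gaussian far-action rate: {gauss_far:.5f}")
print(f"Cauchy   far-action rate: {cauchy_far:.5f}")
```

The Cauchy policy places a substantial fraction of its probability mass on far-away actions (roughly 15% beyond four scale units, versus effectively zero for the Gaussian), which is the mechanism the sparse-reward setting exploits.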