Deep Reinforcement Learning (DRL) owes much of its success to the availability of realistic simulated environments. However, performance degradation during simulation-to-real-world transfer remains a challenging problem for policies trained in simulation. To close this sim-to-real gap, we present a novel hybrid architecture that utilizes an intermediate output from a fully trained attention DRL policy as a navigation cost map for outdoor navigation. Our attention DRL network takes a robot-centric elevation map, IMU data, the robot's pose, previous actions, and goal information as inputs and computes a navigation cost map that highlights non-traversable regions. We compute least-cost waypoints on the cost map and use the Dynamic Window Approach (DWA), with velocity constraints in high-cost regions, to follow the waypoints in highly uneven outdoor environments. Our formulation generates dynamically feasible velocities along stable, traversable regions to reach the robot's goals. We observe a 5% increase in success rate, a 13.09% decrease in average robot vibration, and a 19.33% reduction in average velocity compared to end-to-end DRL methods and other state-of-the-art methods in complex outdoor environments. We evaluate the benefits of our method using a Clearpath Husky robot in both simulated and real-world uneven environments.
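The abstract does not specify how the least-cost waypoints are extracted from the learned cost map or how the DWA velocity constraints are imposed; the sketch below is one minimal, illustrative way to realize those two steps, assuming a Dijkstra-style grid search over the cost map and a simple cost-based cap on the DWA maximum linear velocity. The function names (`least_cost_waypoints`, `capped_max_velocity`) and the threshold and scaling values are hypothetical, not taken from the paper.

```python
import heapq
import numpy as np

def least_cost_waypoints(cost_map, start, goal):
    """Dijkstra search over a 2D cost map (hypothetical grid indices);
    returns the least-cost cell sequence from start to goal."""
    h, w = cost_map.shape
    dist = np.full((h, w), np.inf)
    dist[start] = 0.0
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and d + cost_map[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost_map[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(pq, (dist[nr, nc], (nr, nc)))
    if goal not in prev and goal != start:
        return []  # goal unreachable on this cost map
    # Walk predecessors back from the goal to recover the waypoint path.
    path, cell = [goal], goal
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return path[::-1]

def capped_max_velocity(cost_map, cell, v_max=1.0, cost_threshold=0.5):
    """Illustrative velocity constraint: shrink the DWA maximum linear
    velocity in high-cost (poorly traversable) regions; the threshold
    and scaling are assumptions, not values from the paper."""
    cost = float(cost_map[cell])
    if cost > cost_threshold:
        return v_max * max(0.1, 1.0 - cost)  # slow down, never fully stop
    return v_max
```

In this reading of the pipeline, a caller would obtain `cost_map` from the attention policy's intermediate output, extract waypoints with `least_cost_waypoints`, and use `capped_max_velocity` as the upper velocity bound fed to a DWA local planner while tracking each waypoint.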