We present a novel sensor-based learning navigation algorithm that computes collision-free trajectories for a robot in dense, dynamic environments with moving obstacles or targets. Our approach uses a deep reinforcement learning-based expert policy trained using a sim2real paradigm. To increase reliability and handle the failure cases of the expert policy, we combine it with a policy extraction technique that transforms the resulting policy into a decision tree. The decision tree has properties that we use to analyze and modify the policy and to improve performance on navigation metrics including smoothness, frequency of oscillation, frequency of immobilization, and obstruction of target. We are able to modify the policy to address these imperfections without retraining, combining the learning power of deep learning with the control of domain-specific algorithms. We highlight the benefits of our algorithm in simulated environments and on a Clearpath Jackal robot navigating among moving pedestrians. (Videos at this url: https://gamma.umd.edu/researchdirections/xrl/navviper)
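To illustrate what modifying an extracted policy without retraining could look like, the following is a minimal sketch and not the authors' implementation. It assumes a decision-tree policy over a hypothetical observation with two features (`min_obstacle_dist`, `goal_dist`) and hypothetical discrete actions (`stop`, `turn_left`, `forward`); the tree, thresholds, and patch rule are invented for illustration. The sketch inspects every leaf together with the conditions on its root-to-leaf path and rewrites a leaf that would immobilize the robot while the goal is still far away.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    action: str  # hypothetical discrete action label


@dataclass
class Split:
    feature: str       # hypothetical observation feature name
    threshold: float
    left: "Node"       # taken when obs[feature] <= threshold
    right: "Node"      # taken when obs[feature] > threshold


Node = Union[Leaf, Split]


def act(node, obs):
    """Walk the tree from the root and return the action at the reached leaf."""
    while isinstance(node, Split):
        node = node.left if obs[node.feature] <= node.threshold else node.right
    return node.action


def leaves_with_paths(node, path=()):
    """Yield (path, leaf) pairs; path lists the (feature, op, threshold) tests."""
    if isinstance(node, Leaf):
        yield path, node
    else:
        yield from leaves_with_paths(
            node.left, path + ((node.feature, "<=", node.threshold),))
        yield from leaves_with_paths(
            node.right, path + ((node.feature, ">", node.threshold),))


def patch_immobilizing_leaves(tree, replacement):
    """Replace 'stop' at leaves that are only reached while the goal is far."""
    patched = 0
    for path, leaf in leaves_with_paths(tree):
        goal_far = any(f == "goal_dist" and op == ">" for f, op, _ in path)
        if leaf.action == "stop" and goal_far:
            leaf.action = replacement
            patched += 1
    return patched


# Toy tree standing in for an extracted policy (assumed, not from the paper).
tree = Split(
    "min_obstacle_dist", 0.5,
    left=Split("goal_dist", 1.0, left=Leaf("stop"), right=Leaf("stop")),
    right=Leaf("forward"),
)

obs = {"min_obstacle_dist": 0.3, "goal_dist": 4.0}
before = act(tree, obs)                       # robot freezes near an obstacle
n_patched = patch_immobilizing_leaves(tree, "turn_left")
after = act(tree, obs)                        # same observation, patched leaf
```

Because each leaf is reached by an explicit conjunction of threshold tests, a failure mode such as immobilization can be located and edited directly, which is not possible with the weights of the original neural network policy.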