In this work, we present a novel Reinforcement Learning (RL) algorithm for the off-road trajectory tracking problem. Off-road environments involve varying terrain types and elevations, and it is difficult to model the interaction dynamics of a specific off-road vehicle with such a diverse and complex environment. Standard RL policies trained in simulation fail to operate in such challenging real-world settings. Instead of relying on naive domain randomization, we propose a supervised learning-based approach to overcoming the sim-to-real gap. Our approach efficiently exploits the limited real-world data available to adapt a baseline RL policy trained on a simple kinematics simulator, avoiding the need to model the diverse and complex interaction of the vehicle with off-road terrain. We evaluate the proposed algorithm on two different off-road vehicles, Warthog and Moose. Compared to a standard ILQR approach, our method reduces cross-track error by 30% on Warthog and 50% on Moose while utilizing only 30 minutes of real-world driving data.