Autonomous racing is becoming popular among academic and industry researchers as a test for general autonomous driving, since it pushes perception, planning, and control algorithms to their limits. While traditional control methods such as model predictive control (MPC) can generate an optimal control sequence at the edge of the vehicle's physical controllability, these methods are sensitive to the accuracy of the modeling parameters. This paper presents TC-Driver, a reinforcement learning (RL) approach for robust control in autonomous racing. In particular, the TC-Driver agent is conditioned on a trajectory generated by an arbitrary traditional high-level planner. TC-Driver addresses tire-parameter modeling inaccuracies by exploiting the heuristic nature of RL while leveraging the reliability of traditional planning methods in a hierarchical control structure. We train the agent under varying tire conditions, allowing it to generalize to different model parameters, with the aim of increasing the racing capabilities of the system in practice. The proposed RL method achieves a 2.7-fold lower crash ratio than a non-learning-based MPC in a model-mismatch setting, underlining its robustness to parameter discrepancies. In addition, the average RL inference time is 0.25 ms compared with an average MPC solving time of 11.5 ms, a speedup of more than 40-fold (11.5 / 0.25 ≈ 46), which allows complex control to be deployed on computationally constrained devices. Lastly, we show that the frequently used end-to-end RL architecture, in which the control policy is learned directly from sensory input, is suited neither to model-mismatch robustness nor to track generalization. Our realistic simulations show that TC-Driver achieves a 6.7-fold and a 3-fold lower crash ratio under the model-mismatch and track-generalization settings, respectively, while also achieving lower lap times than an end-to-end approach, demonstrating the viability of TC-Driver for robust autonomous racing.
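The hierarchical idea of conditioning an RL agent on a planner's trajectory, trained under varying tire conditions, can be illustrated with a minimal sketch. It assumes, purely for illustration, that the high-level planner outputs a list of (x, y) waypoints, that the agent observes the next few waypoints expressed in its own vehicle frame plus its current speed, and that tire parameters are resampled per training episode. The names `VehicleState`, `trajectory_observation`, and `sample_tire_params`, the horizon, and the parameter ranges are hypothetical and are not taken from TC-Driver itself.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VehicleState:
    x: float    # world-frame position [m]
    y: float    # world-frame position [m]
    yaw: float  # heading [rad]
    v: float    # longitudinal speed [m/s]


def trajectory_observation(state: VehicleState,
                           trajectory: List[Tuple[float, float]],
                           horizon: int = 5) -> List[float]:
    """Hypothetical observation: the next `horizon` planner waypoints in the
    vehicle frame, followed by the vehicle speed."""
    # Index of the waypoint closest to the vehicle (squared Euclidean distance).
    nearest = min(
        range(len(trajectory)),
        key=lambda i: (trajectory[i][0] - state.x) ** 2
        + (trajectory[i][1] - state.y) ** 2,
    )
    cos_y, sin_y = math.cos(state.yaw), math.sin(state.yaw)
    obs: List[float] = []
    for k in range(horizon):
        # Wrap the index around, assuming a closed race track.
        wx, wy = trajectory[(nearest + k) % len(trajectory)]
        dx, dy = wx - state.x, wy - state.y
        # Rotate the world-frame offset by -yaw into the vehicle frame.
        obs.append(cos_y * dx + sin_y * dy)
        obs.append(-sin_y * dx + cos_y * dy)
    obs.append(state.v)
    return obs


def sample_tire_params(rng: random.Random) -> Dict[str, float]:
    """Hypothetical per-episode tire randomization; ranges are illustrative only."""
    return {
        "friction_coeff": rng.uniform(0.5, 1.1),
        "cornering_stiffness_scale": rng.uniform(0.8, 1.2),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    track = [(float(i), 0.0) for i in range(10)]  # toy straight "trajectory"
    car = VehicleState(x=0.0, y=0.0, yaw=0.0, v=3.0)
    print(sample_tire_params(rng))
    print(trajectory_observation(car, track, horizon=3))
```

Because the policy only sees the planned path relative to the car, any planner that produces waypoints could in principle be swapped in, while randomizing tire parameters exposes the agent to the kind of model mismatch an MPC with fixed parameters would not anticipate.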