A Safe Hierarchical Planning Framework for Complex Driving Scenarios Based on Reinforcement Learning

Autonomous vehicles need to handle various traffic conditions and make safe and efficient decisions and maneuvers. However, a single optimization/sampling-based motion planner cannot efficiently generate safe trajectories in real time, particularly when there are many interacting vehicles nearby, and end-to-end learning methods cannot guarantee the safety of their outputs. To address this challenge, we propose H-CtRL, a hierarchical behavior planning framework that combines a set of low-level safe controllers with a high-level reinforcement learning algorithm that acts as a coordinator for those controllers. Safety is guaranteed by the low-level optimization/sampling-based controllers, while the high-level reinforcement learning algorithm makes H-CtRL an adaptive and efficient behavior planner. To train and test the proposed algorithm, we built a simulator that can reproduce traffic scenes from real-world datasets. H-CtRL proves effective in various realistic simulation scenarios, with satisfactory performance in terms of both safety and efficiency.
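The division of labor described above can be illustrated with a toy sketch. The abstract does not specify the reinforcement learning algorithm, the controllers, or the state representation, so everything below is an assumption made for illustration: tabular Q-learning stands in for the high-level coordinator, three hand-written one-dimensional car-following rules stand in for the optimization/sampling-based controllers, and the names `safe_accel`, `Coordinator`, and `run_episode` are hypothetical. The point it shows is structural: the learner only chooses which controller runs, and every controller output passes through the same safety bound, so exploration cannot violate the minimum gap.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical toy state: (gap to a stopped lead vehicle in m, ego speed in m/s).
State = Tuple[float, float]

MIN_GAP = 5.0  # assumed safety margin in metres
DT = 0.5       # assumed time step in seconds


def safe_accel(state: State, desired: float) -> float:
    """Cap the desired acceleration so the next gap stays >= MIN_GAP."""
    gap, speed = state
    limit = ((gap - MIN_GAP) / DT - speed) / DT
    return min(desired, limit)


# Low-level controllers: each proposes an acceleration, then applies the bound.
CONTROLLERS: List[Callable[[State], float]] = [
    lambda s: safe_accel(s, 1.0),   # speed up
    lambda s: safe_accel(s, 0.0),   # hold speed
    lambda s: safe_accel(s, -3.0),  # brake
]


def step(state: State, accel: float) -> State:
    """Advance the toy dynamics by one time step (speed never goes negative)."""
    gap, speed = state
    new_speed = max(0.0, speed + accel * DT)
    return (gap - new_speed * DT, new_speed)


class Coordinator:
    """High-level tabular Q-learner that picks a controller index."""

    def __init__(self, n: int, epsilon: float = 0.1, alpha: float = 0.5,
                 gamma: float = 0.9, seed: int = 0) -> None:
        self.n, self.epsilon, self.alpha, self.gamma = n, epsilon, alpha, gamma
        self.rng = random.Random(seed)
        self.q: Dict[Tuple[Tuple[int, int], int], float] = {}

    @staticmethod
    def bucket(state: State) -> Tuple[int, int]:
        # Coarse discretization so a lookup table is enough.
        return (int(state[0] // 10), int(state[1] // 5))

    def select(self, state: State) -> int:
        # Epsilon-greedy choice over controller indices.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        b = self.bucket(state)
        return max(range(self.n), key=lambda i: self.q.get((b, i), 0.0))

    def update(self, s: State, i: int, reward: float, s_next: State) -> None:
        # Standard one-step Q-learning update.
        b, bn = self.bucket(s), self.bucket(s_next)
        best_next = max(self.q.get((bn, j), 0.0) for j in range(self.n))
        old = self.q.get((b, i), 0.0)
        self.q[(b, i)] = old + self.alpha * (reward + self.gamma * best_next - old)


def run_episode(coord: Coordinator, state: State = (60.0, 10.0),
                steps: int = 20) -> float:
    """Run one episode and return the smallest gap observed."""
    min_gap_seen = state[0]
    for _ in range(steps):
        i = coord.select(state)
        nxt = step(state, CONTROLLERS[i](state))
        # Reward is the resulting speed, a crude proxy for efficiency.
        coord.update(state, i, reward=nxt[1], s_next=nxt)
        state = nxt
        min_gap_seen = min(min_gap_seen, state[0])
    return min_gap_seen


if __name__ == "__main__":
    coordinator = Coordinator(len(CONTROLLERS))
    gaps = [run_episode(coordinator) for _ in range(50)]
    print("smallest gap over all episodes:", min(gaps))
```

In this sketch the safety property holds regardless of what the coordinator has learned, because the bound is enforced inside every controller rather than encouraged through the reward. The reward only shapes efficiency, which mirrors the separation the abstract describes.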
