Autonomous driving has a natural bi-level structure. The goal of the upper behavioural layer is to provide appropriate lane-change, acceleration, and braking decisions to optimize a given driving task. However, this layer can only indirectly influence driving efficiency through the lower-level trajectory planner, which converts the behavioural inputs into motion commands. Existing sampling-based approaches do not fully exploit the strong coupling between the behavioural and planning layers. On the other hand, end-to-end Reinforcement Learning (RL) can learn a behavioural layer while incorporating feedback from the lower-level planner. However, purely data-driven approaches often fail on safety metrics in unseen environments. This paper presents a novel alternative: a parameterized bi-level optimization that jointly computes the optimal behavioural decisions and the resulting downstream trajectory. Our approach runs in real time using a custom GPU-accelerated batch optimizer and a warm-start strategy learnt with a Conditional Variational Autoencoder (CVAE). Extensive simulations show that our approach outperforms state-of-the-art model predictive control and RL approaches in terms of collision rate while being competitive in driving efficiency.
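To make the bi-level structure concrete, the following is a minimal sketch, not the authors' implementation, of how behavioural decisions can be scored through a downstream planner in a GPU-friendly batch. The behavioural parameters (lane offset, cruise speed), the closed-form lower-level "planner", the quadratic-style cost terms, and the uniform candidate sampler (which a CVAE-based warm start would replace) are all illustrative assumptions.

```python
# Sketch of batched bi-level evaluation with jax.vmap: each candidate
# behavioural decision is pushed through a toy lower-level trajectory
# generator, and the upper-level driving cost selects the best candidate.
import jax
import jax.numpy as jnp

HORIZON, DT = 30, 0.1

def lower_level_trajectory(params):
    """Toy 'planner': track a behavioural setpoint (lane offset d, speed v)."""
    d, v = params
    t = jnp.arange(HORIZON) * DT
    x = v * t                      # longitudinal progress at commanded speed
    y = d * (1.0 - jnp.exp(-t))    # smooth lateral move toward offset d
    return jnp.stack([x, y], axis=1)

def upper_level_cost(params, obstacle):
    """Driving-task cost: progress reward plus a soft collision penalty."""
    traj = lower_level_trajectory(params)
    progress = -traj[-1, 0]        # negative progress, to be minimized
    dist = jnp.linalg.norm(traj - obstacle, axis=1)
    collision = jnp.sum(jax.nn.relu(2.0 - dist))  # penalize < 2 m clearance
    return progress + 10.0 * collision

key = jax.random.PRNGKey(0)
# Batch of candidate behavioural decisions: lane offset in [-3.5, 3.5] m,
# speed in [5, 15] m/s. A learnt CVAE sampler would replace this uniform one.
cands = jax.random.uniform(key, (256, 2),
                           minval=jnp.array([-3.5, 5.0]),
                           maxval=jnp.array([3.5, 15.0]))
obstacle = jnp.array([20.0, 0.0])
costs = jax.vmap(upper_level_cost, in_axes=(0, None))(cands, obstacle)
best = cands[jnp.argmin(costs)]
print("best (lane offset, speed):", best)
```

In this toy setting the lower level is solved in closed form; in the paper's setting each candidate would instead trigger a full trajectory optimization, which is why a batched GPU solver and a good warm start matter for real-time operation.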