Well-established optimization-based methods can guarantee an optimal trajectory over a short optimization horizon, typically no longer than a few seconds. As a result, the trajectory that is optimal over this short horizon may still lead to a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable, and provably safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning based approach that, coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By generating locally optimal maneuvers online as actions, we strike a balance between an infinite low-level continuous action space and the limited flexibility of a fixed number of predefined standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and achieved better performance than the four benchmark approaches we compared against: a random action-selecting agent, a greedy agent, a high-level discrete-action agent, and an IDM-based SUMO-controlled agent.
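The core idea of coupling an RL agent with a trajectory planner can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the planner interface, the toy state `(lane, speed)`, and the tabular epsilon-greedy agent are all assumptions made for the example.

```python
import random

def plan_candidates(ego_lane, ego_speed):
    """Hypothetical planner stub: return a small set of locally optimal
    candidate maneuvers as (label, target_lane, target_speed) tuples.
    A real planner would generate and validate full trajectories online."""
    candidates = [
        ("keep", ego_lane, ego_speed),
        ("accelerate", ego_lane, ego_speed + 2.0),
        ("decelerate", ego_lane, max(0.0, ego_speed - 2.0)),
    ]
    if ego_lane > 0:                      # assume a 3-lane highway (lanes 0..2)
        candidates.append(("change_right", ego_lane - 1, ego_speed))
    if ego_lane < 2:
        candidates.append(("change_left", ego_lane + 1, ego_speed))
    return candidates

class EpsilonGreedyAgent:
    """Minimal tabular agent whose discrete action is a choice among the
    planner-generated maneuvers, bridging a fixed discrete action set and
    the continuous space of possible trajectories."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.q = {}  # (state, maneuver_label) -> value estimate

    def act(self, state, candidates):
        if random.random() < self.epsilon:
            return random.choice(candidates)          # explore
        return max(candidates,                        # exploit: best-valued maneuver
                   key=lambda c: self.q.get((state, c[0]), 0.0))

    def update(self, state, label, reward, alpha=0.5):
        key = (state, label)
        self.q[key] = self.q.get(key, 0.0) + alpha * (reward - self.q.get(key, 0.0))

# Usage: one decision step on a toy state (lane 1, 25 m/s).
agent = EpsilonGreedyAgent(epsilon=0.0)   # greedy for a deterministic demo
state = (1, 25.0)
cands = plan_candidates(*state)
agent.update(state, "accelerate", reward=1.0)
choice = agent.act(state, cands)          # picks the maneuver with highest value
```

In the actual method, each action would correspond to a full locally optimal trajectory produced online by the planner, so every choice the agent makes remains feasible and safe by construction.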