Reinforcement learning has attracted considerable research interest for developing planning approaches in automated driving. Most prior works consider the end-to-end planning task that yields direct control commands, and rarely deploy their algorithms on real vehicles. In this work, we propose a method to employ a trained deep reinforcement learning policy for dedicated high-level behavior planning. By populating an abstract objective interface, established motion planning algorithms can be leveraged to derive smooth and drivable trajectories. Given the current environment model, we propose to use a built-in simulator to predict the traffic scene over a given horizon into the future. The behavior of automated vehicles in mixed traffic is determined by querying the learned policy. To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner, and as such lacks a state-of-the-art benchmark. Thus, we validate the proposed approach by comparing an idealistic single-shot plan with cyclic replanning through the learned policy. Experiments with a real testing vehicle on proving grounds demonstrate the potential of our approach to shrink the simulation-to-reality gap of deep reinforcement learning based planning approaches. Additional simulative analyses reveal that more complex multi-agent maneuvers can be managed by employing the cyclic replanning approach.