Autonomous driving in multi-agent dynamic traffic scenarios is challenging: the behaviors of road users are uncertain and hard to model explicitly, and the ego vehicle must negotiate with them through complicated maneuvers, such as yielding, merging, and taking turns, to drive both safely and efficiently in various settings. Traditional planning methods are largely rule-based and scale poorly to these complex dynamic scenarios, often producing reactive or even overly conservative behaviors and requiring tedious human effort to keep them workable. Recently, deep learning-based methods have shown promising results, generalizing better while requiring less hand engineering. However, they are either implemented with supervised imitation learning (IL), which suffers from dataset bias and distribution mismatch, or trained with deep reinforcement learning (DRL) but focused on one specific traffic scenario. In this work, we propose DQ-GAT to achieve scalable and proactive autonomous driving, where graph attention-based networks are used to implicitly model interactions, and deep Q-learning is employed to train the network end-to-end in an unsupervised manner. Extensive experiments in a high-fidelity driving simulator show that our method achieves higher success rates than previous learning-based methods and a traditional rule-based method, and better trades off safety and efficiency in both seen and unseen scenarios. Moreover, qualitative results on a trajectory dataset indicate that our learned policy can be transferred to the real world for practical applications at real-time speeds. Demonstration videos are available at https://caipeide.github.io/dq-gat/.
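To make the core idea concrete, below is a minimal sketch of how a graph attention encoder can feed a Q-value head over discrete driving actions, in the spirit of the approach described above. This is an illustrative assumption, not the paper's exact DQ-GAT architecture: the class names (`GraphAttentionLayer`, `GatQNet`), feature dimensions, action count, and fully connected agent graph are all hypothetical choices for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention layer (GAT-style); hypothetical sketch."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) binary adjacency mask
        z = self.W(h)                                    # (N, out_dim)
        N = z.size(0)
        # Pairwise concatenation to compute attention logits between agents
        zi = z.unsqueeze(1).expand(N, N, -1)
        zj = z.unsqueeze(0).expand(N, N, -1)
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float('-inf'))       # attend only to neighbors
        alpha = torch.softmax(e, dim=-1)                 # (N, N) attention weights
        return F.elu(alpha @ z)                          # aggregated node features

class GatQNet(nn.Module):
    """GAT encoder over the agent interaction graph, Q-head on the ego node."""
    def __init__(self, feat_dim=16, hid_dim=64, n_actions=5):
        super().__init__()
        self.gat = GraphAttentionLayer(feat_dim, hid_dim)
        self.q_head = nn.Sequential(
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, n_actions),
        )

    def forward(self, node_feats, adj, ego_idx=0):
        h = self.gat(node_feats, adj)
        return self.q_head(h[ego_idx])                   # Q-values for ego actions

# Usage: 6 agents (ego + 5 others), fully connected interaction graph
net = GatQNet()
feats = torch.randn(6, 16)      # e.g., per-agent pose, velocity, heading features
adj = torch.ones(6, 6)
q_values = net(feats, adj)      # shape: (n_actions,), one Q-value per action
action = q_values.argmax().item()
```

In this sketch, the attention weights let the ego node aggregate information from surrounding agents without an explicit behavior model, and the resulting embedding is trained end-to-end with a standard deep Q-learning loss; the action space and feature encoding would differ in a full system.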