Autonomous driving in multi-agent, dynamic traffic scenarios is challenging: the behaviors of other road agents are uncertain and hard to model explicitly, and the ego vehicle must apply sophisticated negotiation skills with them, such as giving way, merging, and taking turns, to achieve both safe and efficient driving in various settings. Traditional planning methods are largely rule-based and scale poorly in these complex dynamic scenarios, often leading to reactive or even overly conservative behaviors; they also require tedious human effort to remain workable. Recently, deep learning-based methods have shown promising results with better generalization capability and less hand-engineering effort. However, they are either implemented with supervised imitation learning (IL), which suffers from dataset bias and distribution mismatch, or trained with deep reinforcement learning (DRL) but focused on one specific traffic scenario. In this work, we propose DQ-GAT to achieve scalable and proactive autonomous driving, where graph attention-based networks are used to implicitly model interactions, and asynchronous deep Q-learning is employed to train the network end-to-end in an unsupervised manner. Extensive experiments in a high-fidelity driving simulator show that our method better trades off safety and efficiency in both seen and unseen scenarios, achieving higher goal success rates than the baselines (up to 4.7$\times$) with comparable task completion time. Demonstration videos are available at https://caipeide.github.io/dq-gat/.
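The abstract names the core idea, a graph attention layer that aggregates surrounding agents' features into the ego vehicle's embedding, which then feeds a Q-value head, but gives no architectural details. The following is a minimal NumPy sketch of that idea only; all shapes, variable names, and the single-layer/single-head structure are illustrative assumptions, not the actual DQ-GAT implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(h, adj, W, a, alpha=0.2):
    """One graph attention layer in the style of Velickovic et al.

    h:   (N, F)  node features (ego vehicle + surrounding agents)
    adj: (N, N)  binary adjacency (1 = the two agents interact)
    W:   (F, Fp) shared linear projection
    a:   (2*Fp,) attention vector
    """
    z = h @ W                                    # project: (N, Fp)
    N = z.shape[0]
    e = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            # attention logit e_ij = LeakyReLU(a^T [z_i || z_j])
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else alpha * s
    e = np.where(adj > 0, e, -1e9)               # mask non-interacting pairs
    att = softmax(e, axis=1)                     # per-node attention weights
    return att @ z                               # attention-weighted aggregation

rng = np.random.default_rng(0)
N, F, Fp, A = 4, 6, 8, 3                         # 4 agents, 3 discrete actions (hypothetical)
h = rng.standard_normal((N, F))
adj = np.ones((N, N))                            # fully connected traffic graph
W = rng.standard_normal((F, Fp)) * 0.1
a = rng.standard_normal(2 * Fp) * 0.1
hp = gat_layer(h, adj, W, a)                     # (4, 8) attended embeddings

# Q-value head on the ego node's (index 0) embedding
Wq = rng.standard_normal((Fp, A)) * 0.1
q = hp[0] @ Wq                                   # one Q-value per discrete action
print(q.shape)
```

In a DRL setting, the weights above would be trained end-to-end from a temporal-difference loss on these Q-values; the attention coefficients are what lets the network learn, without supervision, which surrounding agents matter for the ego vehicle's current decision.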