Currently, urban autonomous driving remains challenging because of the complexity of the driving environment. Learning-based approaches, such as reinforcement learning (RL) and imitation learning (IL), have shown advantages over rule-based approaches and great potential for intelligent decision-making, but they still struggle in urban driving scenarios. To better tackle this problem, this paper proposes a novel learning-based method that combines deep reinforcement learning with expert demonstrations, focusing on longitudinal motion control in autonomous driving. The proposed method adopts the soft actor-critic framework and modifies the learning process of the policy network so that it jointly maximizes reward and imitates the expert. In addition, an adaptive prioritized experience replay is designed to sample experience from both the agent's self-exploration and the expert demonstrations, improving sample efficiency. The proposed method is validated in a simulated urban roundabout scenario and compared with several prevailing RL and IL baseline approaches. The results show that the proposed method trains faster and navigates more safely and time-efficiently. An ablation study reveals that the prioritized replay and the expert demonstration filter both play important roles in the proposed method.
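Below is a minimal sketch, not the paper's implementation, of the two ideas named in the abstract: a SAC-style actor loss augmented with an imitation term on expert demonstrations, and minibatches drawn from both the agent's replay buffer and an expert demonstration buffer with priority-proportional probabilities. All names (`bc_weight`, `sample_mixed`, the network sizes) and the simple Gaussian policy are illustrative assumptions.

```python
# Hedged sketch of reward-maximization + expert-imitation under a SAC-style actor,
# plus priority-proportional sampling over agent and expert transitions.
# Not the authors' code; names and hyperparameters are assumptions.
import numpy as np
import torch
import torch.nn as nn


def sample_mixed(agent_pool, expert_pool, agent_prios, expert_prios, batch_size):
    """Priority-proportional sampling over the union of agent and expert transitions."""
    prios = np.concatenate([agent_prios, expert_prios])
    probs = prios / prios.sum()
    idx = np.random.choice(len(prios), size=batch_size, p=probs)
    pool = list(agent_pool) + list(expert_pool)   # each entry: (obs, act, is_expert, ...)
    return [pool[i] for i in idx]


class GaussianPolicy(nn.Module):
    """Diagonal Gaussian policy head, as commonly used with SAC."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * act_dim))

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())


def actor_loss(policy, q_net, alpha, bc_weight, obs, expert_act, is_expert):
    """SAC actor objective plus a behavior-cloning term on expert-labelled samples."""
    dist = policy(obs)
    act = dist.rsample()                                  # reparameterized action sample
    logp = dist.log_prob(act).sum(-1)
    q = q_net(torch.cat([obs, act], dim=-1)).squeeze(-1)
    rl_term = (alpha * logp - q).mean()                   # entropy-regularized RL term
    # Imitation term: log-likelihood of the expert's action, masked to expert samples.
    bc_term = -(dist.log_prob(expert_act).sum(-1) * is_expert).mean()
    return rl_term + bc_weight * bc_term
```

In this sketch, `bc_weight` trades off the imitation term against the RL term, and `is_expert` is a 0/1 mask so the behavior-cloning loss only applies to transitions taken from the demonstration buffer; the actual weighting and the adaptive priority update are design choices of the paper not reproduced here.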