In the coming years and decades, autonomous vehicles (AVs) will become increasingly prevalent, offering new opportunities for safer and more convenient travel, and potentially for smarter traffic control methods that exploit automation and connectivity. Car following is a primary function in autonomous driving. Car following based on reinforcement learning (RL) has received attention in recent years, with the goal of learning driving policies whose performance is comparable to that of humans. However, most existing RL methods model car following as a unilateral problem, sensing only the vehicle ahead. Recent literature (Wang and Horn [16]) has shown that bilateral car following, which considers both the vehicle ahead and the vehicle behind, exhibits better system stability. In this paper, we hypothesize that bilateral car following can be learned with RL while simultaneously pursuing other goals, such as efficiency maximization, jerk minimization, and safety, yielding a learned model that outperforms human driving. We propose a Deep Reinforcement Learning (DRL) framework for car-following control that integrates bilateral information into both the state and the reward function, based on the bilateral control model (BCM). Furthermore, we use a decentralized multi-agent reinforcement learning framework to generate the corresponding control action for each agent. Our simulation results demonstrate that the learned policy outperforms the human driving policy in terms of (a) inter-vehicle headways, (b) average speed, (c) jerk, (d) Time to Collision (TTC), and (e) string stability.
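As an illustrative aside, the sketch below shows one way bilateral information could enter the state and reward. The BCM feedback term follows Wang and Horn [16], where the ego vehicle responds to the gaps and relative speeds of both its leading and following neighbors; the variable names, gains, and reward weights are hypothetical assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

# Illustrative sketch only: names and gains are assumptions.
# Under the bilateral control model (BCM), the ego vehicle reacts to
# BOTH neighbors, so the commanded acceleration is proportional to
# (front gap - rear gap) and to (v_front + v_back - 2 * v_ego).

def bilateral_state(x_front, x_ego, x_back, v_front, v_ego, v_back):
    """Bilateral observation: gaps and relative speeds on both sides."""
    gap_front = x_front - x_ego
    gap_back = x_ego - x_back
    return np.array([gap_front, gap_back,
                     v_front - v_ego, v_back - v_ego, v_ego])

def bcm_reference_accel(state, k_d=0.5, k_v=0.5):
    """BCM-style target acceleration (hypothetical gains k_d, k_v)."""
    gap_front, gap_back, dv_front, dv_back, _ = state
    return k_d * (gap_front - gap_back) + k_v * (dv_front + dv_back)

def reward(state, jerk, w_gap=1.0, w_jerk=0.1):
    """Hypothetical reward: penalize asymmetry between front and rear
    gaps (the BCM midpoint objective) and penalize jerk for comfort."""
    gap_front, gap_back, *_ = state
    return -w_gap * (gap_front - gap_back) ** 2 - w_jerk * jerk ** 2
```

In a decentralized multi-agent setting, each vehicle would compute such a state and reward locally from its own neighbors, so no central coordinator is required.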