Connected and automated vehicles (CAVs) have attracted increasing attention in recent years. Their fast actuation time gives them the potential to improve the efficiency and safety of the transportation system as a whole. Because of technical challenges, however, only a fraction of vehicles will be equipped with automation, while the remaining vehicles will be human-driven. Rather than learning reliable behavior for a single ego automated vehicle, we focus on improving the outcomes of the whole transportation system by allowing the automated vehicles to learn to cooperate with one another and to regulate human-driven traffic flow. One state-of-the-art approach is to use reinforcement learning (RL) to learn an intelligent decision-making policy. However, a direct reinforcement learning framework does not by itself improve the performance of the whole system. In this article, we demonstrate that formulating the problem in a multi-agent setting with a shared policy achieves better system performance than a non-shared policy in a single-agent setting. Furthermore, we find that applying an attention mechanism to interaction features captures the interplay between agents and thereby boosts cooperation. Whereas previous automated driving studies have mainly focused on enhancing an individual vehicle's driving performance, to the best of our knowledge this work serves as a starting point for research on system-level multi-agent cooperation using graph information sharing. We conduct extensive experiments in car-following and unsignalized-intersection scenarios. The results show that CAVs controlled by our method achieve the best performance against several state-of-the-art baselines.
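The abstract does not specify the network architecture, so the following is only a rough sketch of what "a shared policy with attention over interaction features" can look like. It assumes a hypothetical design: every CAV uses the same parameters (parameter sharing), embeds its own observation as a query, and attends over the observations of its neighbors in an interaction graph using scaled dot-product attention before producing a bounded acceleration. The feature layout (`[normalized speed, normalized gap]`), the weight values, the names `SharedAttentionPolicy` and `joint_actions`, and the tanh-bounded output are all illustrative assumptions, not the authors' model. Training (for example with RL) is omitted.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix W by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedAttentionPolicy:
    """Hypothetical policy whose single parameter set is reused by every CAV."""

    def __init__(self, Wq, Wk, Wv, w_out, max_accel=3.0):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv  # query/key/value projections (d x f)
        self.w_out = w_out  # output weights over [query, context], length 2 * d
        self.max_accel = max_accel  # assumed acceleration bound

    def attend(self, q: Vector, neighbours: List[Vector]):
        """Scaled dot-product attention of query q over neighbour features."""
        d = len(q)
        keys = [matvec(self.Wk, n) for n in neighbours]
        vals = [matvec(self.Wv, n) for n in neighbours]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, vals)) for j in range(d)]
        return weights, context

    def act(self, own: Vector, neighbours: List[Vector]) -> float:
        """Map one agent's observation and its neighbours to an acceleration."""
        q = matvec(self.Wq, own)
        context = self.attend(q, neighbours)[1] if neighbours else [0.0] * len(q)
        h = q + context  # concatenate own embedding with the attended context
        raw = sum(w * x for w, x in zip(self.w_out, h))
        return self.max_accel * math.tanh(raw)  # bounded in [-max_accel, max_accel]


def joint_actions(policy, features: Dict[str, Vector], graph: Dict[str, List[str]]):
    """Apply the same shared policy to every agent, using its graph neighbours."""
    return {
        agent: policy.act(features[agent], [features[n] for n in graph[agent]])
        for agent in features
    }


# Illustrative usage: three CAVs in a chain-shaped interaction graph.
policy = SharedAttentionPolicy(
    Wq=[[1.0, 0.0], [0.0, 1.0]],
    Wk=[[1.0, 0.0], [0.0, 1.0]],
    Wv=[[0.5, 0.0], [0.0, 0.5]],
    w_out=[-0.8, 0.6, -0.4, 0.3],
)
features = {"cav_1": [0.6, 0.2], "cav_2": [0.4, 0.5], "cav_3": [0.6, 0.2]}
graph = {"cav_1": ["cav_2"], "cav_2": ["cav_1", "cav_3"], "cav_3": ["cav_2"]}
actions = joint_actions(policy, features, graph)
print(actions)
```

Because every agent uses one parameter set, two CAVs with identical observations and identical neighborhoods (here `cav_1` and `cav_3`) receive identical actions, and the same policy applies to any number of agents without adding parameters. The attention weights let each agent weigh its neighbors differently, which is one way to model the interplay between agents.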