The recent proliferation of research on multi-agent deep reinforcement learning (MDRL) offers an encouraging way to coordinate multiple connected and automated vehicles (CAVs) passing through an intersection. In this paper, we apply a value-decomposition-based MDRL approach (QMIX) to control CAVs in mixed-autonomy traffic of different densities so that they pass a non-signalized intersection efficiently and safely with reasonable fuel consumption. Implementation tricks, including network-level improvements, Q-value updates via TD($\lambda$), and a reward-clipping operation, are added to the pure QMIX framework and are expected to improve the convergence speed and asymptotic performance of the original version. The efficacy of our approach is demonstrated on several evaluation metrics: average speed, number of collisions, and average fuel consumption per episode. The experimental results show that the convergence speed and asymptotic performance of our approach exceed those of the original QMIX and of proximal policy optimization (PPO), a state-of-the-art reinforcement learning baseline applied to the non-signalized intersection. Moreover, under the lower traffic flow, CAVs controlled by our method improve their average speed without collisions and consume the least fuel. Training is additionally conducted under doubled traffic density, where the learning reward still converges. The model with maximal reward and minimal crashes still guarantees low fuel consumption, but it slightly reduces vehicle efficiency and induces more collisions than its lower-traffic counterpart, highlighting the difficulty of generalizing an RL policy to more demanding scenarios.
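Two of the implementation tricks named above, TD($\lambda$) target computation and reward clipping, can be illustrated in isolation. The sketch below is a minimal, generic version of both (not the paper's actual training code): `td_lambda_targets` computes $\lambda$-returns for an episode via the standard backward recursion $G_t = r_t + \gamma\,[(1-\lambda)\,\max_a Q(s_{t+1},a) + \lambda\, G_{t+1}]$, and `clip_rewards` bounds per-step rewards. The function names, arguments, and clipping range are illustrative assumptions.

```python
import numpy as np

def clip_rewards(rewards, low=-1.0, high=1.0):
    """Reward clipping: bound per-step rewards to stabilize value targets.
    The [-1, 1] range is a common default, not the paper's stated choice."""
    return np.clip(rewards, low, high)

def td_lambda_targets(rewards, bootstrap_q, gamma=0.99, lam=0.8):
    """Compute TD(lambda) targets for one episode, working backwards.

    rewards:      per-step rewards r_0 .. r_{T-1}
    bootstrap_q:  bootstrap values max_a Q(s_{t+1}, a) for t = 0 .. T-1
                  (the last entry is 0 when s_T is terminal)

    Uses the recursion G_t = r_t + gamma * ((1-lam) * Q_{t+1} + lam * G_{t+1}).
    lam = 0 recovers one-step TD targets; lam = 1 recovers Monte Carlo returns.
    """
    T = len(rewards)
    targets = np.zeros(T)
    g_next = bootstrap_q[-1]  # value used to bootstrap beyond the last step
    for t in reversed(range(T)):
        g_next = rewards[t] + gamma * ((1.0 - lam) * bootstrap_q[t] + lam * g_next)
        targets[t] = g_next
    return targets
```

Interpolating between one-step TD and Monte Carlo returns in this way is what the abstract credits with faster convergence relative to the one-step targets of pure QMIX.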