Adaptive traffic signal control with multi-agent reinforcement learning (MARL) is an active research topic. In most existing methods, one agent controls a single intersection, and the focus is on cooperation between intersections. However, the non-stationarity of MARL still limits the performance of such methods as the size of the traffic network grows. One compromise is to assign a region of intersections to a single agent so as to reduce the number of agents. This strategy raises two challenges: how to partition a traffic network into small regions, and how to search for the optimal joint action of a region of intersections. In this paper, we propose a novel training framework, RegionLight, in which the region partition rule is based on the adjacency between intersections, and we extend the Branching Dueling Q-Network (BDQ) to a Dynamic Branching Dueling Q-Network (DBDQ) to bound the growth of the joint action space and to alleviate the bias introduced by imaginary intersections outside the boundary of the traffic network. Experiments on both real and synthetic datasets demonstrate that our framework outperforms other state-of-the-art frameworks and that our region partition rule is robust.
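To make the role of the branching architecture concrete, the following is a minimal sketch, not the authors' implementation, of the branching dueling Q-head that BDQ uses and that DBDQ builds on. The class and parameter names (BranchingDuelingQHead, state_dim, n_branches, n_actions) are illustrative assumptions. The point it illustrates is that the output size grows as n_branches * n_actions (linear in the number of intersections per region) rather than n_actions ** n_branches, which is what bounds the growth of the joint action space.

```python
# Minimal sketch of a branching dueling Q-head (assumed structure, not the paper's code).
# Each agent controls a region of `n_branches` intersections with `n_actions` phases each.
import torch
import torch.nn as nn


class BranchingDuelingQHead(nn.Module):
    def __init__(self, state_dim: int, n_branches: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared state representation for the whole region.
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Single state-value stream V(s).
        self.value = nn.Linear(hidden, 1)
        # One advantage stream A_d(s, a_d) per intersection (branch).
        self.advantages = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_branches)]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.shared(state)
        v = self.value(h)  # (batch, 1)
        q_branches = []
        for adv_head in self.advantages:
            adv = adv_head(h)  # (batch, n_actions)
            # Q_d(s, a) = V(s) + A_d(s, a) - mean_a A_d(s, a), per branch.
            q_branches.append(v + adv - adv.mean(dim=-1, keepdim=True))
        return torch.stack(q_branches, dim=1)  # (batch, n_branches, n_actions)


# Greedy joint action: one argmax per branch, so action selection is linear
# in the number of intersections instead of exponential.
net = BranchingDuelingQHead(state_dim=32, n_branches=4, n_actions=8)
q = net(torch.randn(2, 32))
joint_action = q.argmax(dim=-1)  # (batch, n_branches)
```

The dynamic variant described in the paper (DBDQ) additionally handles imaginary intersections outside the network boundary; that mechanism is not reproduced here.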