Multi-agent Reinforcement Learning (MARL) based traffic signal control has become a popular research topic in recent years. Most existing MARL approaches learn control strategies in a decentralised manner, relying on communication among neighbouring intersections. However, the non-stationarity inherent in MARL can lead to extremely slow convergence or even failure to converge, especially when the number of intersections grows large. One existing remedy is to partition the whole network into several regions, each controlled by a centralised RL framework to speed up convergence. This strategy faces two challenges: how to obtain a flexible partition, and how to search for the optimal joint action over a region of intersections. In this paper, we propose a novel training framework in which regions are partitioned according to the adjacency between intersections, and we propose the Dynamic Branching Dueling Q-Network (DBDQ) to search for the optimal joint action efficiently and to maximize the regional reward. Experimental results on both real and synthetic datasets demonstrate the superiority of our framework over existing frameworks.
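To make the joint-action-search idea concrete, below is a minimal sketch of a Branching Dueling Q-Network head, the architecture family that DBDQ builds on. This is an illustration under stated assumptions, not the paper's implementation: the class name `BranchingDuelingQNet`, all layer sizes, and the per-intersection phase count are hypothetical.

```python
# A minimal sketch of a Branching Dueling Q-Network head, the architecture
# family that DBDQ builds on. Layer sizes, names, and the per-intersection
# phase count are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class BranchingDuelingQNet(nn.Module):
    def __init__(self, state_dim: int, n_intersections: int,
                 n_phases: int, hidden: int = 128):
        super().__init__()
        # Shared trunk encoding the regional state.
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # A single state-value stream shared by all branches.
        self.value = nn.Linear(hidden, 1)
        # One advantage stream per intersection (one action branch each).
        self.adv_heads = nn.ModuleList(
            [nn.Linear(hidden, n_phases) for _ in range(n_intersections)]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)                      # (batch, 1)
        qs = []
        for head in self.adv_heads:
            a = head(h)                        # (batch, n_phases)
            # Dueling aggregation per branch: Q_d = V + A_d - mean(A_d).
            qs.append(v + a - a.mean(dim=-1, keepdim=True))
        return torch.stack(qs, dim=1)          # (batch, n_intersections, n_phases)


# The greedy joint action is an independent argmax per branch, so selecting
# it costs O(n_intersections * n_phases) rather than the
# O(n_phases ** n_intersections) of a flat joint-action Q-network.
net = BranchingDuelingQNet(state_dim=32, n_intersections=4, n_phases=8)
q_values = net(torch.randn(1, 32))
joint_action = q_values.argmax(dim=-1)         # one phase index per intersection
```

The sketch shows only the static branching structure; the "dynamic" element of DBDQ, i.e. how the branches adapt to the region partition, is what the paper itself contributes.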