Adaptive traffic signal control (ATSC) in urban traffic networks is a challenging task due to the complex dynamics that arise in traffic systems. In recent years, several approaches based on multi-agent deep reinforcement learning (MARL) have been studied experimentally. These approaches propose distributed techniques in which each signalized intersection is treated as an agent in a stochastic game whose goal is to optimize the flow of vehicles in its vicinity. In this setting, the system evolves towards an equilibrium among the agents that proves beneficial for the traffic network as a whole. A recently developed multi-agent variant of the well-established advantage actor-critic (A2C) algorithm, called MA2C (multi-agent A2C), exploits the promising idea of some degree of communication among the agents: each agent shares its strategy with its neighboring agents, thereby stabilizing the learning process even as the agents grow in number and variety. We tested MA2C on two traffic networks located in Bologna (Italy) and found that its use translates into a significant decrease in the amount of pollutants released into the environment.
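To make the communication idea concrete, the following is a minimal Python sketch of agents that augment their local observations with the latest policy "fingerprints" (action distributions) broadcast by their neighbors. All names, dimensions, and the toy corridor topology here are illustrative assumptions, not the paper's actual implementation; a real MA2C agent would run a trained actor network instead of the placeholder policy.

```python
import numpy as np

N_ACTIONS = 4   # e.g., four signal phases per intersection (assumed)
OBS_DIM = 8     # local traffic features such as queue lengths (assumed)

class Agent:
    """Hypothetical intersection agent sharing its strategy with neighbors."""

    def __init__(self, agent_id, neighbors):
        self.id = agent_id
        self.neighbors = neighbors  # ids of adjacent intersections
        # last policy broadcast to neighbors, initialized to uniform
        self.fingerprint = np.full(N_ACTIONS, 1.0 / N_ACTIONS)

    def augmented_observation(self, local_obs, agents):
        # Concatenate local features with neighbors' latest fingerprints,
        # so the actor/critic condition on the neighbors' strategies.
        neighbor_fps = [agents[j].fingerprint for j in self.neighbors]
        return np.concatenate([local_obs] + neighbor_fps)

    def act(self, aug_obs, rng):
        # Placeholder policy: a real agent would evaluate its actor
        # network on aug_obs here.
        logits = rng.normal(size=N_ACTIONS)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        self.fingerprint = probs  # share the updated strategy
        return rng.choice(N_ACTIONS, p=probs)

# Toy three-intersection corridor: 0 - 1 - 2
agents = {0: Agent(0, [1]), 1: Agent(1, [0, 2]), 2: Agent(2, [1])}
rng = np.random.default_rng(0)
for i, agent in agents.items():
    local_obs = rng.random(OBS_DIM)
    phase = agent.act(agent.augmented_observation(local_obs, agents), rng)
    print(f"intersection {i}: phase {phase}")
```

The design point this sketch illustrates is that each agent's effective state grows only with its neighborhood size, not with the total number of intersections, which is what keeps the learning process stable as the network scales.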