In this work we analyze Multi-Agent Advantage Actor-Critic (MA2C), a recently proposed multi-agent reinforcement learning algorithm that can be applied to adaptive traffic signal control (ATSC) problems. To evaluate its potential, we compare MA2C with Independent Advantage Actor-Critic (IA2C) and with other reinforcement learning or heuristic-based algorithms. Specifically, we analyze MA2C theoretically within the framework of non-Markov decision processes, which gives deeper insight into the algorithm, and we critically examine the effectiveness and robustness of the method by testing it in two traffic areas located in Bologna (Italy), simulated in SUMO, a software modeling tool for ATSC problems. Our results indicate that MA2C, trained with pseudo-random vehicle flows, is a promising technique able to outperform the alternative methods.