In this work we theoretically and experimentally analyze Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), two recently proposed multi-agent reinforcement learning methods that can be applied to control traffic signals in urban areas. The two methods differ in whether the reward is computed locally or globally and in how communication among agents is managed. We analyze the methods theoretically within the framework of non-Markov decision processes, which yields useful insights into the behavior of the algorithms. Moreover, we experimentally assess the efficacy and robustness of the methods by testing them on two traffic networks in the area of Bologna (Italy), simulated with the SUMO traffic simulator. The experimental results indicate that MA2C achieves the best performance in the majority of cases, outperforming the alternative method considered and displaying sufficient stability during learning.
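Since the distinction between locally and globally computed rewards is central to the comparison, the following minimal Python sketch illustrates the two schemes in a signal-control setting. The function names and the queue-length-based reward are illustrative assumptions for exposition, not the paper's implementation.

```python
from typing import Dict

def local_rewards(queue_lengths: Dict[str, float]) -> Dict[str, float]:
    """Locally computed reward: each agent is penalized only for the
    queue length at its own intersection (hypothetical reward signal)."""
    return {agent: -q for agent, q in queue_lengths.items()}

def global_rewards(queue_lengths: Dict[str, float]) -> Dict[str, float]:
    """Globally computed reward: every agent receives the same
    network-wide penalty, summed over all intersections."""
    total = -sum(queue_lengths.values())
    return {agent: total for agent in queue_lengths}

if __name__ == "__main__":
    # Two intersections with different congestion levels.
    queues = {"intersection_A": 4.0, "intersection_B": 10.0}
    print(local_rewards(queues))   # {'intersection_A': -4.0, 'intersection_B': -10.0}
    print(global_rewards(queues))  # {'intersection_A': -14.0, 'intersection_B': -14.0}
```

Under the local scheme each agent's advantage estimate reflects only its own intersection, whereas under the global scheme all agents optimize a shared network-wide objective; this difference drives the credit-assignment behavior compared in the paper.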