Recent work in multi-agent reinforcement learning has investigated inter-agent communication that is learned simultaneously with the action policy in order to improve the team reward. In this paper, we compare independent Q-learning (IQL), which uses no communication, with differentiable inter-agent learning (DIAL), which learns a communication protocol, on an adaptive traffic control system (ATCS). In a real-world ATCS it is impossible to present the full state of the environment to every agent, so in our simulation each agent receives only a limited observation of the full state. The ATCS is simulated with the Simulation of Urban MObility (SUMO) traffic simulator, in which two connected intersections are modelled. Each intersection is controlled by an agent that can change the direction of the traffic flow. Our results show that a DIAL agent outperforms an independent Q-learner in both training time and maximum achieved reward, as it is able to share relevant information with the other agents.
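The IQL baseline described above can be sketched as follows: each intersection agent runs its own one-step Q-learning update over its partial observation, treating the other agent simply as part of the environment. The observation binning, action set, and hyperparameters below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of one independent Q-learner (IQL) traffic-light agent,
# assuming a small discretised observation space (e.g. binned queue lengths
# at its own intersection) and two actions: keep or switch the signal phase.
N_OBS, N_ACTIONS = 16, 2
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # illustrative hyperparameters

rng = np.random.default_rng(0)
Q = np.zeros((N_OBS, N_ACTIONS))

def act(obs: int) -> int:
    # Epsilon-greedy action over the agent's own (limited) observation only;
    # no communication with the other intersection's agent.
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[obs]))

def update(obs: int, action: int, reward: float, next_obs: int) -> None:
    # Standard one-step Q-learning update, applied independently per agent.
    td_target = reward + GAMMA * np.max(Q[next_obs])
    Q[obs, action] += ALPHA * (td_target - Q[obs, action])
```

A DIAL agent would extend this by emitting a real-valued message alongside its action, with gradients flowing through the message channel during training so that a useful protocol is learned end-to-end.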