Coordination is one of the biggest challenges in multi-agent reinforcement learning, and traffic signal control is a typical application scenario. The problem has recently attracted a growing number of researchers and has become an active research field of great practical significance. In this paper, we propose a novel method called MetaVRS (Meta Variational RewardShaping) for coordinated traffic signal control. By heuristically applying an intrinsic reward to the environmental reward, MetaVRS captures the agent-to-agent interplay. In addition, latent variables generated by a variational autoencoder (VAE) are fed into the policy to automatically trade off exploration and exploitation during policy optimization. Meta-learning is also used in the decoder for faster adaptation and better approximation. Empirically, we demonstrate that MetaVRS substantially outperforms existing methods and shows superior adaptability, which we expect to be significant for multi-agent traffic signal coordination control.