In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced to let agents communicate directly with one another. In this paper, we propose an alternative approach in which agents communicate through an intelligent facilitator that learns to sift through and interpret the signals provided by all agents in order to improve their collective performance. To ensure that this facilitator does not become a centralized controller, agents are incentivized to reduce their dependence on the messages it conveys, and the messages can only influence the selection of a policy from a fixed set, not the instantaneous actions taken under that policy. We demonstrate the strength of this architecture over existing baselines on several cooperative MARL environments.
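As a rough illustration of the constraint described above, the following toy sketch shows a facilitator whose messages can change only which policy an agent uses, never the action that policy takes. It is not the paper's implementation: the names `POLICIES`, `facilitator`, `select_policy`, `step`, and `dependence_cost` are hypothetical. The averaging rule stands in for a learned facilitator, and the thresholds are arbitrary assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Policy = Callable[[float], int]  # maps a local observation to an action

# Hypothetical fixed policy set shared by all agents.
POLICIES: List[Policy] = [
    lambda obs: 0,               # always take action 0
    lambda obs: 1,               # always take action 1
    lambda obs: int(obs > 0.0),  # threshold on the local observation
]


def facilitator(signals: Sequence[float]) -> List[float]:
    """Stand-in for the learned facilitator (assumption): each agent's
    message is the mean of the other agents' signals. Needs >= 2 agents."""
    n = len(signals)
    total = sum(signals)
    return [(total - s) / (n - 1) for s in signals]


def select_policy(message: float, default: int, use_message: bool) -> int:
    """The message may only change WHICH policy index is selected."""
    if not use_message:
        return default
    if message < -0.5:
        return 0
    if message > 0.5:
        return 1
    return 2


def step(
    observations: Sequence[float],
    use_message: Sequence[bool],
    dependence_cost: float = 0.1,
) -> Tuple[List[int], List[float]]:
    """One joint step. Returns actions and per-agent dependence penalties."""
    signals = list(observations)  # assumption: agents signal raw observations
    messages = facilitator(signals)
    actions: List[int] = []
    penalties: List[float] = []
    for obs, msg, use in zip(observations, messages, use_message):
        k = select_policy(msg, default=2, use_message=use)
        # Given the policy index k, the action depends on the observation only.
        actions.append(POLICIES[k](obs))
        # Hypothetical penalty that discourages reliance on the facilitator.
        penalties.append(dependence_cost if use else 0.0)
    return actions, penalties


if __name__ == "__main__":
    print(step([1.0, -1.0, 0.2], [True, True, False]))
    print(step([1.0, -1.0, 0.2], [False, False, False]))
```

In this sketch, an agent that ignores its message falls back to a default policy and pays no penalty, which is one simple way to express the incentive to reduce dependence on the facilitator. A real system would fold such a penalty into the training objective.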