Communication helps agents obtain information about others so that better-coordinated behavior can be learned. Some existing work shares each agent's predicted future trajectory with the others, hoping to gain clues about what they will do and thereby coordinate better. However, when agents are treated synchronously, circular dependencies can occur, which makes it hard to coordinate decision-making. In this paper, we propose a novel communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (upper-level agents make decisions before lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating hidden states of their observations and comparing the value of their intentions, which is obtained by modeling the environment dynamics. In the launching phase, the upper-level agents take the lead in making decisions and communicate their actions to the lower-level agents. Theoretically, we prove that the policies learned by SeqComm are guaranteed to improve monotonically and to converge. Empirically, we show that SeqComm outperforms existing methods in various multi-agent cooperative tasks.
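To make the two-phase protocol concrete, here is a minimal Python sketch under stated assumptions; it is not the SeqComm implementation. The intention values are supplied as plain numbers (SeqComm derives them from communicated hidden states and a learned model of the environment dynamics), a higher value is assumed to mean higher decision priority, and `negotiate`, `launch`, and the toy `avoid_collisions` policy are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy signature: (agent_id, observation, upper-level actions) -> action.
Policy = Callable[[int, object, Dict[int, int]], int]


def negotiate(intention_values: Sequence[float]) -> List[int]:
    """Negotiation phase (sketch): rank agents by a scalar intention value.

    Assumption: a higher intention value means the agent decides earlier.
    Ties are broken by agent index so the order is deterministic.
    """
    return sorted(range(len(intention_values)),
                  key=lambda i: (-intention_values[i], i))


def launch(order: List[int], observations: Sequence[object],
           policy: Policy) -> Dict[int, int]:
    """Launching phase (sketch): agents act one at a time in priority order.

    Each agent conditions only on actions already chosen by upper-level
    agents, so no agent waits on a lower-level one (no circular dependency).
    """
    committed: Dict[int, int] = {}
    for agent in order:
        # Only actions of agents earlier in `order` are visible here.
        action = policy(agent, observations[agent], dict(committed))
        committed[agent] = action  # "communicated" to all lower-level agents
    return committed


def avoid_collisions(agent: int, obs: object, upper: Dict[int, int],
                     n_actions: int = 3) -> int:
    """Hypothetical toy policy: pick the smallest action id not taken above."""
    taken = set(upper.values())
    for a in range(n_actions):
        if a not in taken:
            return a
    return 0  # fall back when every action is already taken
```

For example, with intention values `[0.2, 0.9, 0.5]`, `negotiate` returns the order `[1, 2, 0]`, and `launch` with the toy policy assigns actions 0, 1, and 2 to agents 1, 2, and 0 respectively, so no two agents pick the same action. Had all three agents chosen simultaneously with the same rule, each would have picked action 0.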