Recent work has shown that value-decomposition-based multi-agent reinforcement learning methods perform well on coordination tasks. Most current methods assume that agents can communicate to assist their decisions, which is impractical in some situations. In this paper, we propose a semi-communication method that enables agents to exchange information without explicit communication. Specifically, we introduce a group concept to help agents learn a belief, which is a type of consensus. With this consensus, adjacent agents tend to accomplish similar sub-tasks and thereby cooperate. We design a novel agent structure named Belief in Graph Clustering (BGC), composed of an agent characteristic module, a belief module, and a fusion module. To represent each agent's characteristics, we use an MLP-based characteristic module to generate agent-specific features. Inspired by neighborhood cognitive consistency, we propose a group-based module that divides adjacent agents into small groups and minimizes the differences between in-group agents' beliefs so that they accomplish similar sub-tasks. Finally, we use a hyper-network to merge these features and produce agent actions. To overcome the agent consistency problem introduced by GAT, in which agents become hard to tell apart, we introduce a split loss that distinguishes different agents. Results show that the proposed method achieves a significant improvement on the SMAC benchmark. Because of the group concept, our approach maintains excellent performance as the number of agents increases.
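The grouping and loss ideas described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method used in the paper: the abstract does not specify how adjacency is defined, what form the belief consensus objective takes, or how the split loss is computed. Here adjacency is a hypothetical distance threshold `radius`, the consensus term is the mean squared deviation of each belief from its group mean, and the split term is a hinge penalty with a hypothetical `margin`. The GAT and hyper-network components are omitted entirely.

```python
from math import dist


def group_by_adjacency(positions, radius):
    """Merge agents within `radius` of each other into connected groups.

    Assumption: adjacency is a plain Euclidean distance threshold.
    """
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist(positions[i], positions[j]) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def consensus_loss(beliefs, groups):
    """Mean squared deviation of each agent's belief from its group mean.

    Assumption: one plausible way to pull in-group beliefs together.
    """
    total, count = 0.0, 0
    for members in groups:
        dim = len(beliefs[members[0]])
        mean = [sum(beliefs[i][d] for i in members) / len(members)
                for d in range(dim)]
        for i in members:
            total += sum((beliefs[i][d] - mean[d]) ** 2 for d in range(dim))
            count += 1
    return total / count if count else 0.0


def split_loss(features, margin=1.0):
    """Hinge penalty on pairs of agent features closer than `margin`.

    Assumption: one plausible way to keep agents distinguishable.
    """
    n = len(features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    penalty = sum(max(0.0, margin - dist(features[i], features[j]))
                  for i, j in pairs)
    return penalty / len(pairs)


# Four agents forming two spatial clusters (hypothetical data).
positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
groups = group_by_adjacency(positions, radius=2.0)
beliefs = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(groups)                           # [[0, 1], [2, 3]]
print(consensus_loss(beliefs, groups))  # 0.5
```

In this toy setup the two terms pull in opposite directions: the consensus term draws the beliefs of agents in the same group together, while the split term penalizes per-agent features that collapse onto each other. A training objective built on this sketch would need to weight the two against each other.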