Message aggregation is a key component of communication in multi-agent reinforcement learning (Comm-MARL). Recently, graph attention networks (GAT) have become prevalent in Comm-MARL: agents are represented as nodes, and messages are aggregated through weighted message passing. While successful, GAT can lead to homogeneous message aggregation strategies, and the ``core'' agent may excessively influence other agents' behaviors, which can severely limit multi-agent coordination. To address this challenge, we first study the adjacency tensor of the communication graph and demonstrate that the homogeneity of message aggregation can be measured by the normalized tensor rank. Since rank optimization is known to be NP-hard, we define a new nuclear norm, a convex surrogate of the normalized tensor rank, to replace the rank. Leveraging this norm, we further propose a plug-and-play regularizer on the adjacency tensor, named Normalized Tensor Nuclear Norm Regularization (NTNNR), to actively enrich the diversity of message aggregation during training. We extensively evaluate GAT with the proposed regularizer in both cooperative and mixed cooperative-competitive scenarios. The results demonstrate that aggregating messages with NTNNR-enhanced GAT improves training efficiency and achieves higher asymptotic performance than existing message aggregation methods. When NTNNR is applied to existing graph-attention Comm-MARL methods, we also observe significant performance improvements on the StarCraft II micromanagement benchmarks.
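To make the idea of a nuclear-norm regularizer on the adjacency tensor concrete, the following is a minimal sketch and not the paper's implementation. It assumes the adjacency tensor has shape `(heads, n, n)` with row-stochastic attention weights, and it normalizes each slice's nuclear norm by `n` (the largest value a row-stochastic `n x n` matrix can reach); the abstract does not give the exact NTNNR definition, so this normalization, the function names `normalized_nuclear_norm` and `regularized_loss`, and the `weight` parameter are all illustrative assumptions.

```python
import numpy as np


def normalized_nuclear_norm(adj):
    """Mean nuclear norm over attention heads, divided by n.

    adj: array of shape (heads, n, n); each row is assumed to sum to 1.
    The division by n is an assumed normalization, not the paper's definition.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[-1]
    # Singular values of every (n, n) slice; result has shape (heads, n).
    singular_values = np.linalg.svd(adj, compute_uv=False)
    return float(singular_values.sum(axis=-1).mean() / n)


def regularized_loss(task_loss, adj, weight=0.1):
    """Subtract the norm so that minimizing the loss rewards diverse aggregation.

    `weight` is a hypothetical hyperparameter chosen for illustration.
    """
    return task_loss - weight * normalized_nuclear_norm(adj)


n = 4
# Every agent attends to all others identically: a rank-1, homogeneous pattern.
homogeneous = np.full((1, n, n), 1.0 / n)
# Every agent attends to a different agent: a full-rank, diverse pattern.
diverse = np.eye(n)[None]

print(normalized_nuclear_norm(homogeneous))
print(normalized_nuclear_norm(diverse))
```

In this sketch the homogeneous pattern scores lower than the diverse one, so subtracting the weighted norm from the task loss pushes training toward less uniform attention. In an actual training loop the same quantity would need to be computed with a differentiable framework rather than NumPy.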