Allowing agents to share information through communication is crucial for solving complex tasks in multi-agent reinforcement learning. In this work, we consider the question of whether a given communication protocol can express an arbitrary policy. By observing that many existing protocols can be viewed as instances of graph neural networks (GNNs), we demonstrate the equivalence of joint action selection to node labelling. With standard GNN approaches provably limited in their expressive capacity, we draw from existing GNN literature and consider augmenting agent observations with: (1) unique agent IDs and (2) random noise. We provide a theoretical analysis as to how these approaches yield universally expressive communication, and also prove them capable of targeting arbitrary sets of actions for identical agents. Empirically, these augmentations are found to improve performance on tasks where expressive communication is required, whilst, in general, the optimal communication protocol is found to be task-dependent.
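The two augmentations described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the paper's implementation: it concatenates a one-hot agent ID and freshly sampled Gaussian noise onto each agent's flat observation vector; the function name, `id_dim`, and `noise_dim` are illustrative assumptions.

```python
import numpy as np

def augment_observations(obs, rng, id_dim=None, noise_dim=4):
    """Augment each agent's observation with (1) a unique one-hot agent ID
    and (2) random noise, as symmetry-breaking extra input features.

    obs: array of shape (n_agents, obs_dim), one flat observation per agent.
    rng: a numpy Generator, so noise is resampled on every call.
    """
    n_agents, _ = obs.shape
    if id_dim is None:
        id_dim = n_agents
    ids = np.eye(id_dim)[:n_agents]                      # unique ID per agent
    noise = rng.standard_normal((n_agents, noise_dim))   # fresh noise each step
    return np.concatenate([obs, ids, noise], axis=1)
```

Because identical agents otherwise receive identical inputs (and hence identical messages under a permutation-equivariant GNN protocol), either augmentation lets the policy assign different actions to otherwise indistinguishable agents.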