Communication in multi-agent reinforcement learning has recently drawn attention for its significant role in cooperation. However, multi-agent systems may face limited communication resources and thus need efficient communication techniques in real-world scenarios. According to the Shannon-Hartley theorem, reliable transmission over noisier channels requires messages with lower entropy. Therefore, we aim to reduce message entropy in multi-agent communication. A fundamental challenge is that the gradient of entropy is either 0 or infinite, which disables gradient-based optimization. To handle this, we propose a pseudo gradient descent scheme, which reduces entropy by wisely adjusting the distributions of messages. We conduct experiments on two base communication frameworks across six environment settings and find that our scheme can reduce message entropy by up to 90% with nearly no loss of cooperation performance.
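As a minimal sketch of why entropy is hard to optimize by gradients, consider the empirical Shannon entropy of messages quantized into bins (the bin layout and message values below are hypothetical, for illustration only): an infinitesimal perturbation of a message usually leaves its bin, and hence the entropy, unchanged (zero gradient), while a perturbation that crosses a bin boundary changes the entropy by a discrete jump (an unbounded finite-difference ratio).

```python
import numpy as np

def message_entropy(messages, bins):
    """Empirical Shannon entropy (in bits) of messages quantized into bins."""
    counts, _ = np.histogram(messages, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins so log2 is well-defined
    return -np.sum(p * np.log2(p))

# Hypothetical real-valued messages quantized into 4 equal bins over [0, 1].
bins = np.linspace(0.0, 1.0, 5)          # bin edges: 0, 0.25, 0.5, 0.75, 1.0
msgs = np.array([0.1, 0.3, 0.6, 0.9])    # one message per bin -> uniform distribution

h0 = message_entropy(msgs, bins)         # 2.0 bits (uniform over 4 bins)

# A tiny perturbation keeps every message in its bin: entropy is unchanged,
# so the gradient of entropy w.r.t. the message values is 0 almost everywhere.
h1 = message_entropy(msgs + 1e-6, bins)

# A perturbation that pushes one message across a bin boundary (0.1 -> 0.26)
# makes the entropy jump discontinuously: the "gradient" there is infinite.
msgs2 = msgs.copy()
msgs2[0] = 0.26
h2 = message_entropy(msgs2, bins)        # 1.5 bits (distribution 2/1/1 over 4 bins)
```

This is why the abstract's pseudo gradient descent scheme works on the message distributions themselves rather than differentiating the entropy directly.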