Communication is essential for coordination among humans and animals. As intelligent agents are introduced into the world, agent-to-agent and agent-to-human communication therefore becomes necessary. In this paper, we first study learning in matrix-based signaling games and show empirically that decentralized methods can converge to a suboptimal policy. We then propose a modification to the messaging policy in which the sender deterministically chooses the message that best helps the receiver infer the sender's observation. With this modification, we observe empirically that the agents converge to the optimal policy in nearly all runs. Finally, we apply the method to a partially observable gridworld environment that requires cooperation between two agents, and show that, with appropriate approximation methods, the proposed sender modification can also enhance existing decentralized training methods in more complex domains.
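To make the sender modification concrete, the following is a minimal sketch of one way a sender could deterministically pick the message that best helps a receiver infer its observation. It is an illustration under stated assumptions, not the implementation used in the paper: we assume the receiver applies Bayes' rule to a known table of message likelihoods P(m | o), and the names `posterior_over_observations`, `best_message`, `likelihood`, and `prior` are hypothetical.

```python
from typing import List


def posterior_over_observations(
    likelihood: List[List[float]], prior: List[float], message: int
) -> List[float]:
    """Return P(o | m) for every observation o, using Bayes' rule.

    likelihood[o][m] is an assumed table of P(m | o); prior[o] is P(o).
    """
    joint = [likelihood[o][message] * prior[o] for o in range(len(prior))]
    total = sum(joint)
    if total == 0.0:
        # The message is never sent under this table: fall back to the prior.
        return list(prior)
    return [j / total for j in joint]


def best_message(
    likelihood: List[List[float]], prior: List[float], observation: int
) -> int:
    """Deterministically choose the message that maximizes the receiver's
    posterior probability of the sender's true observation.

    Ties are broken in favor of the lowest message index.
    """
    n_messages = len(likelihood[0])
    scores = [
        posterior_over_observations(likelihood, prior, m)[observation]
        for m in range(n_messages)
    ]
    return max(range(n_messages), key=lambda m: scores[m])


# Toy example (made-up numbers): two observations, two messages.
# Under observation 1 the stochastic table is indifferent between messages,
# yet the deterministic rule still assigns each observation its own message.
likelihood = [[0.6, 0.4], [0.5, 0.5]]
prior = [0.5, 0.5]
chosen = [best_message(likelihood, prior, o) for o in range(2)]
```

In this toy table, sampling a message from P(m | o) would often send the same message for both observations, whereas the argmax over the receiver's posterior separates them. How the likelihood table is obtained or approximated in larger domains, such as the gridworld, is left open here.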