Recently, there has been a great deal of research on emergent communication among artificial agents interacting in simulated environments. These studies have revealed that, in general, emergent languages do not follow the compositionality patterns of natural language. To address this, existing works have proposed a limited channel capacity as an important constraint for learning highly compositional languages. In this paper, we show that limited channel capacity is not a sufficient condition and propose an intrinsic reward framework for improving compositionality in emergent communication. We use a reinforcement learning setting with two agents -- a \textit{task-aware} Speaker and a \textit{state-aware} Listener that must communicate to perform a set of tasks. Through experiments on three different referential game setups, including the novel environment gComm, we show that intrinsic rewards improve compositionality scores by $\approx \mathbf{1.5-2}\times$ compared to existing frameworks that rely on limited channel capacity.
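To situate the proposed framework, intrinsic reward methods typically augment the environment's extrinsic reward with an additional learned or heuristic bonus. As a minimal, illustrative sketch (the specific bonus used in this paper is defined in the method section; the additive form and the weighting coefficient $\beta$ below are assumptions for concreteness, following common practice in intrinsically motivated reinforcement learning), the combined per-step reward often takes the form
\[
    r_t = r^{\text{ext}}_t + \beta \, r^{\text{int}}_t,
\]
where $r^{\text{ext}}_t$ is the referential-game task reward received by the Speaker--Listener pair, $r^{\text{int}}_t$ is the intrinsic bonus, and $\beta$ trades off the two signals.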