Research on emergent communication aims at building interactive artificial intelligence. While existing work focuses on communication about single objects or complex image scenes, we argue that communicating relationships between multiple objects is important in more realistic tasks yet remains understudied. In this paper, we address this gap by studying emergent communication about positional relationships between two objects. We train agents in a referential game where observations contain two objects, and find that generalization is the major problem when positional relationships are involved. The key factor affecting the generalization ability of the emergent language is the input variation between Speaker and Listener, which we realize with a random image generator. Further, we find that the learned language generalizes well to a new multi-step MDP task in which the positional relationship describes the goal, and that it outperforms both raw-pixel images and pre-trained image features, which supports the strong generalization ability of discrete sequences. We also show that transferring the language from the referential game performs better in the new task than learning a language directly in that task, suggesting potential benefits of pre-training in referential games. Overall, our experiments demonstrate the viability and merit of having agents learn to communicate positional relationships between multiple objects through emergent communication.