Communication between agents in collaborative multi-agent settings is generally either implicit or carried over a direct data stream. This paper considers text-based natural language as a novel form of communication between multiple agents trained with reinforcement learning. This could be considered a first step toward truly autonomous communication that does not require a predefined, limited set of instructions, and toward natural collaboration between humans and robots. Inspired by the game of Blind Leads, we propose an environment in which one agent uses natural language instructions to guide another through a maze. We test the ability of reinforcement learning agents to communicate effectively through discrete word-level symbols and show that the agents can communicate adequately through natural language with a limited vocabulary. Although the communication is not always perfect English, the agents are still able to navigate the maze. We achieve a BLEU score of 0.85, an improvement of 0.61 over randomly generated sequences, while maintaining a 100% maze completion rate. This is 3.5 times the performance of the random baseline on our reference set.
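The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that the 0.61 improvement is an absolute difference in BLEU, which implies a random-baseline score of about 0.24; that baseline value is inferred here and is not stated in the abstract. The variable names are illustrative only.

```python
# Figures quoted in the abstract.
agent_bleu = 0.85
improvement_over_random = 0.61

# Implied BLEU of the random baseline (an inference, not a reported number),
# assuming the improvement is an absolute difference in BLEU.
random_bleu = round(agent_bleu - improvement_over_random, 2)

# Ratio of the agents' BLEU to the implied baseline.
ratio = agent_bleu / random_bleu

print(f"implied random baseline BLEU: {random_bleu:.2f}")
print(f"agent / baseline ratio: {ratio:.2f}")
```

Under that assumption the ratio comes out slightly above 3.5, which is consistent with the "3.5 times" figure quoted above.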