Solving temporally-extended tasks is a challenge for most reinforcement learning (RL) algorithms [arXiv:1906.07343]. We investigate the ability of an RL agent to learn to ask natural language questions as a tool for understanding its environment and achieving greater generalisation performance in novel, temporally-extended environments. We do this by endowing the agent with the ability to ask "yes-no" questions to an all-knowing Oracle. This allows the agent to obtain guidance about the task at hand while limiting its access to new information. To study the emergence of such natural language questions in the context of temporally-extended tasks, we first train our agent in a Mini-Grid environment. We then transfer the trained agent to a different, harder environment. We observe a significant increase in generalisation performance compared to a baseline agent that is unable to ask questions. By grounding its understanding of natural language in the environment, the agent can reason about the environment's dynamics to the point that it can ask new, relevant questions when deployed in a novel setting.
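To make the agent-Oracle interface concrete, the following is a minimal sketch of one way such a question-asking channel could be bolted onto a standard Gymnasium environment. The `QuestionWrapper` and `Oracle` classes, the fixed question vocabulary, and the predicate lookups are illustrative assumptions, not the paper's implementation: question actions are appended to the discrete action space, consume a timestep without changing the world, and return a yes/no answer through the info dict.

```python
# Illustrative sketch (not the paper's implementation) of an agent-Oracle
# interface: extra discrete actions correspond to yes/no questions that an
# all-knowing Oracle answers by inspecting the simulator's internal state.
import gymnasium as gym


class Oracle:
    """Answers yes/no questions by inspecting the true environment state."""

    def __init__(self, env):
        self.env = env
        # Hypothetical predicate table; a real oracle would ground each
        # question template in the simulator's internal state.
        self.predicates = {
            "is the door locked?": lambda e: getattr(e.unwrapped, "door_locked", False),
            "is the key visible?": lambda e: getattr(e.unwrapped, "key_visible", False),
        }

    def answer(self, question: str) -> bool:
        check = self.predicates.get(question.lower())
        return bool(check(self.env)) if check else False


class QuestionWrapper(gym.Wrapper):
    """Extends a discrete action space with question actions answered by the Oracle."""

    def __init__(self, env, questions):
        super().__init__(env)
        self.questions = list(questions)
        self.oracle = Oracle(env)
        self._n_env_actions = env.action_space.n
        # Actions [0, n_env_actions) act on the world; the rest ask questions.
        self.action_space = gym.spaces.Discrete(self._n_env_actions + len(self.questions))
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        if action < self._n_env_actions:
            obs, reward, terminated, truncated, info = self.env.step(action)
            self._last_obs = obs
            return obs, reward, terminated, truncated, info
        # A question action does not change the world; the yes/no answer is
        # surfaced through the info dict (and could be appended to the obs).
        question = self.questions[action - self._n_env_actions]
        info = {"question": question, "answer": self.oracle.answer(question)}
        return self._last_obs, 0.0, False, False, info


# Example usage with a MiniGrid task (assumes the minigrid package is installed):
# env = QuestionWrapper(gym.make("MiniGrid-DoorKey-5x5-v0"),
#                       questions=["Is the door locked?", "Is the key visible?"])
```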