Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e., a signal indicating when an instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use of this dataset, we map the scenes to the VizDoom environment and use the architecture of \citet{gatedattention} to train an agent to carry out these more complex language instructions.