We propose a framework that learns to execute natural language instructions in an environment consisting of goal-reaching tasks that share components of their task descriptions. Our approach leverages the compositionality of both value functions and language, with the aim of reducing the sample complexity of learning novel tasks. First, we train a reinforcement learning agent to learn value functions that can be subsequently composed through a Boolean algebra to solve novel tasks. Second, we fine-tune a seq2seq model pretrained on web-scale corpora to map language to logical expressions that specify the required value function compositions. Evaluating our agent in the BabyAI domain, we observe a decrease of 86% in the number of training steps needed to learn a second task after mastering a single task. Results from ablation studies further indicate that it is the combination of compositional value functions and language representations that allows the agent to quickly generalize to new tasks.
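In the compositional value-function literature that this framework builds on, the Boolean operators over goals are typically approximated by pointwise operations over the learned Q-functions: disjunction as a maximum and conjunction as a minimum. A minimal sketch of that idea, with all names and the toy Q-tables purely illustrative (the paper's actual composition rules may differ in detail):

```python
import numpy as np

def q_or(q1, q2):
    """Disjunction of two goals: the agent may satisfy either,
    so take the pointwise maximum over state-action values."""
    return np.maximum(q1, q2)

def q_and(q1, q2):
    """Conjunction of two goals: approximated by the pointwise minimum."""
    return np.minimum(q1, q2)

# Illustrative Q-tables over 3 states x 2 actions for two base tasks,
# e.g. "reach the red object" and "reach the blue object".
q_red  = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
q_blue = np.array([[0.3, 0.7], [0.6, 0.4], [0.1, 0.9]])

# A novel task such as "reach the red or blue object" is then solved
# zero-shot by composing the already-learned value functions.
q_red_or_blue  = q_or(q_red, q_blue)
q_red_and_blue = q_and(q_red, q_blue)
```

A seq2seq model fine-tuned as the abstract describes would emit the logical expression (e.g. `red OR blue`), which is then evaluated over the base Q-functions using operators like these before acting greedily on the composed values.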