Natural language-conditioned reinforcement learning (RL) enables agents to follow human instructions. Previous approaches generally implement language-conditioned RL by providing human instructions in natural language (NL) and training a following policy. In this outside-in approach, the policy must comprehend the NL and manage the task simultaneously. However, the unbounded space of NL expressions often introduces substantial extra complexity into solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique. The TL is used in RL to achieve highly efficient and effective policy training. In addition, a translator is trained to translate NL into TL. We implement this scheme as TALAR (TAsk Language with predicAte Representation), which learns multiple predicates to model object relationships as the TL. Experiments indicate that TALAR not only comprehends NL instructions better but also yields a better instruction-following policy that improves the success rate by 13.4% and adapts to unseen expressions of NL instructions. The TL also serves as an effective task abstraction, naturally compatible with hierarchical RL.
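To make the inside-out pipeline concrete, the following is a minimal PyTorch sketch of the two components the abstract names: a translator that maps an NL instruction embedding to predicate-based TL activations, and a policy conditioned on that TL rather than on raw NL. All dimensions, layer sizes, and the softmax-over-predicates choice are illustrative assumptions, not TALAR's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only; the paper's actual
# architecture and sizes are not specified here.
NL_EMB_DIM = 768      # e.g., a sentence embedding of the NL instruction
NUM_PREDICATES = 16   # number of learned predicates forming the TL
NUM_OBJECT_PAIRS = 6  # object-pair slots whose relationships predicates describe
OBS_DIM = 32
ACT_DIM = 4

class Translator(nn.Module):
    """Maps an NL instruction embedding to a task-language (TL)
    representation: per object pair, a distribution over predicates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NL_EMB_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_OBJECT_PAIRS * NUM_PREDICATES),
        )

    def forward(self, nl_emb: torch.Tensor) -> torch.Tensor:
        logits = self.net(nl_emb).view(-1, NUM_OBJECT_PAIRS, NUM_PREDICATES)
        # Soft predicate activations serve as the compact, task-specific TL.
        return torch.softmax(logits, dim=-1)

class Policy(nn.Module):
    """Instruction-following policy conditioned on the TL instead of raw NL."""
    def __init__(self):
        super().__init__()
        tl_dim = NUM_OBJECT_PAIRS * NUM_PREDICATES
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + tl_dim, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM),
        )

    def forward(self, obs: torch.Tensor, tl: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, tl.flatten(1)], dim=-1))

# Usage: translate once per instruction, then act on TL + observation.
translator, policy = Translator(), Policy()
nl_emb = torch.randn(1, NL_EMB_DIM)  # stand-in for an encoded NL instruction
tl = translator(nl_emb)
action_logits = policy(torch.randn(1, OBS_DIM), tl)
print(tl.shape, action_logits.shape)
```

The design point this sketch illustrates is the division of labor: only the translator touches unbounded NL, while the policy sees the compact, task-specific TL, which is what eases its learning burden.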