Humans commonly solve complex problems by decomposing them into easier subproblems and then combining the subproblem solutions. This type of compositional reasoning permits reuse of the subproblem solutions when tackling future tasks that share part of the underlying compositional structure. In a continual or lifelong reinforcement learning (RL) setting, this ability to decompose knowledge into reusable components would enable agents to quickly learn new RL tasks by leveraging accumulated compositional structures. We explore a particular form of composition based on neural modules and present a set of RL problems that intuitively admit compositional solutions. Empirically, we demonstrate that neural composition indeed captures the underlying structure of this space of problems. We further propose a compositional lifelong RL method that leverages accumulated neural components to accelerate the learning of future tasks while retaining performance on previous tasks via offline RL over replayed experiences.