We are concerned with the question of how an agent can acquire its own representations from sensory data. We restrict our focus to learning representations for long-term planning, a class of problems that state-of-the-art learning methods are unable to solve. We propose a framework for autonomously learning state abstractions of an agent's environment, given a set of skills. Importantly, these abstractions are task-independent, and so can be reused to solve new tasks. We demonstrate how an agent can use an existing set of options to acquire representations from ego- and object-centric observations. These abstractions can immediately be reused by the same agent in new environments. We show how to combine these portable representations with problem-specific ones to generate a sound description of a specific task that can be used for abstract planning. Finally, we show how to autonomously construct a multi-level hierarchy consisting of increasingly abstract representations. Since these hierarchies are transferable, higher-order concepts can be reused in new tasks, relieving the agent of the need to relearn them. Our results demonstrate that this approach allows an agent to transfer previous knowledge to new tasks, with sample efficiency improving as the number of tasks increases.