To quickly solve new tasks in complex environments, intelligent agents need to build up reusable knowledge. For example, a learned world model captures knowledge about the environment that applies to new tasks. Similarly, skills capture general behaviors that can apply to new tasks. In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent. Specifically, we leverage the idea of partial amortization for fast adaptation at test time. For this, actions are produced by a policy that is learned over time, while the skills it conditions on are chosen using online planning. We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks, showing improved sample efficiency both on single tasks and in transfer from one task to another, as compared to competitive baselines. Videos are available at: https://sites.google.com/view/partial-amortization-hierarchy/home
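The partial-amortization idea in the abstract can be sketched in a few lines: a low-level policy, conditioned on a skill vector, produces actions cheaply (the amortized part), while the skill itself is selected at test time by planning through a world model (the online part). The sketch below uses toy stand-in models and cross-entropy-method planning; all function names, the toy dynamics, and the planner hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions for illustration, not the paper's models):
# - world_model: predicts next state and reward from (state, action)
# - policy: amortized low-level controller conditioned on a skill vector z
def world_model(state, action):
    next_state = state + 0.1 * action      # toy linear dynamics
    reward = -np.sum(next_state ** 2)      # toy reward: reach the origin
    return next_state, reward

def policy(state, z):
    return np.tanh(z - state)              # toy skill-conditioned policy

def plan_skill(state, horizon=5, n_candidates=64, n_iters=3, top_k=8):
    """Choose a skill z by cross-entropy-method planning in the world model."""
    mu, sigma = np.zeros_like(state), np.ones_like(state)
    for _ in range(n_iters):
        zs = mu + sigma * rng.standard_normal((n_candidates, state.size))
        returns = np.empty(n_candidates)
        for i, z in enumerate(zs):
            s, ret = state.copy(), 0.0
            for _ in range(horizon):       # roll out the amortized policy under z
                s, r = world_model(s, policy(s, z))
                ret += r
            returns[i] = ret
        elites = zs[np.argsort(returns)[-top_k:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu                              # planned skill for the current state

state = np.array([1.0, -0.5])
z = plan_skill(state)                      # slow: online planning over skills
action = policy(state, z)                  # fast: amortized low-level action
```

The split mirrors the abstract's design: per-step action selection stays cheap because it is amortized into the policy, while the more expensive search is confined to the lower-dimensional, lower-frequency skill choice.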