The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.
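The claim that options for any linear combination of known cumulants can be synthesised "without any learning involved" can be illustrated with a minimal sketch based on successor features and generalized policy improvement, the machinery this line of work builds on. All names, shapes, and the random placeholder values below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_policies = n_cumulants = 2
n_states, n_actions = 4, 3

# psi[i, s, a, :]: successor features of base policy pi_i -- the expected
# discounted sums of each cumulant when taking action a in state s and
# following pi_i thereafter. Random placeholders stand in for the values
# a learning agent would have estimated for its base skills.
psi = rng.standard_normal((n_policies, n_states, n_actions, n_cumulants))

def synthesized_policy(w):
    """Greedy policy for the combined cumulant sum_i w[i] * c_i, obtained
    with no further learning: evaluate every base policy on the new
    cumulant (a single dot product) and act greedily across all of them
    (generalized policy improvement)."""
    q = psi @ w                           # Q_w^{pi_i}(s, a); shape (policies, states, actions)
    return q.max(axis=0).argmax(axis=1)   # best action in each state

# A new combination of skills, e.g. "0.7 of skill 1 plus 0.3 of skill 2",
# yields a usable policy immediately.
pi = synthesized_policy(np.array([0.7, 0.3]))
print(pi.shape)  # one action index per state: (4,)
```

The key design point mirrored here is that learning happens once, per base cumulant; combining skills afterwards reduces to cheap linear algebra over the stored successor features.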