Reinforcement learning can train policies that effectively perform complex tasks. However, for long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills. Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions. Hierarchies can further improve on this by abstracting the state space as well. We posit that a suitable state abstraction should depend on the capabilities of the available lower-level policies. We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill. These value functions capture the affordances of the scene, thus forming a representation that compactly abstracts task-relevant information and robustly ignores distractors. Empirical evaluations for maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free and model-based methods.
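To make the construction concrete, the abstract state can be viewed as the vector of value estimates produced by the available skills. The following minimal sketch illustrates this idea; the function name `value_function_space` and the stub skill value functions are hypothetical and serve only to show how skill values could be stacked into a representation.

```python
import numpy as np

def value_function_space(obs, skill_value_fns):
    """Form the abstract state as the vector of per-skill value estimates.

    `skill_value_fns` is a hypothetical list of callables, one per low-level
    skill, each mapping an observation to that skill's value estimate V_k(obs).
    The resulting vector summarizes which skills are currently feasible,
    i.e. the affordances of the scene.
    """
    return np.array([v(obs) for v in skill_value_fns])

# Usage with three placeholder skills whose value functions are simple stubs.
skills = [lambda o, k=k: float(np.tanh(o.sum() + k)) for k in range(3)]
obs = np.random.rand(4)
abstract_state = value_function_space(obs, skills)  # shape (3,)
```

A higher-level policy would then operate on `abstract_state` rather than the raw observation, so its input dimensionality scales with the number of skills instead of the size of the underlying state.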