Abstraction plays an important role in the generalisation of knowledge and skills and is key to sample-efficient learning. In this work, we study joint temporal and state abstraction in reinforcement learning, where temporally extended actions in the form of options induce temporal abstractions, while the aggregation of states that are similar with respect to abstract options induces state abstractions. Many existing abstraction schemes ignore the interplay between state and temporal abstraction. Consequently, the resulting option policies often cannot be transferred directly to new environments, owing to changes in the state space and transition dynamics. To address this issue, we propose a novel abstraction scheme built on successor features. This scheme includes an algorithm for transferring abstract options across different environments and a state abstraction mechanism that allows efficient planning with the transferred options.
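For background (a standard definition, not a formula taken from this paper): successor features, as introduced by Barreto et al. (2017), summarise a policy's expected discounted features and decouple the environment dynamics from the reward, which is what makes transferring options across environments plausible. Assuming the reward factors as \( r(s,a) = \phi(s,a)^\top w \) for some feature map \( \phi \) and weight vector \( w \), the successor features of a policy \( \pi \) and the induced value function are

\[ \psi^\pi(s,a) \;=\; \mathbb{E}^\pi\!\left[\, \sum_{t=0}^{\infty} \gamma^t\, \phi(s_t, a_t) \;\middle|\; s_0 = s,\ a_0 = a \right], \qquad Q^\pi(s,a) \;=\; \psi^\pi(s,a)^\top w. \]

Because \( \psi^\pi \) depends only on the dynamics and \( \phi \), changing the reward (a new \( w \)) re-evaluates the policy without relearning \( \psi^\pi \).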