While the difficulty of reinforcement learning problems is typically related to the complexity of their state spaces, abstraction proposes that solutions often lie in simpler underlying latent spaces. Prior works have either focused on learning continuous or dense abstractions, or required a human to provide one. Information-dense representations capture features that are irrelevant to solving the task, and continuous spaces can struggle to represent discrete objects. In this work we automatically learn a sparse, discrete abstraction of the underlying environment. We do so using a simple end-to-end trainable model based on the successor representation and maximum entropy regularization. We describe Discrete State-Action Abstraction (DSAA), an algorithm that applies our model by computing an action abstraction in the form of temporally extended actions, i.e., options, to transition between discrete abstract states. Empirically, we demonstrate the effects of different exploration schemes on the resulting abstraction, and show that it is efficient for solving downstream tasks.
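For concreteness, the following is a minimal sketch of how such a model might be trained, assuming a PyTorch encoder that soft-assigns states to a fixed number of abstract states and a learned successor representation over those abstract states. The class and function names, the exact loss form, and the coefficients are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of a DSAA-style objective; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteAbstraction(nn.Module):
    """Encoder mapping raw states to a distribution over K abstract states."""
    def __init__(self, state_dim, num_abstract_states, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_abstract_states),
        )
        # Successor representation over abstract states: row i holds the
        # expected discounted abstract-state occupancies starting from state i.
        self.psi = nn.Parameter(torch.zeros(num_abstract_states, num_abstract_states))

    def forward(self, s):
        # Soft (differentiable) assignment phi(s) over abstract states.
        return F.softmax(self.encoder(s), dim=-1)

def dsaa_style_loss(model, s, s_next, gamma=0.95, ent_coef=0.1):
    phi, phi_next = model(s), model(s_next)
    # TD consistency for the successor representation:
    #   psi(s) ~ phi(s) + gamma * psi(s'),
    # with the bootstrapped target detached, as in standard TD learning.
    psi_s = phi @ model.psi
    target = (phi + gamma * (phi_next @ model.psi)).detach()
    sr_loss = F.mse_loss(psi_s, target)
    # Maximum entropy regularizer on the batch marginal over abstract
    # states, encouraging the encoder to use all abstract states rather
    # than collapsing onto a few.
    marginal = phi.mean(dim=0)
    entropy = -(marginal * (marginal + 1e-8).log()).sum()
    return sr_loss - ent_coef * entropy
```

In this sketch the softmax keeps the assignment differentiable during end-to-end training, while a genuinely discrete abstract state can be recovered at evaluation time with an argmax over phi(s); options would then be policies trained to move between these discrete abstract states.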