When reinforcement learning is applied with sparse rewards, agents must spend a prohibitively long time exploring the unknown environment without any learning signal. Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space. Prior work either focuses on dense continuous latent spaces or requires the user to manually provide the representation. Our approach is the first to automatically learn a discrete abstraction of the underlying environment. Moreover, our method works on arbitrary input spaces, using an end-to-end trainable regularized successor representation model. For transitions between abstract states, we train a set of temporally extended actions in the form of options, i.e., an action abstraction. Our proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively alternates between training these options and using them to efficiently explore more of the environment, which in turn improves the state abstraction. As a result, our model is useful not only for transfer learning but also in the online learning setting. We empirically show that our agent explores the environment and solves provided tasks more efficiently than baseline reinforcement learning algorithms. Our code is publicly available at \url{https://github.com/amnonattali/dsaa}.
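The abstract above mentions that DSAA builds its discrete state abstraction from a successor representation (SR). As background, the sketch below shows a plain tabular TD(0) update of the SR on a small random-walk chain; it is a minimal illustration of the quantity the paper's regularized, end-to-end model approximates, not the authors' implementation, and all names (`sr_td_update`, `psi`) are illustrative.

```python
import numpy as np

def sr_td_update(psi, s, s_next, alpha=0.1, gamma=0.95):
    """One TD(0) update of the successor representation row for state s:
    psi[s] <- psi[s] + alpha * (one_hot(s) + gamma * psi[s_next] - psi[s])."""
    target = np.eye(len(psi))[s] + gamma * psi[s_next]
    psi[s] += alpha * (target - psi[s])
    return psi

# Random walk on a 4-state chain. States with similar SR rows are the
# kind of states a discrete abstraction would merge into one abstract state.
rng = np.random.default_rng(0)
n = 4
psi = np.zeros((n, n))
s = 0
for _ in range(5000):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n - 1)
    psi = sr_td_update(psi, s, s_next)
    s = s_next
```

After training, each row `psi[s]` estimates the expected discounted future visitation counts from state `s`, so nearby states on the chain end up with larger SR entries for each other than for distant states.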