Options in reinforcement learning allow agents to hierarchically decompose a task into subtasks, having the potential to speed up learning and planning. However, autonomously learning effective sets of options is still a major challenge in the field. In this paper we focus on the recently introduced idea of using representation learning methods to guide the option discovery process. Specifically, we look at eigenoptions, options obtained from representations that encode diffusive information flow in the environment. We extend the existing algorithms for eigenoption discovery to settings with stochastic transitions and in which handcrafted features are not available. We propose an algorithm that discovers eigenoptions while learning non-linear state representations from raw pixels. It exploits recent successes in the deep reinforcement learning literature and the equivalence between proto-value functions and the successor representation. We use traditional tabular domains to provide intuition about our approach and Atari 2600 games to demonstrate its potential.
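To make the connection stated above concrete, the sketch below computes, in a small tabular domain, the successor representation (SR) under a uniform random policy, takes its eigenvectors (which correspond to proto-value functions, up to ordering and scaling), and turns one of them into the intrinsic reward that defines an eigenoption. This is a minimal illustration under assumed specifics: the gridworld helper, the discount factor, and the eigenvector indexing are illustrative choices, not the paper's code.

```python
import numpy as np

def gridworld_transition_matrix(width=4, height=4):
    """Transition matrix P under a uniform random policy on an open grid.
    Illustrative helper, not taken from the paper."""
    n = width * height
    P = np.zeros((n, n))
    for s in range(n):
        x, y = s % width, s // width
        successors = []
        for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                successors.append(ny * width + nx)
            else:
                successors.append(s)  # bumping into a wall keeps the agent in place
        for s_next in successors:
            P[s, s_next] += 1.0 / len(successors)
    return P

# Successor representation under the uniform random policy:
#   Psi = (I - gamma * P)^{-1}
gamma = 0.95  # illustrative discount
P = gridworld_transition_matrix()
Psi = np.linalg.inv(np.eye(P.shape[0]) - gamma * P)

# Eigenvectors of the SR play the role of proto-value functions;
# each one can serve as an "eigenpurpose" e.
eigvals, eigvecs = np.linalg.eig(Psi)
order = np.argsort(-eigvals.real)      # largest eigenvalue first
e = eigvecs[:, order[1]].real          # first non-constant eigenvector

def intrinsic_reward(s, s_next, e=e):
    """Intrinsic reward of the eigenoption for a transition s -> s':
    the change of the eigenpurpose along the transition. Roughly, the
    option terminates once no action yields positive intrinsic value."""
    return e[s_next] - e[s]
```

With handcrafted or tabular features this eigendecomposition is straightforward; the algorithm proposed in the paper instead learns the representation (and hence the eigenoptions) from raw pixels with a deep network.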