Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing an estimate for pseudo-counts for free. To scale this decomposition method to large-scale domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and also permits mini-batch training. Further, we draw inspiration from predictive state representations and extend our decomposition method to partially observable environments. With experiments on multi-task settings in partially observable domains, we show that the proposed method can not only learn useful representations on DM-Lab-30 environments (whose inputs involve language instructions, pixel images, and rewards, among others) but is also effective at hard exploration tasks in DM-Hard-8 environments.
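The core idea can be illustrated on a toy example. Below is a minimal sketch (not the paper's scalable algorithm, which explicitly avoids materializing the transition matrix): we build the explicit transition matrix of a small random-walk chain, take its SVD with NumPy, and use the scaled left singular vectors as state representations. The chain environment and the scaling choice are illustrative assumptions, not details from the abstract.

```python
import numpy as np

# Toy illustration only: an explicit transition matrix for a 5-state
# random-walk chain. (The paper's method never builds this matrix;
# we construct it here solely to make the SVD idea concrete.)
n = 5
P = np.zeros((n, n))
for i in range(n):
    left, right = max(i - 1, 0), min(i + 1, n - 1)
    P[i, left] += 0.5   # step left (reflect at the boundary)
    P[i, right] += 0.5  # step right (reflect at the boundary)

# SVD of the transition matrix: P = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(P)

# One possible choice of state representations: left singular vectors
# scaled by their singular values (one row per state). Representations
# of states that transition similarly end up close to each other.
representations = U * S

print(np.round(representations, 3))
```

Each row of `representations` is a vector for one state; states adjacent on the chain have similar rows because their transition distributions overlap, which is the sense in which the decomposition preserves transition structure.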