In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation -- good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. We introduce and measure six representational properties over more than 25 thousand agent-task settings. We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment, with source and transfer tasks corresponding to different goal locations. We develop a method to better understand why some representations work better for transfer, through a systematic approach that varies task similarity and correlates representation properties with transfer performance. We demonstrate the generality of the methodology by investigating representations learned by a Rainbow agent that successfully transfer across game modes in Atari 2600.