Representation learning often plays a critical role in reinforcement learning by mitigating the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods have limited applicability: they are constructed for state-only aggregation and are derived from a policy-dependent transition kernel, without accounting for exploration. To address these issues, we propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data-collection policy, while also balancing the exploration-versus-exploitation trade-off during learning. A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. In addition, an experimental investigation demonstrates performance superior to current state-of-the-art algorithms across several benchmarks.
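As context for the factorization such spectral representations target, the following is a minimal sketch under the standard low-rank MDP assumption; the symbols $\phi$, $\mu$, and the dimension $d$ follow the conventions of that literature and are assumptions here, not notation taken from this abstract:
$$
P(s' \mid s, a) \;=\; \langle \phi(s, a), \mu(s') \rangle \;=\; \sum_{i=1}^{d} \phi_i(s, a)\, \mu_i(s'),
$$
where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^{d}$ is the learned state-action representation and $\mu : \mathcal{S} \to \mathbb{R}^{d}$ is a feature map over next states. Factorizing the transition kernel $P(s' \mid s, a)$ itself, rather than a policy-induced state-to-state kernel $P^{\pi}(s' \mid s)$, is what avoids the spurious dependence on the data-collection policy that the abstract describes.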