We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior. Consequently, capturing the relationship between key evolving features for a given task is conducive to recovering effective policies. To this end, hyperbolic geometry provides deep RL models with a natural basis to precisely encode this inherently hierarchical information. However, applying existing methodologies from the hyperbolic deep learning literature leads to fatal optimization instabilities due to the non-stationarity and variance characterizing RL gradient estimators. Hence, we design a new general method that counteracts such optimization challenges and enables stable end-to-end learning with deep hyperbolic representations. We empirically validate our framework by applying it to popular on-policy and off-policy RL algorithms on the Procgen and Atari 100K benchmarks, attaining near-universal performance and generalization benefits. Given its natural fit, we hope future RL research will consider hyperbolic representations as a standard tool.
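To make the geometric setting concrete, the following is a minimal sketch of the standard operations on the Poincaré ball model of hyperbolic space, which is the usual basis for hyperbolic latent representations. This is an illustrative reference, not the paper's implementation; the curvature parameter `c` and the function names are assumptions for the example.

```python
import numpy as np

def expmap0(v, c=1.0):
    # Exponential map at the origin: projects a Euclidean tangent vector v
    # onto the Poincare ball of curvature -c (an open ball of radius 1/sqrt(c)).
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(v)
    if norm < 1e-15:
        return np.zeros_like(v)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c=1.0):
    # Mobius addition: the hyperbolic analogue of vector addition on the ball.
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def dist(x, y, c=1.0):
    # Geodesic distance on the ball; grows rapidly as points approach the
    # boundary, which is what lets tree-like hierarchies embed with low distortion.
    sqrt_c = np.sqrt(c)
    diff = mobius_add(-x, y, c)
    return (2.0 / sqrt_c) * np.arctanh(sqrt_c * np.linalg.norm(diff))
```

In a hyperbolic RL model, an encoder's Euclidean output would typically pass through `expmap0` so that downstream similarity computations use the geodesic distance; the optimization instabilities the abstract mentions arise because gradients of these maps blow up near the ball's boundary.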