In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics, and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance, across both state-based and image-based observations.
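As a brief illustration (our own example, not from the paper): a quasimetric satisfies the identity and triangle-inequality axioms of a metric but drops symmetry, so d(x, y) may differ from d(y, x). Shortest-path costs in a directed graph are a canonical instance, mirroring how the cost-to-go between states in goal-reaching RL can be asymmetric. The graph below is hypothetical.

```python
import itertools

# Hypothetical directed graph with asymmetric edge costs.
edges = {
    "a": {"b": 1},
    "b": {"c": 1},
    "c": {"a": 5},
}
nodes = list(edges)

# Floyd-Warshall all-pairs shortest paths.
INF = float("inf")
d = {(u, v): (0 if u == v else edges.get(u, {}).get(v, INF))
     for u in nodes for v in nodes}
for k, i, j in itertools.product(nodes, repeat=3):
    d[i, j] = min(d[i, j], d[i, k] + d[k, j])

# Quasimetric axioms hold: identity and triangle inequality...
assert all(d[u, u] == 0 for u in nodes)
assert all(d[i, j] <= d[i, k] + d[k, j]
           for i, j, k in itertools.product(nodes, repeat=3))

# ...but symmetry does not: a -> b costs 1, while b -> a costs 6.
print(d["a", "b"], d["b", "a"])  # prints "1 6"
```

Because the optimal goal-conditioned value function is (the negation of) such an asymmetric cost-to-go, parameterizing it with a model that enforces these axioms is the structural prior QRL exploits.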