In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance across both state-based and image-based observations.
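The quasimetric structure mentioned above refers to a distance-like function that satisfies the identity and triangle-inequality axioms but, unlike a metric, need not be symmetric: in an MDP, the optimal cost-to-go from state s to goal g generally differs from that of the reverse direction. A minimal sketch of this idea, using a hypothetical toy directed graph (not from the paper) with asymmetric edge costs and Floyd-Warshall shortest paths standing in for the optimal value function:

```python
import itertools

# Toy directed graph with asymmetric edge costs (e.g. going "uphill"
# costs more than coming back down). Purely illustrative.
INF = float("inf")
nodes = ["a", "b", "c"]
cost = {
    ("a", "b"): 1.0, ("b", "a"): 3.0,
    ("b", "c"): 1.0, ("c", "b"): 2.0,
    ("a", "c"): 5.0, ("c", "a"): 4.0,
}

# Floyd-Warshall: d[s][g] = optimal cost-to-go from state s to goal g,
# playing the role of (the negation of) the optimal value function.
d = {s: {g: (0.0 if s == g else cost.get((s, g), INF)) for g in nodes}
     for s in nodes}
for k, s, g in itertools.product(nodes, repeat=3):  # k varies slowest
    d[s][g] = min(d[s][g], d[s][k] + d[k][g])

# Quasimetric axioms hold: identity and the triangle inequality...
assert all(d[s][s] == 0.0 for s in nodes)
assert all(d[s][g] <= d[s][k] + d[k][g]
           for s, k, g in itertools.product(nodes, repeat=3))
# ...but symmetry does not, so this is a quasimetric, not a metric.
assert d["a"]["b"] != d["b"]["a"]
print(d["a"]["c"])  # shortest a->c detours through b: 1.0 + 1.0 = 2.0
```

QRL's quasimetric models are function approximators that enforce exactly these axioms by construction, so that the learned value function is guaranteed to lie in this geometric family.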