In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that uses quasimetric models to learn optimal value functions. Distinct from prior approaches, the QRL objective is specifically designed for quasimetrics and comes with strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance across both state-based and image-based observations.
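To make the abstract's central notion concrete: a quasimetric satisfies non-negativity, identity, and the triangle inequality, but (unlike a metric) need not be symmetric. This is exactly the structure of optimal goal-reaching values, where reaching g from s can cost more than the reverse. Below is a minimal sketch, not taken from the paper, using a hypothetical 3-state environment with one-way transitions 0 → 1 → 2 → 0, whose shortest-path step counts form a quasimetric:

```python
import itertools

# Hypothetical 3-state ring with one-way transitions 0 -> 1 -> 2 -> 0.
# d[(s, g)] = minimal number of steps to reach g from s.
d = {
    (0, 0): 0, (0, 1): 1, (0, 2): 2,
    (1, 0): 2, (1, 1): 0, (1, 2): 1,
    (2, 0): 1, (2, 1): 2, (2, 2): 0,
}
states = [0, 1, 2]

# Quasimetric axioms hold: identity and triangle inequality...
assert all(d[(s, s)] == 0 for s in states)
assert all(d[(x, z)] <= d[(x, y)] + d[(y, z)]
           for x, y, z in itertools.product(states, repeat=3))

# ...but symmetry fails: 0 -> 1 takes 1 step, while 1 -> 0 takes 2.
assert d[(0, 1)] != d[(1, 0)]
```

Since a symmetric metric model cannot represent such one-way costs, QRL instead parameterizes the value function with a quasimetric model.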