Model-based offline reinforcement learning approaches generally rely on bounds on the model error. Estimating these bounds is usually achieved through uncertainty estimation methods. In this work, we combine parametric and nonparametric methods for uncertainty estimation through a novel latent-space metric. In particular, we build upon recent advances in the Riemannian geometry of generative models to construct a pullback metric of an encoder-decoder based forward model. Our proposed metric measures both the quality of out-of-distribution samples and the discrepancy of examples in the data. We leverage our method for uncertainty estimation in a pessimistic model-based framework, showing a significant improvement upon contemporary model-based offline approaches on continuous control and autonomous driving benchmarks.
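As a minimal illustration of the pullback-metric idea referenced above (a generic sketch, not the paper's implementation): for a decoder f mapping a latent space to an observation space, the pullback of the Euclidean metric onto the latent space is G(z) = J(z)ᵀJ(z), where J is the Jacobian of f at z. The toy decoder below is hypothetical.

```python
import numpy as np

def decoder(z):
    # Hypothetical toy decoder mapping latent R^2 -> observation R^3;
    # any smooth map would do for this illustration.
    return np.array([np.sin(z[0]), np.cos(z[1]), z[0] * z[1]])

def jacobian(f, z, eps=1e-6):
    # Finite-difference Jacobian of f at z (forward differences).
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def pullback_metric(f, z):
    # G(z) = J^T J: lengths of latent curves measured under G match
    # the lengths of their decoded images in observation space.
    J = jacobian(f, z)
    return J.T @ J

z = np.array([0.3, -0.8])
G = pullback_metric(decoder, z)
# G is symmetric positive semi-definite by construction.
```

Regions where the decoder varies sharply (large Jacobian norm) yield large metric values, which is what lets such a metric flag low-quality or out-of-distribution latent samples.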