Stochastic gradient Langevin dynamics (SGLD) is a useful methodology for sampling from probability distributions. This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics algorithm (PSGLD) designed to achieve inverse reinforcement learning. By "passive", we mean that the noisy gradients available to the PSGLD algorithm (inverse learning process) are evaluated at points chosen randomly by an external stochastic gradient algorithm (forward learner). The PSGLD algorithm thus acts as a randomized sampler that recovers the cost function being optimized by this external process. Previous work has analyzed the asymptotic performance of this passive algorithm using stochastic approximation techniques; in this work we analyze its non-asymptotic performance. Specifically, we provide finite-time bounds on the 2-Wasserstein distance between the law of the passive algorithm and its stationary measure, from which the reconstructed cost function is obtained.
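As background for the abstract above, a minimal sketch of the standard (non-passive) SGLD update may help: at each step the iterate moves along a noisy gradient of the potential $U$ and adds injected Gaussian noise of scale $\sqrt{2\epsilon}$, so the iterates approximately sample from $\pi(x) \propto e^{-U(x)}$. This is an illustrative toy example, not the paper's PSGLD algorithm; the potential $U(x) = x^2/2$ (standard normal target), the gradient-noise scale, and all step-size choices are assumptions for demonstration only.

```python
import numpy as np

def sgld_sample(grad_u, x0, step=0.01, n_iters=50_000, burn_in=10_000, seed=0):
    """Illustrative SGLD sampler for a 1-D target pi(x) ∝ exp(-U(x)).

    grad_u: callable returning the (exact) gradient of the potential U;
    a small Gaussian perturbation is added below to mimic a noisy
    stochastic gradient, as in the SGLD setting.
    """
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for t in range(n_iters):
        # Noisy gradient evaluation (stand-in for a stochastic gradient).
        noisy_grad = grad_u(x) + 0.1 * rng.standard_normal()
        # SGLD update: gradient step plus injected noise of scale sqrt(2*step).
        x = x - step * noisy_grad + np.sqrt(2.0 * step) * rng.standard_normal()
        if t >= burn_in:
            samples.append(x)
    return np.array(samples)

# Target U(x) = x^2 / 2, so grad U(x) = x and pi is the standard normal;
# the empirical mean and variance of the samples should be near 0 and 1.
samples = sgld_sample(lambda x: x, x0=0.0)
print(samples.mean(), samples.var())
```

In the passive setting analyzed by the paper, the inverse learner does not get to choose the evaluation points as the loop above does; it only observes noisy gradients at points selected by the external forward learner, which is what makes the finite-time 2-Wasserstein analysis nontrivial.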