This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm. This algorithm is designed to achieve adaptive inverse reinforcement learning (IRL). Adaptive IRL aims to estimate the cost function of a forward learner performing a stochastic gradient algorithm (e.g., policy gradient reinforcement learning) by observing their estimates in real-time. The PSGLD algorithm is considered passive because it incorporates noisy gradients provided by an external stochastic gradient algorithm (forward learner), of which it has no control. The PSGLD algorithm acts as a randomized sampler to achieve adaptive IRL by reconstructing the forward learner's cost function nonparametrically from the stationary measure of a Langevin diffusion. This paper analyzes the non-asymptotic (finite-sample) performance; we provide explicit bounds on the 2-Wasserstein distance between PSGLD algorithm sample measure and the stationary measure encoding the cost function, and provide guarantees for a kernel density estimation scheme which reconstructs the cost function from empirical samples. Our analysis uses tools from the study of Markov diffusion operators. The derived bounds have both practical and theoretical significance. They provide finite-time guarantees for an adaptive IRL mechanism, and substantially generalize the analytical framework of a line of research in passive stochastic gradient algorithms.
翻译:暂无翻译