Application of the replica exchange (i.e., parallel tempering) technique to Langevin Monte Carlo algorithms, especially stochastic gradient Langevin dynamics (SGLD), has achieved great success in non-convex learning problems, but one potential limitation is the computational cost of running multiple chains. Upon observing that a large variance of the gradient estimator in SGLD essentially raises the temperature of the stationary distribution, we propose expediting tempering schemes for SGLD by directly estimating the bias caused by the stochastic gradient estimator. This simple idea enables us to simulate high-temperature chains at a negligible computational cost (compared to that of the low-temperature chain) while preserving convergence to the target distribution. Our method is fundamentally different from the recently proposed m-reSGLD (multi-variance replica exchange SGLD) method in that the latter suffers from the low accuracy of the gradient estimator (e.g., the chain can fail to converge to the target), while our method benefits from it. Further, we derive a swapping rate that can be evaluated easily, providing another significant improvement over m-reSGLD. To demonstrate the advantage of our method theoretically, we develop convergence bounds in Wasserstein distances. Numerical examples for Gaussian mixture and inverse PDE models are also provided, which show that our method can converge faster than the vanilla multi-variance replica exchange method.
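To make the central observation concrete, the sketch below illustrates (in a hedged, simplified form, not the authors' algorithm) how minibatch-gradient variance in SGLD acts as extra temperature: two replicas run the same SGLD update, but the "high-temperature" chain draws its extra temperature entirely from a small-minibatch gradient estimator, and swaps use the standard replica-exchange acceptance rule. The toy target, step size, batch sizes, and the fixed effective-temperature value `tau_hi_eff` are illustrative assumptions; in the method the abstract describes, the effective temperature would instead be estimated from the gradient-estimator variance.

```python
# Minimal sketch: variance-based tempering for replica exchange SGLD.
# Not the paper's reference implementation; all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy Bayesian target: Gaussian likelihood over 1000 data points, N(0,1) prior.
data = rng.normal(loc=3.0, scale=1.0, size=1000)

def U(theta):
    # Full potential energy (negative log posterior up to a constant).
    return 0.5 * np.sum((theta - data) ** 2) + 0.5 * theta ** 2

def grad_U_hat(theta, batch):
    # Unbiased minibatch estimate of grad U; smaller batches => larger variance.
    n, m = len(data), len(batch)
    return (n / m) * np.sum(theta - batch) + theta

def sgld_step(theta, eta, tau, batch_size):
    # One SGLD update: gradient step plus injected noise at temperature tau.
    batch = rng.choice(data, size=batch_size, replace=False)
    g = grad_U_hat(theta, batch)
    return theta - eta * g + np.sqrt(2.0 * eta * tau) * rng.standard_normal()

eta = 1e-4
tau_low = 1.0                    # target temperature
theta_lo, theta_hi = 0.0, 0.0
n_lo, n_hi = 256, 8              # tiny batch inflates gradient variance

for step in range(5000):
    theta_lo = sgld_step(theta_lo, eta, tau_low, n_lo)
    # High-temperature replica: same nominal tau, but the noisy gradient
    # raises the effective temperature to roughly tau + eta * Var[g] / 2,
    # at the cost of an 8-point minibatch instead of a 256-point one.
    theta_hi = sgld_step(theta_hi, eta, tau_low, n_hi)

    if step % 100 == 0:
        # Standard replica-exchange Metropolis swap test. tau_hi_eff is a
        # fixed placeholder here; the abstract's method estimates it from
        # the empirical variance of the stochastic gradient.
        tau_hi_eff = 5.0
        log_alpha = (1.0 / tau_low - 1.0 / tau_hi_eff) * (U(theta_lo) - U(theta_hi))
        if np.log(rng.uniform()) < log_alpha:
            theta_lo, theta_hi = theta_hi, theta_lo
```

The design point the sketch highlights is the cost asymmetry: the high-temperature replica touches only 8 data points per step versus 256 for the low-temperature chain, which is how simulating the hot chain can become nearly free relative to the cold one.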