We examine the use of different randomisation policies for stochastic gradient algorithms used in sampling, based on first-order (or overdamped) Langevin dynamics, the most popular of which is known as Stochastic Gradient Langevin Dynamics. Conventionally, this algorithm is combined with a specific stochastic gradient strategy, called Robbins-Monro. In this work, we study an alternative strategy, Random Reshuffling, and show convincingly that it leads to improved performance via: a) a proof of reduced bias in the Wasserstein metric for strongly convex, gradient Lipschitz potentials; b) an analytical demonstration of reduced bias for a Gaussian model problem; and c) an empirical demonstration of reduced bias in numerical experiments for some logistic regression problems. This is especially important since Random Reshuffling is typically more efficient for memory-access and caching reasons. Such acceleration for the Random Reshuffling policy is familiar from the optimisation literature on stochastic gradient descent.
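To make the distinction between the two randomisation policies concrete, the following is a minimal sketch (not the paper's code) of a Stochastic Gradient Langevin Dynamics update in which the minibatch is drawn either Robbins-Monro style (independently, with replacement) or by Random Reshuffling (sweeping through a fresh permutation each epoch). The component gradient `grad_U_i`, the step size `eta`, and the toy Gaussian potential are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def grad_U_i(x, i, data):
    """Gradient of the i-th component potential; a toy Gaussian likelihood term."""
    return x - data[i]

def sgld(data, n_steps, batch_size, eta, policy="RR", rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    x = np.zeros_like(data[0], dtype=float)
    perm, pos = rng.permutation(n), 0
    for _ in range(n_steps):
        if policy == "RM":
            # Robbins-Monro style: each minibatch drawn independently with replacement.
            idx = rng.integers(0, n, size=batch_size)
        else:
            # Random Reshuffling: sweep through a permutation, reshuffle every epoch.
            if pos + batch_size > n:
                perm, pos = rng.permutation(n), 0
            idx = perm[pos:pos + batch_size]
            pos += batch_size
        # Stochastic gradient of the full potential, rescaled from the minibatch.
        g = (n / batch_size) * sum(grad_U_i(x, i, data) for i in idx)
        # Euler-Maruyama step for overdamped Langevin dynamics.
        x = x - eta * g + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
    return x
```

Under Random Reshuffling every data point is visited exactly once per epoch, which is the property the abstract credits with both the reduced bias and the memory-access efficiency; the Robbins-Monro branch differs only in how `idx` is formed.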