Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima. On the other hand, Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and a weak stochastic force, which in general slows down its convergence. This paper shows that these two algorithms can "collaborate" through a simple exchange mechanism, in which they swap their current positions if LD yields a lower objective value. This idea can be seen as the singular limit of the replica-exchange technique from the sampling literature. We show that this new algorithm converges to the global minimum linearly with high probability, assuming the objective function is strongly convex in a neighborhood of the unique global minimum. By replacing gradients with stochastic gradients, and adding a suitable threshold to the exchange mechanism, our algorithm can also be used in online settings. We also study non-swapping variants of the algorithm, which achieve similar performance. We further verify our theoretical results through numerical experiments, and observe superior performance of the proposed algorithm over running GD or LD alone.
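A minimal sketch of the exchange mechanism described above, under illustrative assumptions: the objective, step sizes, temperature, and the function `swap_gd_ld` are all hypothetical choices for exposition, not the paper's exact algorithm or hyperparameters. One GD iterate and one LD iterate are run in parallel, and their positions are swapped whenever the LD iterate attains a lower objective value.

```python
import numpy as np

def swap_gd_ld(f, grad_f, x0, y0, eta=0.1, gamma=0.01, temperature=1.0,
               n_iters=2000, rng=None):
    """Sketch of the GD/LD exchange mechanism (illustrative, not the paper's exact scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)  # GD iterate (exploitation)
    y = np.array(y0, dtype=float)  # LD iterate (exploration)
    for _ in range(n_iters):
        # Gradient descent step.
        x = x - eta * grad_f(x)
        # Discretized Langevin dynamics step: gradient step plus Gaussian noise.
        noise = rng.standard_normal(y.shape)
        y = y - gamma * grad_f(y) + np.sqrt(2.0 * gamma * temperature) * noise
        # Exchange mechanism: swap positions if LD found a lower objective value.
        if f(y) < f(x):
            x, y = y.copy(), x.copy()
    return x  # the GD iterate serves as the candidate global minimizer

# Example usage: a one-dimensional double-well objective whose global minimum
# lies near x = -1; starting near the other (local) minimum at x = +1,
# plain GD would stay trapped, while the swapped LD iterate can escape.
f = lambda x: (x**2 - 1.0)**2 + 0.3 * x
grad_f = lambda x: 4.0 * x * (x**2 - 1.0) + 0.3
x_star = swap_gd_ld(f, grad_f, x0=np.array([1.5]), y0=np.array([1.5]))
```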