Gradient descent (GD) is known to converge quickly for convex objective functions, but it can be trapped at local minima. On the other hand, Langevin dynamics (LD) can explore the state space and find global minima, but in order to give accurate estimates, LD needs to run with a small discretization step size and weak stochastic force, which in general slow down its convergence. This paper shows that these two algorithms can ``collaborate'' through a simple exchange mechanism, in which they swap their current positions if LD yields a lower objective function value. This idea can be seen as the singular limit of the replica-exchange technique from the sampling literature. We show that this new algorithm converges to the global minimum linearly with high probability, assuming the objective function is strongly convex in a neighborhood of the unique global minimum. By replacing gradients with stochastic gradients, and adding a proper threshold to the exchange mechanism, our algorithm can also be used in online settings. We also study non-swapping variants of the algorithm, which achieve similar performance. We further verify our theoretical results through numerical experiments and observe superior performance of the proposed algorithm over running GD or LD alone.
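The exchange mechanism is simple enough to sketch in a few lines. The Python snippet below is a minimal illustration under our own assumptions, not the paper's implementation: the function name `gd_ld_exchange`, the parameter names, and the default step size, temperature, and iteration count are all illustrative choices. It runs one GD replica and one discretized LD replica in parallel and swaps their positions whenever the LD replica attains a lower objective value.

```python
import numpy as np

def gd_ld_exchange(f, grad_f, x0, eta=1e-2, tau=1.0, n_steps=5000, rng=None):
    """Sketch of the GD/LD exchange idea (illustrative, not the paper's code).

    One replica follows plain gradient descent, the other follows an
    Euler-Maruyama discretization of Langevin dynamics; whenever the LD
    replica reaches a lower objective value, the two replicas swap positions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_gd = np.array(x0, dtype=float)   # gradient-descent replica
    x_ld = np.array(x0, dtype=float)   # Langevin-dynamics replica
    for _ in range(n_steps):
        # GD step
        x_gd = x_gd - eta * grad_f(x_gd)
        # Langevin step: gradient drift plus Gaussian noise at temperature tau
        noise = np.sqrt(2.0 * eta * tau) * rng.standard_normal(x_ld.shape)
        x_ld = x_ld - eta * grad_f(x_ld) + noise
        # Exchange: swap positions if the LD replica found a lower value
        if f(x_ld) < f(x_gd):
            x_gd, x_ld = x_ld.copy(), x_gd.copy()
    return x_gd  # the GD replica tracks the best basin found so far


# Example: a one-dimensional double well; the global minimum sits near x = -1,
# while plain GD started at x = 2 would get stuck in the local minimum near x = 1.
f = lambda x: float((x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0])
grad = lambda x: np.array([4.0 * x[0] * (x[0] ** 2 - 1.0) + 0.3])
x_star = gd_ld_exchange(f, grad, x0=[2.0])
print(x_star, f(x_star))
```

In this toy run the LD replica eventually crosses the barrier between the two wells, the swap moves the GD replica into the global basin, and GD then converges quickly to the global minimizer, which is the qualitative behavior the abstract describes.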