Heavy ball momentum is a popular acceleration idea in stochastic optimization. There have been several attempts to understand its perceived benefits, but the complete picture is still unclear. Specifically, the error expression in the presence of noise has two separate terms: the bias and the variance, but most existing works only focus on bias and show that momentum accelerates its decay. Such analyses overlook the interplay between bias and variance and, therefore, miss important implications. In this work, we analyze a sample complexity bound of stochastic approximation algorithms with heavy-ball momentum that accounts for both bias and variance. We find that for the same step size, which is small enough, the iterates with momentum have improved sample complexity compared to the ones without. However, by using a different step-size sequence, the non-momentum version can nullify this benefit. Subsequently, we show that our sample complexity bounds are indeed tight for a small enough neighborhood around the solution and large enough noise variance. Our analysis also sheds some light on the finite-time behavior of these algorithms. This explains the perceived benefit in the initial phase of momentum-based schemes.
翻译:重球动力是一个流行的加速优化理念。 曾有几次尝试来理解其可见的效益, 但全貌仍不清楚。 具体地说, 噪音存在的错误表达有两个不同的术语: 偏差和差异, 但大多数现有作品只关注偏差, 并显示动力加速了其衰变。 这些分析忽略了偏差和差异之间的相互作用, 因此忽略了重要影响。 在这项工作中, 我们分析了一个样本复杂性, 包括重球动的样本, 以及包含偏差和差异的重球近似算法。 我们发现, 对于同样大小的步数, 与没有的步数相比, 动力的循环提高了样本复杂性。 但是, 使用不同的步数序列, 非运动版可以抵消这一效益。 随后, 我们显示, 我们的样本复杂性的界限对于解决方案周围足够小的邻居来说确实非常紧凑, 且噪音差异也很大。 我们的分析还揭示了这些算法的有限时间行为。 这解释了在基于动力的计划初始阶段所意识到的好处 。