Simple stochastic momentum methods are widely used in machine learning optimization, but their strong practical performance is at odds with the absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size. The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum. The analysis relies on carefully decomposing the momentum transition matrix and on new spectral norm concentration bounds for products of independent random matrices. We provide numerical illustrations demonstrating that our bounds are reasonably sharp.
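For concreteness, below is a minimal Python sketch of the kind of iteration described above: minibatch randomized Kaczmarz with a heavy ball momentum term, applied to a consistent linear system Ax = b. The function name, the step size `alpha`, the momentum parameter `beta`, and the uniform row sampling are illustrative assumptions, not the paper's tuned choices or sampling scheme.

```python
import numpy as np

def minibatch_kaczmarz_heavy_ball(A, b, batch_size, alpha, beta, n_iters, x0=None, seed=0):
    """Sketch of minibatch randomized Kaczmarz with heavy ball momentum.

    Iterates x_{k+1} = x_k - alpha * g_S(x_k) + beta * (x_k - x_{k-1}),
    where g_S is the least-squares gradient restricted to a random row
    minibatch S. All hyperparameter names are illustrative assumptions.
    """
    m, n = A.shape
    x_prev = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        # Sample a uniform minibatch of rows without replacement.
        S = rng.choice(m, size=batch_size, replace=False)
        # Minibatch least-squares gradient on the sampled rows.
        grad = A[S].T @ (A[S] @ x - b[S]) / batch_size
        # Gradient step plus heavy ball momentum term beta * (x_k - x_{k-1}).
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    return x

# Example on a random consistent system (hypothetical parameter settings).
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 50))
x_true = rng.standard_normal(50)
b = A @ x_true
x_hat = minibatch_kaczmarz_heavy_ball(A, b, batch_size=50, alpha=0.5, beta=0.3, n_iters=2000)
print(np.linalg.norm(x_hat - x_true))
```

Heuristically, because the system is consistent, the minibatch gradient vanishes at the solution, so no decaying step size is needed; the batch size controls the variance of the per-iteration update, which is consistent with the abstract's requirement of a sufficiently large batch for the fast linear rate.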