Stochastic difference-of-convex (DC) optimization is prevalent in numerous machine learning applications, yet its convergence properties under small batch sizes remain poorly understood. Existing methods typically require large batches or strong noise assumptions, which limits their practical use. In this work, we show that momentum enables convergence under standard smoothness and bounded-variance assumptions on the concave part, for any batch size. We prove that without momentum, convergence may fail regardless of stepsize, highlighting its necessity. Our momentum-based algorithm achieves provable convergence and demonstrates strong empirical performance.
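To make the setting concrete, below is a minimal sketch of one plausible momentum-based stochastic DC scheme for an objective f(x) = g(x) - h(x) with g, h convex: an exponential-moving-average (momentum) estimate tracks the stochastic gradient of the concave part so that a single sample per step suffices. The function name `momentum_sdc`, the oracles `grad_g`/`grad_h`, and the parameters `lr`, `beta`, `batch`, and the toy quadratic example are illustrative assumptions, not the paper's exact algorithm or experiments.

```python
import numpy as np

def momentum_sdc(grad_g, grad_h, x0, steps=1000, lr=0.01, beta=0.1, batch=1, rng=None):
    """Illustrative momentum-based stochastic DC update for f(x) = g(x) - h(x),
    with g, h convex and only small-batch stochastic gradients available.
    An exponential moving average v tracks the gradient of the concave part -h,
    damping the small-batch noise.  A sketch, not the paper's exact method."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = grad_h(x, batch, rng)                 # initialize momentum with one stochastic gradient of h
    for _ in range(steps):
        # momentum (EMA) estimate of the concave part's gradient
        v = (1.0 - beta) * v + beta * grad_h(x, batch, rng)
        # DC-style step: descend on the convex part, ascend on the tracked concave part
        x = x - lr * (grad_g(x, batch, rng) - v)
    return x

if __name__ == "__main__":
    # Hypothetical toy DC objective: f(x) = 0.5*||x - a||^2 - 0.5*rho*||x||^2 with rho < 1,
    # observed only through noisy single-sample stochastic gradients.
    a, rho, noise = np.array([1.0, -2.0]), 0.5, 0.5
    grad_g = lambda x, b, rng: (x - a) + noise * rng.standard_normal(x.shape) / np.sqrt(b)
    grad_h = lambda x, b, rng: rho * x + noise * rng.standard_normal(x.shape) / np.sqrt(b)
    x_star = a / (1.0 - rho)                  # closed-form minimizer of the toy objective
    x_hat = momentum_sdc(grad_g, grad_h, x0=np.zeros(2), steps=20000, lr=0.02, beta=0.05)
    print("estimate:", x_hat, " target:", x_star)
```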