Heavy ball momentum is crucial for accelerating (stochastic) gradient-based optimization algorithms in machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which requires extensive tuning. Moreover, even a carefully calibrated fixed hyperparameter may not yield optimal performance. In this paper, to eliminate the effort of tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization. The proposed adaptive heavy ball momentum can improve both stochastic gradient descent (SGD) and Adam: with the newly designed adaptive momentum, SGD and Adam are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks, including image classification, language modeling, and machine translation. Finally, we provide convergence guarantees for SGD and Adam with the proposed adaptive momentum.
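As background for the "optimal choice of the heavy ball momentum for quadratic optimization" mentioned above, the following is a minimal sketch (not the paper's adaptive algorithm) of heavy-ball gradient descent on a strongly convex quadratic. It compares a typical hand-tuned momentum weight (0.9) against the classical optimal value beta* = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))^2 for a quadratic with curvature bounded between mu and L; the function names, step size, and problem instance are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch: heavy-ball iteration x_{k+1} = x_k - lr * grad + beta * (x_k - x_{k-1})
# on f(x) = 0.5 * x^T A x, illustrating why the momentum weight matters.
import numpy as np

def heavy_ball(A, x0, lr, beta, steps=200):
    """Run heavy-ball gradient descent on f(x) = 0.5 * x^T A x and return the loss trace."""
    x_prev, x = x0.copy(), x0.copy()
    losses = []
    for _ in range(steps):
        grad = A @ x
        x_next = x - lr * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
        losses.append(0.5 * x @ A @ x)
    return np.array(losses)

rng = np.random.default_rng(0)
eigs = np.linspace(0.01, 1.0, 50)          # eigenvalues of the quadratic, in [mu, L]
A = np.diag(eigs)
mu, L = eigs.min(), eigs.max()

x0 = rng.standard_normal(50)
lr = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2  # classical heavy-ball step size for quadratics
beta_opt = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
beta_fixed = 0.9                            # a common hand-tuned default

print("optimal beta for this quadratic:", beta_opt)
print("final loss, fixed beta = 0.9   :", heavy_ball(A, x0, lr, beta_fixed)[-1])
print("final loss, optimal beta       :", heavy_ball(A, x0, lr, beta_opt)[-1])
```

The gap between the two final losses shows how sensitive heavy-ball convergence is to the momentum weight; the adaptive scheme proposed in the paper is motivated by this quadratic-optimal quantity, though its exact update rule is given in the paper itself, not in this sketch.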