We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization of the sequence of loss values produced by these algorithms, expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. In contrast, in the non-strongly convex setting, momentum can yield a large improvement over SGD. By introducing hyperparameters that depend on the number of samples, we propose a new algorithm, sDANA (stochastic dimension adjusted Nesterov acceleration), which achieves an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without adjusting parameters.
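As a concrete illustration of the setting, the sketch below runs small-batch stochastic heavy-ball momentum on a random least squares problem and compares it with plain SGD at the rescaled step size step/(1 − β), in the spirit of the abstract's claim that fixed-momentum heavy ball matches correctly-tuned SGD. This is a minimal demo, not the paper's exact model: the problem sizes, batch size, step size, and momentum value are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: small-batch stochastic heavy-ball momentum on the random
# least squares problem  min_x (1/2n) ||A x - b||^2,  with i.i.d. Gaussian A.
# All sizes and hyperparameters are illustrative choices, not paper values.

rng = np.random.default_rng(0)
n, d = 2000, 1000
A = rng.standard_normal((n, d))      # data matrix; the Hessian is A.T @ A / n
b = A @ rng.standard_normal(d)       # noiseless targets from a planted solution

def loss(x):
    r = A @ x - b
    return 0.5 * (r @ r) / n

def run(step, momentum, iters=5000, batch=16):
    x = np.zeros(d)
    v = np.zeros(d)                                   # heavy-ball velocity
    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch  # unbiased mini-batch gradient
        v = momentum * v - step * g                   # heavy-ball update
        x = x + v
    return loss(x)

# The step is kept small: with tiny batches, gradient noise grows with d.
step, beta = 0.002, 0.9
print(f"heavy ball : {run(step, beta):.3e}")
print(f"plain SGD  : {run(step / (1 - beta), 0.0):.3e}")  # momentum = 0 is SGD
```

Under these (assumed) settings both runs reach losses of the same order, consistent with the abstract's statement that a fixed momentum parameter gives no real advantage over SGD once the step size is rescaled.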