We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general ``affine variance'' noise model and provides sharp rates of convergence in both the low-noise and high-noise~regimes.
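For concreteness, a minimal sketch of the kind of update we refer to, in its scalar (AdaGrad-Norm) form, is
% AdaGrad-Norm stepsize (illustrative notation, not fixed by the abstract)
\[
x_{t+1} = x_t - \eta_t\, g_t,
\qquad
\eta_t = \frac{\eta}{\sqrt{\gamma^2 + \sum_{s=1}^{t} \|g_s\|^2}},
\]
where $x_t$ is the current iterate, $g_t$ is a stochastic gradient at $x_t$, and $\eta,\gamma>0$ are tuning constants; the notation here is purely illustrative.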