On the Convergence of AdaGrad on $\mathbb{R}^{d}$: Beyond Convexity, Non-Asymptotic Rate and Acceleration

Abstract: Existing analyses of AdaGrad and other adaptive methods for smooth convex optimization are typically restricted to functions with bounded domain diameter. For unconstrained problems, previous works guarantee only an asymptotic convergence rate, without an explicit constant factor that holds for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad has been analyzed, one that differs from the version commonly used in practice in that the latest gradient is not used to update the stepsize. Our paper aims to bridge these gaps and to develop a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions, as well as in the more general setting of quasar-convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of vanilla AdaGrad for unconstrained problems in both the deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show convergence of the last iterate rather than of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantees in the deterministic setting, with explicit dependence on the problem parameters, improving upon the asymptotic rates shown in previous works.
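For concreteness, below is a minimal sketch of the scalar (AdaGrad-Norm) update behind the distinction drawn above; the notation $\eta$ (base stepsize), $b_t$ (accumulated gradient magnitude, with $b_0 > 0$), and $x_t$ is generic rather than taken from the paper, and the coordinate-wise form of AdaGrad applies the same recursion entrywise.

$$
b_t^2 = b_{t-1}^2 + \|\nabla f(x_t)\|^2, \qquad x_{t+1} = x_t - \frac{\eta}{b_t}\,\nabla f(x_t) \qquad \text{(vanilla AdaGrad: the latest gradient enters the stepsize)}
$$

$$
x_{t+1} = x_t - \frac{\eta}{b_{t-1}}\,\nabla f(x_t) \qquad \text{(modified version: the stepsize uses only past gradients)}
$$

In the stochastic setting, $\nabla f(x_t)$ is replaced by a stochastic gradient estimate; the point made in the abstract is that prior stochastic analyses cover only the second recursion, whereas the version used in practice is the first.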