Existing analyses of AdaGrad and other adaptive methods for smooth convex optimization typically assume a bounded domain diameter. For unconstrained problems, previous works guarantee only an asymptotic convergence rate, without an explicit constant factor that holds for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad has been analyzed, one that differs from the version commonly used in practice in that the latest gradient is not used to update the stepsize. Our paper aims to bridge these gaps and to develop a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of vanilla AdaGrad for unconstrained problems in both the deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show convergence of the last iterate, instead of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantees in the deterministic setting with explicit dependence on the problem parameters, improving upon the asymptotic rates shown in previous works.
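To make the distinction between the vanilla and modified algorithms concrete, the following display is a minimal sketch of the scalar (norm) form of the updates, assuming a base stepsize $\eta > 0$ and an initial accumulator $b_0 > 0$; these symbols and this normalization are illustrative and may differ from the paper's exact formulation. Vanilla AdaGrad includes the current gradient in the stepsize denominator, whereas the modified variant analyzed in prior stochastic work sets the stepsize before observing it:
\[
x_{t+1} = x_t - \frac{\eta}{b_t}\, g_t,
\qquad
b_t^2 = b_0^2 + \sum_{s=1}^{t} \lVert g_s \rVert^2
\quad \text{(vanilla: current gradient } g_t \text{ included)},
\]
\[
\tilde b_t^{\,2} = b_0^2 + \sum_{s=1}^{t-1} \lVert g_s \rVert^2
\quad \text{(modified variant: stepsize independent of } g_t \text{)},
\]
where $g_t$ denotes the (stochastic) gradient at the iterate $x_t$.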