Existing analyses of AdaGrad and other adaptive methods for smooth convex optimization typically assume functions with a bounded domain diameter. For unconstrained problems, previous works only guarantee an asymptotic convergence rate without an explicit constant factor that holds for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad has been analyzed, one in which the latest gradient is not used to update the stepsize, differing from the version commonly used in practice. Our paper aims to bridge these gaps and develop a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar-convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of vanilla AdaGrad for unconstrained problems in both deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show convergence of the last iterate rather than the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantees in the deterministic setting, with explicit dependence on the problem parameters, improving upon the asymptotic rate shown in previous works.
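For reference, the vanilla AdaGrad update discussed above can be sketched as follows, assuming the scalar (norm) version with a stepsize parameter $\eta$ and an initial accumulator $b_0$; these symbols are illustrative, and the diagonal (coordinate-wise) variant follows the same pattern per coordinate:
\[
x_{t+1} \;=\; x_t \;-\; \frac{\eta}{\sqrt{b_0^2 + \sum_{s=1}^{t} \lVert \nabla f(x_s) \rVert^2}} \, \nabla f(x_t).
\]
Note that the current gradient $\nabla f(x_t)$ enters the denominator, in contrast to the modified version mentioned above, whose stepsize is computed before the latest gradient is observed.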