We propose an adaptive variance-reduction method, called AdaSpider, for the minimization of $L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an AdaGrad-inspired [Duchi et al., 2011; McMahan & Streeter, 2010], yet fairly distinct, adaptive step-size schedule with the recursive Stochastic Path-Integrated Differential Estimator (SPIDER) proposed in [Fang et al., 2018]. To our knowledge, AdaSpider is the first parameter-free non-convex variance-reduction method, in the sense that it does not require knowledge of problem-dependent parameters such as the smoothness constant $L$, the target accuracy $\epsilon$, or any bound on gradient norms. As a result, AdaSpider computes an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ oracle calls, which matches the respective lower bound up to logarithmic factors.
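To make the combination described above concrete, the following is a minimal illustrative sketch, not the paper's exact algorithm: it pairs a SPIDER-style recursive gradient estimator (full-gradient refresh each epoch, single-sample correction per step) with a generic AdaGrad-flavoured step size driven by the accumulated squared estimator norms. The function name `adaspider_sketch`, the epoch length $\lfloor\sqrt{n}\rfloor+1$, and the particular step-size denominator are assumptions for illustration only; the paper specifies its own schedule.

```python
import numpy as np

def adaspider_sketch(grad_i, x0, n, epochs=10, rng=None):
    """Illustrative sketch (assumptions, not the paper's exact schedule):
    SPIDER-style variance reduction with an AdaGrad-inspired step size.

    grad_i(x, i) -- gradient of the i-th component function at x
    n            -- number of component functions in the finite sum
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    sum_sq = 0.0  # running sum of ||v_t||^2 feeding the adaptive step size
    for _ in range(epochs):
        # Checkpoint step: refresh the estimator with a full gradient.
        v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
        for _ in range(int(np.sqrt(n)) + 1):
            sum_sq += float(np.dot(v, v))
            eta = 1.0 / np.sqrt(1.0 + sum_sq)  # parameter-free, AdaGrad-flavoured step (assumed form)
            x_new = x - eta * v
            # SPIDER recursion: correct the estimator with one fresh sample.
            i = rng.integers(n)
            v = grad_i(x_new, i) - grad_i(x, i) + v
            x = x_new
    return x
```

The key point the sketch conveys is that the step size depends only on observed estimator norms, so no smoothness constant, target accuracy, or gradient bound needs to be supplied.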