In this paper, we investigate the problem of stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth but possibly non-convex functions. Existing methods for solving this problem either suffer from sub-optimal sample complexities or require a huge batch size. To address these limitations, we propose a Stochastic Multi-level Variance Reduction method (SMVR), which achieves the optimal sample complexity of $\mathcal{O}\left(1 / \epsilon^{3}\right)$ to find an $\epsilon$-stationary point for non-convex objectives. Furthermore, when the objective function satisfies the convexity or Polyak-Łojasiewicz (PL) condition, we propose a stage-wise variant of SMVR and improve the sample complexity to $\mathcal{O}\left(1 / \epsilon^{2}\right)$ for convex functions or $\mathcal{O}\left(1 /(\mu\epsilon)\right)$ for non-convex functions satisfying the $\mu$-PL condition. The latter result implies the same complexity for $\mu$-strongly convex functions. To make use of adaptive learning rates, we also develop Adaptive SMVR, which achieves the same optimal complexities but converges faster in practice. All our complexities match the lower bounds not only in terms of $\epsilon$ but also in terms of $\mu$ (for PL or strongly convex functions), without using a large batch size in each iteration.