Stochastic nested optimization, including stochastic compositional, min-max, and bilevel optimization, is gaining popularity in many machine learning applications. While the three problems share the nested structure, existing works often treat them separately and develop problem-specific algorithms and analyses. Among various exciting developments, simple SGD-type updates (potentially on multiple variables) remain prevalent for solving this class of nested problems, but they are believed to converge more slowly than SGD for non-nested problems. This paper unifies several SGD-type updates for stochastic nested problems into a single approach that we term the ALternating Stochastic gradient dEscenT (ALSET) method. By leveraging the hidden smoothness of the problem, this paper presents a tighter analysis of ALSET for stochastic nested problems. Under the new analysis, ALSET requires ${\cal O}(\epsilon^{-2})$ samples to achieve an $\epsilon$-stationary point of the nested problem. Under certain regularity conditions, applying our results to stochastic compositional, min-max, and reinforcement learning problems either improves or matches the best-known sample complexity in the respective cases. Our results explain why simple SGD-type algorithms work well in practice for stochastic nested problems without further modifications.
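To make the alternating update pattern concrete, here is a minimal sketch of ALSET-style alternating SGD on a toy stochastic bilevel instance with a quadratic lower level. The problem instance, step sizes, and function names are illustrative assumptions for this sketch and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic bilevel instance (illustrative, not from the paper):
#   upper level: F(x) = 0.5 * ||y*(x) - b||^2   with y*(x) = argmin_y g(x, y)
#   lower level: g(x, y) = E[ 0.5 * ||y - A x||^2 ]   so y*(x) = A x
d, m = 5, 5
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)

def stoch_grad_y(x, y, noise=0.1):
    """Stochastic gradient of the lower-level objective w.r.t. y."""
    return (y - A @ x) + noise * rng.standard_normal(m)

def stoch_hypergrad_x(x, y, noise=0.1):
    """Stochastic estimate of the upper-level (hyper)gradient, using the
    current lower-level iterate y in place of y*(x). For this quadratic
    instance dy*/dx = A, so the hypergradient is A^T (y*(x) - b)."""
    return A.T @ (y - b) + noise * rng.standard_normal(d)

# ALSET-style alternating SGD: one lower-level step, then one upper-level step.
x = np.zeros(d)
y = np.zeros(m)
alpha, beta = 0.05, 0.05  # step sizes chosen heuristically for this toy problem

for k in range(2000):
    y = y - beta * stoch_grad_y(x, y)        # inner SGD step on y
    x = x - alpha * stoch_hypergrad_x(x, y)  # outer SGD step on x, reusing y

# Near a stationary point of this toy problem, A x should be close to b.
print("||A x - b|| =", np.linalg.norm(A @ x - b))
```

The key design choice illustrated here is that each iteration performs plain SGD-type steps on both variables in alternation, without double loops or extra correction terms; the paper's analysis shows that this simple scheme already attains the ${\cal O}(\epsilon^{-2})$ sample complexity under its assumptions.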