This paper proposes a new algorithm -- the \underline{S}ingle-timescale Do\underline{u}ble-momentum \underline{St}ochastic \underline{A}pprox\underline{i}matio\underline{n} (SUSTAIN) -- for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly convex and the upper level objective function is smooth. Unlike prior works that rely on \emph{two-timescale} or \emph{double loop} techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the errors in the stochastic gradient updates caused by inaccurate solutions to both subproblems. When the upper level objective function is smooth but possibly non-convex, we show that {\aname}~requires $\mathcal{O}(\epsilon^{-3/2})$ iterations (each using $\mathcal{O}(1)$ samples) to find an $\epsilon$-stationary solution, i.e., a point at which the squared norm of the gradient of the outer function is at most $\epsilon$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case where the upper level objective function is strongly convex.
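To fix ideas, the problem class described above can be written as the following bilevel program; this is a minimal sketch in which the symbols $\ell$, $f$, $g$, $x$, $y$ and the dimensions $d_1$, $d_2$ are illustrative placeholders rather than the paper's own notation:
% Illustrative formulation (assumed notation): the upper level objective f is
% smooth and possibly non-convex, while g(x, .) is strongly convex in y.
\begin{equation*}
  \min_{x \in \mathbb{R}^{d_1}} \; \ell(x) := f\bigl(x, y^{*}(x)\bigr)
  \quad \text{s.t.} \quad
  y^{*}(x) = \arg\min_{y \in \mathbb{R}^{d_2}} g(x, y).
\end{equation*}
Under this illustrative notation, an $\epsilon$-stationary solution $\bar{x}$ is one satisfying $\|\nabla \ell(\bar{x})\|^2 \le \epsilon$, matching the definition stated above.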