A Single-Timescale Stochastic Bilevel Optimization Method

Stochastic bilevel optimization generalizes classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has been regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta-learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which can be less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems, which we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion and uses a single-timescale update with a fixed batch size. To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for single-level stochastic optimization.
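
For orientation, the problem class described above is commonly written in the following generic form. This is a sketch in assumed notation: the symbols $f$, $g$, $\xi$, $\phi$ and $y^*(x)$ do not appear in the abstract and are introduced here only for illustration.

$$
\min_{x} \; F(x) := \mathbb{E}_{\xi}\big[ f(x, y^*(x); \xi) \big]
\quad \text{s.t.} \quad
y^*(x) \in \arg\min_{y} \; \mathbb{E}_{\phi}\big[ g(x, y; \phi) \big].
$$

Assuming the lower-level objective $g$ is strongly convex in $y$ and twice differentiable, the implicit function theorem gives the gradient of the upper-level objective, with $f$ and $g$ below denoting the expected objectives:

$$
\nabla F(x) = \nabla_x f(x, y^*(x)) - \nabla^2_{xy} g(x, y^*(x)) \, \big[ \nabla^2_{yy} g(x, y^*(x)) \big]^{-1} \nabla_y f(x, y^*(x)).
$$

The dependence on $y^*(x)$ and on the inverse Hessian is what typically motivates double-loop or two-timescale schemes: a method must track the lower-level solution while updating $x$.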
