This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems with a two-level structure, whose goal is to minimize an outer objective function over variables constrained to be the optimal solution of an (inner) optimization problem. We consider the case where the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates of the TTSA algorithm under various settings: when the outer problem is strongly convex (resp.~weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp.~$\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total number of iterations. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a globally optimal policy.
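The two-timescale scheme described above can be illustrated with a minimal sketch on a toy quadratic bilevel instance. Everything below is an illustrative assumption, not the paper's setup: the matrix `A`, vector `b`, box constraint, noise level, and step-size schedules are all hypothetical choices, and for this particular quadratic inner problem the hypergradient has a simple closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bilevel instance (hypothetical, for illustration only):
#   inner:  y*(x) = argmin_y g(x, y),  g(x, y) = 0.5 * ||y - A x||^2  (strongly convex in y)
#   outer:  min_{x in X} f(x, y*(x)), f(x, y) = 0.5 * ||y - b||^2 + 0.5 * ||x||^2
# with X a box constraint.  For this instance the hypergradient is
#   grad F(x) = x + A^T (y*(x) - b).
A = np.array([[1.0, 0.5], [0.0, 1.0]])
b = np.array([1.0, -0.5])
lo, hi = -2.0, 2.0  # box X = [lo, hi]^2

def project(x):
    """Euclidean projection onto the box X."""
    return np.clip(x, lo, hi)

x = np.zeros(2)   # outer (slow) variable
y = np.zeros(2)   # inner (fast) variable
K = 20000
sigma = 0.1       # std of the additive stochastic-gradient noise

for k in range(1, K + 1):
    beta = 1.0 / k ** 0.4    # larger step size for the inner update
    alpha = 0.1 / k ** 0.6   # smaller step size for the outer update
    # inner stochastic gradient step on y (tracks y*(x) = A x)
    gy = (y - A @ x) + sigma * rng.standard_normal(2)
    y = y - beta * gy
    # outer projected stochastic gradient step, with the hypergradient
    # evaluated at the current (inexact) inner iterate y
    gx = x + A.T @ (y - b) + sigma * rng.standard_normal(2)
    x = project(x - alpha * gx)

# closed-form minimizer of F(x) = 0.5||Ax - b||^2 + 0.5||x||^2,
# which lies in the interior of the box for this instance
x_star = np.linalg.solve(A.T @ A + np.eye(2), A.T @ b)
print("TTSA iterate:", x, " closed-form solution:", x_star)
```

The key feature is the timescale separation: `alpha` decays faster than `beta`, so the inner iterate `y` tracks the moving target `y*(x)` while the outer iterate `x` evolves slowly, matching the single-loop structure of the algorithm.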