Gradient-based bi-level optimization (BLO) methods are widely used to solve modern machine learning problems. However, most existing solution strategies rest on restrictive theoretical assumptions (e.g., convexity of the lower-level sub-problem) and are computationally impractical for high-dimensional tasks. Moreover, almost no gradient-based methods can efficiently handle BLO in more challenging scenarios, such as BLO with functional constraints and pessimistic BLO. In this work, by reformulating BLO as an approximate single-level problem based on the value function, we propose a new method, named Bi-level Value-Function-based Sequential Minimization (BVFSM), to partially address these issues. Specifically, BVFSM constructs a sequence of value-function-based approximations and thereby avoids the repeated computation of recurrent gradients and Hessian inverses required by existing approaches, which is time-consuming (especially for high-dimensional tasks). We also extend BVFSM to BLO with additional upper- and lower-level functional constraints. More importantly, we demonstrate that the algorithmic framework of BVFSM also applies to the challenging pessimistic BLO, which has never been properly solved by existing gradient-based methods. On the theoretical side, we rigorously prove the convergence of BVFSM on these types of BLO, with the restrictive lower-level convexity assumption completely discarded. To the best of our knowledge, this is the first gradient-based algorithm that can solve different kinds of BLO problems (e.g., optimistic, pessimistic, and constrained), all with solid convergence guarantees. Extensive experiments verify our theoretical findings and demonstrate the superiority of BVFSM on various real-world applications.
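
The value-function reformulation referred to in the abstract can be sketched as follows for the standard optimistic setting. The symbols $F$ (upper-level objective), $f$ (lower-level objective), $x$, $y$, and $f^*$ are notation assumed here for illustration; the abstract itself does not fix any notation, and the specific approximations BVFSM applies to this problem are not stated in it.

```latex
% Optimistic bi-level problem (notation assumed for illustration)
\begin{align*}
  &\min_{x,\,y} \; F(x, y)
    \quad \text{s.t.} \quad y \in \operatorname*{arg\,min}_{y'} f(x, y') \\[4pt]
  % Value function of the lower-level sub-problem
  &f^*(x) := \min_{y'} f(x, y') \\[4pt]
  % Equivalent single-level problem: y is lower-level optimal
  % exactly when f(x, y) attains the value f^*(x)
  &\min_{x,\,y} \; F(x, y)
    \quad \text{s.t.} \quad f(x, y) \le f^*(x)
\end{align*}
```

Since $f(x, y) \ge f^*(x)$ holds for every $y$ by definition, the inequality constraint forces $f(x, y) = f^*(x)$, so the two problems share the same feasible set. The constraint involves only function values of the lower-level objective, which is consistent with the abstract's claim that recurrent gradients and Hessian inverses are avoided.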