In recent years, a variety of gradient-based methods have been developed to solve Bi-Level Optimization (BLO) problems in machine learning and computer vision. However, the theoretical correctness and practical effectiveness of these existing approaches always rely on restrictive conditions (e.g., Lower-Level Singleton, LLS), which can hardly be satisfied in real-world applications. Moreover, previous literature only proves theoretical results tied to specific iteration strategies, and thus lacks a general recipe for uniformly analyzing the convergence behaviors of different gradient-based BLO methods. In this work, we formulate BLOs from an optimistic bi-level viewpoint and establish a new gradient-based algorithmic framework, named Bi-level Descent Aggregation (BDA), to partially address the above issues. Specifically, BDA provides a modularized structure that hierarchically aggregates both the upper- and lower-level subproblems to generate the bi-level iterative dynamics. Theoretically, we establish a general convergence analysis template and derive a new proof recipe to investigate the essential theoretical properties of gradient-based BLO methods. Furthermore, this work systematically explores the convergence behavior of BDA in different optimization scenarios, i.e., considering various solution qualities (global, local, or stationary solutions) returned from solving the approximation subproblems. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed algorithm on hyper-parameter optimization and meta-learning tasks.
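To make the aggregation idea concrete, below is a minimal sketch of the kind of aggregated lower-level dynamics the abstract describes: each inner step follows a convex combination of the upper- and lower-level descent directions, and the outer variable is updated by differentiating through the unrolled inner loop. The toy objectives F and f, the step size s, the vanishing weight schedule alpha, and all iteration counts here are illustrative assumptions, not the paper's exact specification.

```python
import jax
import jax.numpy as jnp

# Toy bi-level problem (illustrative only). Following common BLO notation,
# F is the upper-level objective and f is the lower-level objective.
def F(x, y):
    return jnp.sum((y - 1.0) ** 2) + 0.5 * jnp.sum(x ** 2)

def f(x, y):
    # The lower-level solution set may be non-singleton in general BLOs.
    return jnp.sum((y - x) ** 2)

grad_F_y = jax.grad(F, argnums=1)  # upper-level descent direction in y
grad_f_y = jax.grad(f, argnums=1)  # lower-level descent direction in y

def bda_inner_loop(x, y0, K=50, s=0.1, mu=0.9):
    """Aggregated inner dynamics: each step mixes the upper- and
    lower-level gradient directions with a vanishing weight alpha."""
    y = y0
    for k in range(K):
        alpha = mu / (k + 1)  # assumed schedule; decays toward pure lower-level descent
        d = alpha * grad_F_y(x, y) + (1 - alpha) * grad_f_y(x, y)
        y = y - s * d
    return y

def upper_step(x, y0, lr=0.05):
    """One outer update: differentiate the upper objective through
    the unrolled aggregated inner loop."""
    phi = lambda xx: F(xx, bda_inner_loop(xx, y0))
    return x - lr * jax.grad(phi)(x)

# Usage sketch: alternate outer updates from a fixed inner initialization.
x = jnp.array([0.0, 0.0])
y0 = jnp.array([0.5, 0.5])
for _ in range(100):
    x = upper_step(x, y0)
```

The key design choice illustrated here is that, unlike plain unrolled lower-level descent, the inner iterates are steered by both levels of the problem, which is what allows the analysis to move beyond the Lower-Level Singleton assumption.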