Bi-level optimization, especially its gradient-based variants, has been widely used in the deep learning community for tasks such as hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another, and gradient-based methods solve the outer-level task by computing the hypergradient, which is far more efficient than classical approaches such as evolutionary algorithms. In this survey, we first give a formal definition of gradient-based bi-level optimization. Second, we illustrate how to formulate a research problem as a bi-level optimization problem, which is of great practical value for beginners. More specifically, there are two formulations: the single-task formulation, which optimizes hyperparameters such as regularization coefficients and distilled data, and the multi-task formulation, which extracts meta knowledge such as the model initialization. Given a bi-level formulation, we then discuss four bi-level optimization solvers for updating the outer variable: explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we conclude the survey by pointing out the great potential of gradient-based bi-level optimization for scientific problems (AI4Science).
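To make the explicit gradient update concrete, the following is a minimal sketch (not from the survey itself): the inner problem minimizes g(w, λ) = (w − 1)² + λw² over w, the outer problem tunes the regularization weight λ so that the inner solution w*(λ) = 1/(1 + λ) matches a validation target of 0.5, whose optimum is λ = 1. The hypergradient is obtained by unrolling the inner gradient-descent steps and propagating dw/dλ forward through each update. All names and constants here are illustrative assumptions.

```python
def hypergradient(lam, inner_steps=100, eta=0.1):
    """Unroll inner gradient descent on g(w, lam) = (w - 1)^2 + lam * w^2,
    tracking v = dw/dlam in forward mode, then return the hypergradient of
    the outer loss F = (w - 0.5)^2 with respect to lam."""
    w, v = 0.0, 0.0
    for _ in range(inner_steps):
        grad_w = 2.0 * (w - 1.0) + 2.0 * lam * w      # d g / d w at the current iterate
        # Differentiate the update w <- w - eta * grad_w with respect to lam:
        # d grad_w / d lam = 2*v + 2*w + 2*lam*v (chain rule through w and lam).
        v = v - eta * (2.0 * v + 2.0 * w + 2.0 * lam * v)
        w = w - eta * grad_w
    outer_grad = 2.0 * (w - 0.5) * v                  # dF/dlam = dF/dw * dw/dlam
    return outer_grad, w

# Outer loop: plain gradient descent on the hyperparameter lam.
lam = 0.0
for _ in range(200):
    g, w = hypergradient(lam)
    lam -= 1.0 * g

# lam converges to 1 and the inner solution w to 0.5 for this toy problem.
```

The same unrolling pattern underlies explicit-gradient solvers in practice, except that automatic differentiation (e.g. through a deep network's training steps) replaces the hand-derived forward-mode recursion used here.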