Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the low-level optimization process with implicit differentiation, which requires expensive calculations related to the Hessian matrix. There has been a recent quest for first-order methods for BO, but the methods proposed to date tend to be complicated and impractical for large-scale deep learning applications. In this work, we propose a simple first-order BO algorithm that depends only on first-order gradient information, requires no implicit differentiation, and is practical and efficient for large-scale non-convex functions in deep learning. We provide non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance.
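For context on the Hessian cost mentioned above: for the bilevel problem $\min_x f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$, implicit differentiation yields the standard hypergradient (background restated here for reference, not a result of this work):

\[
\nabla F(x) \;=\; \nabla_x f\big(x, y^*(x)\big) \;-\; \nabla^2_{xy} g\big(x, y^*(x)\big)\,\big[\nabla^2_{yy} g\big(x, y^*(x)\big)\big]^{-1} \nabla_y f\big(x, y^*(x)\big),
\]

where inverting (or solving against) the lower-level Hessian $\nabla^2_{yy} g$ is the expensive second-order computation that first-order methods seek to avoid.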
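To make the claim concrete, the sketch below shows one way a purely first-order bilevel update can look in PyTorch: the lower-level solution is approximated by a few gradient steps, and the upper-level gradient is combined with the gradient of a value-function gap using a dynamic-barrier style weighting. This is an illustrative reconstruction under stated assumptions, not necessarily the paper's exact algorithm; all names and hyperparameters (first_order_bilevel_step, inner_steps, eta, the toy objectives) are introduced here for illustration.

```python
import torch

def first_order_bilevel_step(f, g, x, y, inner_steps=10, inner_lr=0.1,
                             outer_lr=0.01, eta=0.5):
    """One joint update of (x, y) for min_x f(x, y*(x)), y*(x) = argmin_y g(x, y),
    using only first-order gradients (no Hessian-vector products).
    Illustrative sketch, not the paper's verbatim algorithm."""
    # 1) Approximate the lower-level solution y*(x) with a few gradient steps on g.
    z = y.detach().clone().requires_grad_(True)
    for _ in range(inner_steps):
        (grad_z,) = torch.autograd.grad(g(x, z), z)
        z = (z - inner_lr * grad_z).detach().requires_grad_(True)

    # 2) Value-function gap q(x, y) = g(x, y) - g(x, z) >= 0 measures how far y is
    #    from lower-level optimality; z is held constant, so grad(q) is first-order.
    q = g(x, y) - g(x, z.detach())
    qx, qy = torch.autograd.grad(q, (x, y))

    # 3) First-order gradients of the upper-level objective at the current (x, y).
    fx, fy = torch.autograd.grad(f(x, y), (x, y))

    # 4) Descend along grad(f) + lam * grad(q), with lam chosen so the step also
    #    shrinks q (a dynamic-barrier style weighting; eta sets the trade-off).
    inner_prod = (fx * qx).sum() + (fy * qy).sum()
    q_grad_sq = (qx * qx).sum() + (qy * qy).sum()
    lam = torch.clamp((eta * q_grad_sq - inner_prod) / (q_grad_sq + 1e-12), min=0.0)

    with torch.no_grad():
        x -= outer_lr * (fx + lam * qx)
        y -= outer_lr * (fy + lam * qy)
    return x, y

# Toy usage with hypothetical quadratic objectives (illustration only):
x = torch.zeros(3, requires_grad=True)
y = torch.zeros(3, requires_grad=True)
f = lambda x, y: ((x - 1.0) ** 2).sum() + (y ** 2).sum()  # upper-level objective
g = lambda x, y: ((y - x) ** 2).sum()                     # lower-level objective
for _ in range(200):
    x, y = first_order_bilevel_step(f, g, x, y)
```

Note that every quantity in the update is a plain gradient of f or g, so the cost per step is a constant number of backward passes, with no implicit differentiation through the inner loop and no Hessian-related computation.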