Solving a bilevel optimization problem is at the core of several machine learning problems, such as hyperparameter tuning, data denoising, meta- and few-shot learning, and training-data poisoning. Unlike in simultaneous or multi-objective optimization, the steepest-descent direction for minimizing the upper-level cost in a bilevel problem requires the inverse of the Hessian of the lower-level cost. In this work, we propose a novel algorithm for solving bilevel optimization problems based on the classical penalty-function approach. Our method avoids computing the Hessian inverse and can easily handle constrained bilevel problems. We prove convergence of the method under mild conditions and show that it recovers the exact hypergradient asymptotically. The method's simplicity and low space and time complexity enable us to effectively solve large-scale bilevel problems involving deep neural networks. We present results on data denoising, few-shot learning, and training-data poisoning problems in a large-scale setting, and show that our approach outperforms, or is comparable to, previously proposed methods based on automatic differentiation and approximate inversion in terms of accuracy, run time, and convergence speed.
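For context, the bilevel setting and the Hessian-inverse issue the abstract refers to can be stated concretely. The first two displays below are the standard formulation and hypergradient; the penalty reformulation at the end is one common form consistent with the abstract's description, not necessarily the paper's exact objective:

```latex
% Bilevel problem with upper-level cost f and lower-level cost g:
\min_{x}\; f\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \arg\min_{y}\; g(x, y)

% Hypergradient (steepest-descent direction for the upper level),
% which involves the inverse of the lower-level Hessian:
\nabla_x f \;-\; \nabla^2_{xy} g \,\bigl[\nabla^2_{yy} g\bigr]^{-1} \nabla_y f
\qquad \text{evaluated at } y = y^{*}(x)

% A penalty reformulation that avoids the inversion: replace the
% lower-level problem by its stationarity condition, penalize the
% violation, and drive \gamma \to \infty on a schedule:
\min_{x,\,y}\; f(x, y) \;+\; \frac{\gamma}{2}\,\bigl\|\nabla_y g(x, y)\bigr\|^{2}
```

To make the alternating scheme concrete, here is a minimal, self-contained sketch of such a penalty-based solver on a toy problem, written in JAX. It illustrates the general technique under the assumptions above and is not the paper's implementation; `f`, `g`, the step-size rule, and the `gamma` schedule are all illustrative choices:

```python
import jax
import jax.numpy as jnp

# Toy bilevel problem (illustrative, not from the paper):
#   upper level: f(x, y) = (y - 1)^2, minimized over x through y*(x)
#   lower level: g(x, y) = (y - x)^2, so y*(x) = x and the solution is x = y = 1
def f(x, y):
    return (y - 1.0) ** 2

def g(x, y):
    return (y - x) ** 2

def penalty_obj(x, y, gamma):
    # Penalize violation of the lower-level stationarity condition
    # grad_y g(x, y) = 0 instead of inverting the Hessian of g.
    grad_y_g = jax.grad(g, argnums=1)(x, y)
    return f(x, y) + 0.5 * gamma * grad_y_g ** 2

grad_x = jax.jit(jax.grad(penalty_obj, argnums=0))
grad_y = jax.jit(jax.grad(penalty_obj, argnums=1))

x, y = jnp.array(0.0), jnp.array(5.0)
gamma = 1.0
for step in range(3000):
    lr = 0.5 / (2.0 + 4.0 * gamma)    # crude step size tied to the penalty curvature
    y = y - lr * grad_y(x, y, gamma)  # descend in the lower-level variable
    x = x - lr * grad_x(x, y, gamma)  # descend in the upper-level variable
    if (step + 1) % 300 == 0:
        gamma *= 2.0                  # increase the penalty weight on a schedule

print(float(x), float(y))  # both approach 1.0, the bilevel solution
```

Note that both updates are plain gradients of the penalized objective, so no Hessian inverse or second-order solve appears anywhere; the stationarity penalty only requires one extra gradient of g, consistent with the low space and time complexity claimed in the abstract.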