Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as continual learning, neural architecture search, adversarial learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become challenging when the number of variables is large or constraints are present. In this paper, we introduce a bilevel stochastic gradient method for bilevel problems with lower-level constraints. We also present a comprehensive convergence theory that covers inexact computations of the adjoint gradient (also called the hypergradient) and addresses both the lower-level unconstrained and constrained cases. To promote the use of bilevel optimization in large-scale learning, we introduce a practical bilevel stochastic gradient method (BSG-1) that does not require second-order derivatives and, in the lower-level unconstrained case, dispenses with any system solves and matrix-vector products.
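For orientation, the following is a minimal sketch of the standard adjoint (hyper)gradient in the lower-level unconstrained case, written in generic notation ($f$, $g$, $x$, $y$) that is not taken from the paper; it only illustrates the second-order terms and the linear system that a first-order scheme such as BSG-1 is meant to avoid.

\begin{verbatim}
% Bilevel problem (unconstrained lower level), generic notation:
%   min_x  f(x, y(x))   s.t.   y(x) = argmin_y  g(x, y).
% Differentiating the lower-level optimality condition
% \nabla_y g(x, y(x)) = 0 (assuming \nabla^2_{yy} g is nonsingular)
% yields the adjoint (hyper)gradient:
\begin{equation*}
  \nabla F(x) \;=\; \nabla_x f(x, y(x))
  \;-\; \nabla^2_{xy} g(x, y(x))\,
        \bigl[\nabla^2_{yy} g(x, y(x))\bigr]^{-1}
        \nabla_y f(x, y(x)).
\end{equation*}
% A method that avoids second-order derivatives must approximate the
% Hessian blocks and the linear solve above using gradient information only.
\end{verbatim}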