Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
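To make the cold-start vs. warm-start distinction concrete, below is a minimal, self-contained sketch on a toy overparameterized least-squares inner problem. Everything in it (the matrices A and B, the scalar outer parameter lam that scales the inner targets, and the finite-difference hypergradient) is an illustrative assumption of ours, not the paper's experimental setup, which uses implicit or unrolled differentiation for the hypergradient.

```python
import numpy as np

# Toy sketch (illustrative assumptions, not the paper's experiments):
# an overparameterized inner least-squares problem whose targets are scaled by a
# scalar outer parameter `lam`. We contrast cold-start BLO (the inner problem is
# re-solved from a fixed initialization at every outer step) with warm-start BLO
# (the inner iterate persists across outer steps), using a crude finite-difference
# hypergradient in place of implicit or unrolled differentiation.

rng = np.random.default_rng(0)
A, y = rng.normal(size=(5, 10)), rng.normal(size=5)   # inner (training) data: 10 params, 5 equations
B, z = rng.normal(size=(3, 10)), rng.normal(size=3)   # outer (validation) data

def inner_grad(w, lam):
    # Gradient of the inner loss 0.5 * ||A w - lam * y||^2.
    return A.T @ (A @ w - lam * y)

def outer_loss(w):
    # Outer (validation) loss evaluated at the inner solution.
    return 0.5 * np.sum((B @ w - z) ** 2)

def solve_inner(w_init, lam, n_inner=10, inner_lr=0.02):
    # Truncated inner optimization: a few steps of gradient descent.
    w = w_init.copy()
    for _ in range(n_inner):
        w -= inner_lr * inner_grad(w, lam)
    return w

def run_blo(warm_start, n_outer=200, outer_lr=0.1, eps=1e-4):
    lam, w = 1.0, np.zeros(10)
    for _ in range(n_outer):
        w_init = w if warm_start else np.zeros(10)    # warm-start vs. cold-start choice
        w = solve_inner(w_init, lam)
        # Finite-difference hypergradient through the truncated inner solver.
        w_eps = solve_inner(w_init, lam + eps)
        hypergrad = (outer_loss(w_eps) - outer_loss(w)) / eps
        lam -= outer_lr * hypergrad
    return lam, w

for warm in (False, True):
    lam, w = run_blo(warm_start=warm)
    print(f"warm_start={warm}: lam={lam:+.3f}  outer_loss={outer_loss(w):.4f}  ||w||={np.linalg.norm(w):.3f}")
```

Because the inner optimization is truncated, the warm-start variant carries information across outer iterations through the persisted inner iterate, which is the mechanism behind the abstract's observation that warm-start inner solutions can encode information about the outer objective; the cold-start variant discards that history at every outer step.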