Bilevel optimization has emerged as a powerful tool for solving a variety of machine learning problems. Two currently popular bilevel optimizers, AID-BiO and ITD-BiO, naturally involve solving one or two sub-problems. Consequently, whether we solve these sub-problems with loops (that take many iterations) or without loops (that take only a few iterations) can significantly affect the overall computational efficiency. Existing studies in the literature cover only some of those implementation choices, and the available complexity bounds are not refined enough to enable a rigorous comparison among different implementations. In this paper, we first establish a unified convergence analysis for both AID-BiO and ITD-BiO that is applicable to all implementation choices of loops. We then specialize our results to characterize the computational complexity of all implementations, which enables an explicit comparison among them. Our results indicate that for AID-BiO, the loop for estimating the optimal point of the inner function is beneficial for overall efficiency, although it causes higher complexity for each update step, and the loop for approximating the outer-level Hessian-inverse-vector product reduces the gradient complexity. For ITD-BiO, the two loops always coexist, and our convergence upper and lower bounds show that such loops are necessary to guarantee a vanishing convergence error, whereas the no-loop scheme suffers from an unavoidable non-vanishing convergence error. Our numerical experiments further corroborate our theoretical results.
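To make the "loop" versus "no-loop" choices concrete, the following is a minimal sketch of an AID-style update on a toy scalar problem. The problem itself, the constants `H`, `B`, `C`, `LAM`, the step sizes, and the function names `aid_bio` and `true_hypergradient` are all illustrative assumptions and do not come from the paper. The sketch only shows where the two sub-problem loops sit (N inner gradient steps, Q steps toward the Hessian-inverse-vector product); it does not reproduce the paper's algorithms in detail or its complexity bounds.

```python
# Toy scalar bilevel problem (an illustrative assumption, not from the paper):
#   inner:  g(x, y) = 0.5*H*y**2 - B*x*y        ->  y*(x) = B*x/H
#   outer:  f(x, y) = 0.5*(y - C)**2 + 0.5*LAM*x**2
H, B, C, LAM = 2.0, 1.0, 1.0, 0.1


def true_hypergradient(x):
    """Exact d/dx f(x, y*(x)), available in closed form for this toy problem."""
    y_star = B * x / H
    return LAM * x + (B / H) * (y_star - C)


def aid_bio(N, Q, K=200, alpha=0.25, eta=0.25, beta=0.5):
    """AID-style bilevel loop with N inner steps and Q linear-system steps.

    N = 1 or Q = 1 corresponds to the "no-loop" choice for that sub-problem;
    larger values correspond to the "loop" choice. Both y and v are
    warm-started across outer iterations.
    """
    x, y, v = 0.0, 0.0, 0.0
    for _ in range(K):
        # Sub-problem 1: N gradient steps on the inner function g(x, .).
        for _ in range(N):
            y -= alpha * (H * y - B * x)
        # Sub-problem 2: Q gradient steps on 0.5*H*v**2 - v*grad_y_f, whose
        # minimizer is the Hessian-inverse-vector product v = grad_y_f / H.
        grad_y_f = y - C
        for _ in range(Q):
            v -= eta * (H * v - grad_y_f)
        # Hypergradient estimate: grad_x f - (d/dx grad_y g) * v = LAM*x + B*v,
        # since d/dx grad_y g = -B for this toy inner function.
        x -= beta * (LAM * x + B * v)
    return x
```

Different implementation choices can then be compared on this toy problem by calling, for example, `aid_bio(N=1, Q=1)` and `aid_bio(N=10, Q=10)` and evaluating `true_hypergradient` at the returned point.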