Bilevel optimization has been widely applied to many important machine learning problems such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than the $\mathcal{O}(\epsilon^{-2})$ complexity of SGD-based algorithms. In this paper, we propose two new algorithms for bilevel optimization: the first adopts momentum-based recursive iterations, and the second adopts recursive gradient estimations in nested loops to decrease the variance. We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude. Our experiments validate our theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter optimization applications. Our code for MRBO, VRBO, and the other benchmarks is available $\text{online}^1$.