Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning. Recently, several momentum-based algorithms have been proposed to solve bilevel optimization problems faster. However, those momentum-based algorithms do not achieve provably better computational complexity than the $\widetilde{\mathcal{O}}(\epsilon^{-2})$ complexity of SGD-based algorithms. In this paper, we propose two new algorithms for bilevel optimization, where the first algorithm adopts momentum-based recursive iterations, and the second algorithm adopts recursive gradient estimations in nested loops to reduce the variance. We show that both algorithms achieve a complexity of $\widetilde{\mathcal{O}}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude. Our experiments validate these theoretical results and demonstrate the superior empirical performance of our algorithms in hyperparameter optimization applications.
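Below is a minimal sketch of a momentum-based recursive (STORM-style) gradient estimator of the kind the first algorithm builds on, illustrated on a simple single-level quadratic rather than the paper's bilevel hypergradient. All names (`stoch_grad`, `storm_style_descent`, `eta`, `lr`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x, noise):
    """Stochastic gradient of f(x) = 0.5 * ||x||^2; the true gradient is x."""
    return x + noise

def storm_style_descent(x0, steps=1000, lr=0.05, eta=0.1):
    """Recursive momentum estimator (illustrative, single-level):
        v_t = g(x_t; xi_t) + (1 - eta) * (v_{t-1} - g(x_{t-1}; xi_t)),
    where both gradients are evaluated on the SAME fresh sample xi_t,
    so correlated noise cancels and the estimator's variance shrinks."""
    x = x0.copy()
    v = stoch_grad(x, 0.1 * rng.standard_normal(x.shape))  # initial estimate v_0
    for _ in range(steps):
        x_new = x - lr * v                        # descend along the estimator
        xi = 0.1 * rng.standard_normal(x.shape)   # one fresh sample xi_t
        v = stoch_grad(x_new, xi) + (1.0 - eta) * (v - stoch_grad(x, xi))
        x = x_new
    return x

# The iterate should approach the minimizer x* = 0 despite noisy gradients.
print(np.linalg.norm(storm_style_descent(np.ones(5))))
```

By contrast, nested-loop variance reduction of the kind the second algorithm uses typically refreshes the estimator with a larger batch at the start of each outer loop and applies the same recursive correction across the inner iterations.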