Bilevel optimization problems, in which two optimization problems are nested, arise in a growing number of machine learning applications. In many practical settings, the upper-level and lower-level objectives are empirical risk minimization problems and therefore have a finite-sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We show that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ gradient computations to achieve $\varepsilon$-stationarity, where $n+m$ is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to reach an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, which is therefore optimal in terms of sample complexity.
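For concreteness, the bilevel empirical risk minimization problem referred to above is typically written as follows (a standard formulation given here for context, with notation not taken from this abstract):
\[
\min_{x \in \mathbb{R}^d} \; h(x) = \frac{1}{n}\sum_{i=1}^{n} f_i\bigl(x, y^*(x)\bigr)
\quad \text{subject to} \quad
y^*(x) \in \operatorname*{arg\,min}_{y \in \mathbb{R}^p} \; \frac{1}{m}\sum_{j=1}^{m} g_j(x, y),
\]
where the upper-level objective is a sum over $n$ samples and the lower-level objective is a sum over $m$ samples; this is the $n+m$ appearing in the complexity bound.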
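As background, the single-level SARAH estimator that the proposed method extends maintains a recursive, variance-reduced gradient estimate. The sketch below shows the classical recursion for a plain finite sum $\frac{1}{n}\sum_{i=1}^{n} f_i$, not the bilevel variant introduced in the paper:
\[
v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(x_{t-1}) + v_{t-1},
\qquad
x_{t+1} = x_t - \eta v_t,
\]
where $i_t$ is drawn uniformly at random from $\{1, \dots, n\}$, $\eta$ is the step size, and $v_0 = \nabla f(x_0)$ is a full gradient recomputed at the start of each outer loop. This recursive correction is what produces the square-root dependence on the number of samples in SARAH-type complexity bounds.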