Bilevel optimization problems, in which two optimization problems are nested, have an increasing number of applications in machine learning. In many practical cases, the upper-level and lower-level objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ gradient computations to achieve $\varepsilon$-stationarity, where $n+m$ is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to reach an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, which is therefore optimal in terms of sample complexity.