Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. The method that we propose is a stochastic variant of the approximate implicit differentiation approach in (Pedregosa, 2016). We provide bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. In particular, our main bound is agnostic to the choice of the two stochastic solvers employed by the procedure. We provide numerical experiments to support our theoretical analysis and to show the advantage of using stochastic hypergradients in practice.
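To make the approach described above concrete, the following is a minimal sketch (not the authors' code) of a stochastic approximate implicit differentiation scheme: one stochastic solver runs fixed-point iterations of a minibatch contraction map for the lower-level problem, and a second stochastic solver approximately solves the linear system that defines the implicit gradient. All names (`stochastic_hypergradient`, `inner_map`, `outer_loss`, `sample_batch`) and the toy ridge-regression setup are illustrative assumptions, not part of the paper.

```python
# A sketch of stochastic approximate implicit differentiation (AID) for the
# hypergradient of f(lam) = E(w(lam), lam), where w(lam) is the fixed point of
# a contraction w = Phi(w, lam) accessible only through minibatch estimates.
import jax
import jax.numpy as jnp


def stochastic_hypergradient(key, lam, w0, inner_map, outer_loss,
                             sample_batch, t_steps=200, k_steps=200):
    # 1) Stochastic solver for the lower-level fixed point:
    #    w_{t+1} = Phi(w_t, lam; xi_t), with xi_t a fresh minibatch.
    w = w0
    for _ in range(t_steps):
        key, sub = jax.random.split(key)
        w = inner_map(w, lam, sample_batch(sub))

    grad_w_E, grad_lam_E = jax.grad(outer_loss, argnums=(0, 1))(w, lam)

    # 2) Stochastic solver for the linear system (I - d_w Phi^T) v = grad_w E,
    #    run as the fixed-point iteration v_{k+1} = d_w Phi(w, lam; xi_k)^T v_k + grad_w E.
    v = jnp.zeros_like(grad_w_E)
    for _ in range(k_steps):
        key, sub = jax.random.split(key)
        batch = sample_batch(sub)
        _, vjp_w = jax.vjp(lambda ww: inner_map(ww, lam, batch), w)
        v = vjp_w(v)[0] + grad_w_E

    # 3) Assemble the hypergradient: grad_lam E + d_lam Phi^T v.
    key, sub = jax.random.split(key)
    _, vjp_lam = jax.vjp(lambda ll: inner_map(w, ll, sample_batch(sub)), lam)
    return grad_lam_E + vjp_lam(v)[0]


if __name__ == "__main__":
    # Illustrative toy setup: ridge regression as the lower-level problem,
    # validation MSE as the upper-level objective, lam the regularization weight.
    key = jax.random.PRNGKey(0)
    n, d, eta = 256, 5, 0.05
    X = jax.random.normal(key, (n, d))
    y = X @ jnp.ones(d)
    Xv, yv = X[:64], y[:64]  # held-out split for the upper level

    def sample_batch(k):
        # Minibatch indices defining one realization of the stochastic map.
        return jax.random.choice(k, n, (32,))

    def inner_map(w, lam, idx):
        # One SGD step on the regularized inner loss; a contraction for small eta.
        g = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / idx.shape[0] + 2.0 * lam * w
        return w - eta * g

    def outer_loss(w, lam):
        # Upper-level objective: validation mean squared error.
        return jnp.mean((Xv @ w - yv) ** 2)

    hg = stochastic_hypergradient(key, 0.1, jnp.zeros(d),
                                  inner_map, outer_loss, sample_batch)
    print("approximate hypergradient:", hg)
```

In this sketch the mean square error of the returned hypergradient depends on how accurately the two stochastic solvers run (here, on `t_steps`, `k_steps`, and the minibatch size), which mirrors the solver-agnostic bound discussed in the abstract.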