Estimating hyperparameters has been a long-standing problem in machine learning. We consider the case where the task at hand is modeled as the solution to an optimization problem. Here the exact gradient with respect to the hyperparameters cannot be feasibly computed and approximate strategies are required. We introduce a unified framework for computing hypergradients that generalizes existing methods based on the implicit function theorem and automatic differentiation/backpropagation, showing that these two seemingly disparate approaches are actually tightly connected. Our framework is extremely flexible, allowing its subproblems to be solved with any suitable method, to any degree of accuracy. We derive a priori and computable a posteriori error bounds for all our methods, and numerically show that our a posteriori bounds are usually more accurate. Our numerical results also show that, surprisingly, for efficient bilevel optimization, the choice of hypergradient algorithm is at least as important as the choice of lower-level solver.
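For concreteness, here is a minimal sketch of the implicit-function-theorem route to the hypergradient, under standard smoothness and invertibility assumptions; the notation ($F$, $g$, $\theta$, $w$) is illustrative and not taken from the abstract. Writing the upper-level objective as $f(\theta) = F(\theta, w^*(\theta))$ with lower-level solution $w^*(\theta) = \arg\min_w g(\theta, w)$, and assuming $\nabla^2_{ww} g$ is invertible at $w^*(\theta)$,
\[
\nabla f(\theta) \;=\; \nabla_\theta F\big(\theta, w^*(\theta)\big) \;-\; \nabla^2_{\theta w} g\big(\theta, w^*(\theta)\big)\,\big[\nabla^2_{ww} g\big(\theta, w^*(\theta)\big)\big]^{-1}\,\nabla_w F\big(\theta, w^*(\theta)\big).
\]
In practice $w^*(\theta)$ is only available approximately and the inner linear system is solved inexactly, which is exactly where the approximate strategies and the a priori/a posteriori error bounds mentioned above come into play.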