Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but non-smooth. We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian. Using implicit differentiation, we show it is possible to leverage the non-smoothness of the inner problem to speed up the computation. Finally, we provide a bound on the error made on the hypergradient when the inner optimization problem is solved only approximately. Results on regression and classification problems reveal computational benefits for hyperparameter optimization, especially when multiple hyperparameters are required.
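To make the setting concrete, the bilevel problem can be written in a standard generic form (notation chosen here for illustration, not quoted from the paper): the outer objective is a criterion, e.g. a held-out loss, evaluated at the solution of a regularized training problem,

\[
\min_{\lambda \in \mathbb{R}^r} \; \mathcal{C}\big(\hat{\beta}^{(\lambda)}\big)
\quad \text{subject to} \quad
\hat{\beta}^{(\lambda)} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; f(\beta) + g(\beta, \lambda),
\]

where $f$ is smooth and $g$ is convex but possibly non-smooth (e.g., the $\ell_1$ penalty of the Lasso). First-order hyperparameter optimization then requires the hypergradient $\nabla_\lambda \mathcal{C}\big(\hat{\beta}^{(\lambda)}\big) = \hat{\mathcal{J}}^{(\lambda)\top} \nabla \mathcal{C}\big(\hat{\beta}^{(\lambda)}\big)$, where $\hat{\mathcal{J}}^{(\lambda)} = \partial \hat{\beta}^{(\lambda)} / \partial \lambda$ is the Jacobian of the inner solution with respect to the hyperparameters.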
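As a minimal sketch of the forward-mode approach, the snippet below jointly iterates the inner solution and its Jacobian through proximal gradient descent (ISTA) on a Lasso inner problem with a scalar regularization parameter. The function names and the held-out quadratic criterion in the usage note are assumptions made for illustration, not the paper's code.

```python
import numpy as np


def soft_thresh(z, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def forward_diff_ista(X, y, lam, n_iter=500):
    """Run ISTA on the Lasso  min_b 1/(2n)||y - Xb||^2 + lam * ||b||_1
    while propagating the Jacobian d beta / d lam in forward mode."""
    n, p = X.shape
    L = np.linalg.norm(X, ord=2) ** 2 / n      # Lipschitz constant of the smooth part
    beta = np.zeros(p)
    jac = np.zeros(p)                          # d beta / d lam, initialized at zero
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n        # gradient of the data-fitting term
        z = beta - grad / L                    # gradient step
        dz = jac - X.T @ (X @ jac) / (n * L)   # forward-mode derivative of the gradient step
        support = np.abs(z) > lam / L          # coordinates kept by the soft-threshold
        beta = soft_thresh(z, lam / L)         # proximal step
        jac = support * (dz - np.sign(z) / L)  # chain rule through the soft-threshold
    return beta, jac
```

The returned Jacobian is then contracted with the gradient of the outer criterion to obtain the hypergradient, e.g. `hypergrad = jac @ (X_val.T @ (X_val @ beta - y_val)) / len(y_val)` for a held-out least-squares criterion.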