Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it along the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate, leading to fast asymptotic convergence but accepting that the algorithm may have an arbitrarily long burn-in phase, or 2) choose a smaller learning rate, leading to immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems related to this approach, such as deriving a practical update rule for the optimal unrolling strategy, and we make novel connections with the field of Sobolev orthogonal polynomials.
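To make the setup concrete, here is a minimal sketch of unrolled differentiation on a quadratic objective, written in JAX. The matrix `A`, vector `b`, step size, and iteration count are illustrative choices (not values from the paper): gradient descent is run for a fixed number of steps on f(x) = ½ xᵀAx − bᵀx, and the Jacobian of the final iterate with respect to b is obtained by differentiating through the unrolled iterations, then compared against the exact Jacobian A⁻¹ of the solution x*(b) = A⁻¹b.

```python
# A minimal sketch of unrolled differentiation (illustrative, not the paper's code).
import jax
import jax.numpy as jnp

def unrolled_gd(b, A, x0, step_size, num_steps):
    """Run `num_steps` of gradient descent on f(x) = 0.5 x^T A x - b^T x
    and return the final iterate. Differentiating this function w.r.t. b
    backpropagates through the entire computational path (unrolling)."""
    x = x0
    for _ in range(num_steps):
        grad_f = A @ x - b          # gradient of the quadratic objective
        x = x - step_size * grad_f
    return x

key = jax.random.PRNGKey(0)
M = jax.random.normal(key, (5, 5))
A = M @ M.T + jnp.eye(5)            # symmetric positive definite (illustrative)
b = jnp.ones(5)
x0 = jnp.zeros(5)
step_size = 1.0 / jnp.linalg.norm(A, ord=2)   # 1/L, inside the stable range

# Jacobian of the unrolled iterate vs. the exact Jacobian A^{-1} of x*(b) = A^{-1} b.
jac_unrolled = jax.jacobian(unrolled_gd)(b, A, x0, step_size, 200)
jac_exact = jnp.linalg.inv(A)
print(jnp.max(jnp.abs(jac_unrolled - jac_exact)))
```

The trade-off described in the abstract shows up here through `step_size`: a larger (but still stable) step size makes `jac_unrolled` converge to `jac_exact` faster asymptotically, at the cost of a potentially long initial phase during which the Jacobian error can grow before it decreases.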