In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus. Our result applies to most practical problems (i.e., definable problems) provided that a nonsmooth form of the classical invertibility condition is fulfilled. This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems. Moreover, this calculus is entirely compatible with algorithmic differentiation (e.g., backpropagation). We provide several applications, such as training deep equilibrium networks, training neural nets with conic optimization layers, and hyperparameter tuning for nonsmooth Lasso-type models. To show the sharpness of our assumptions, we present numerical experiments showcasing the extremely pathological gradient dynamics one can encounter when applying implicit algorithmic differentiation without any hypothesis.
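To give intuition for the substitution described above (an informal sketch under simplifying assumptions, not the theorem's precise statement; the map F and the block notation [B  A] are illustrative): the classical implicit function theorem differentiates the solution map y(x) of F(x, y(x)) = 0 through the partial Jacobians of F, and the nonsmooth calculus replaces these by elements of the Clarke Jacobian, provided every y-block is invertible.
\[
\frac{\partial y}{\partial x}(x) \;=\; -\bigl[\partial_y F(x, y(x))\bigr]^{-1} \partial_x F(x, y(x))
\quad \text{(smooth case),}
\]
\[
\mathcal{J}_y(x) \;=\; \bigl\{\, -A^{-1} B \;:\; [\,B \;\; A\,] \in J^{c}_{F}(x, y(x)) \,\bigr\}
\quad \text{(nonsmooth sketch, each such } A \text{ assumed invertible).}
\]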