In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While training such implicit models is memory-efficient, it remains significantly slower than training their explicit counterparts. In Deep Equilibrium Models (DEQs), training is cast as a bi-level problem whose computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck, from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices produced by the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we provide further theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach and the recent Jacobian-Free method in different settings, ranging from hyperparameter optimization to large Multiscale DEQs (MDEQs) applied to CIFAR and ImageNet. Both methods significantly reduce the computational cost of the backward pass. While SHINE has a clear advantage on hyperparameter optimization problems, both methods attain similar computational performance on larger-scale problems such as MDEQs, at the cost of a limited performance drop compared to the original models.
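To make the core idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of reusing the inverse-Jacobian approximation built by Broyden's method during the forward fixed-point solve, instead of running a separate iterative linear solve in the backward pass. The function names `broyden_fixed_point` and `shine_backward` are illustrative placeholders, and the dense matrix `B` stands in for the low-rank quasi-Newton representation used in practice.

```python
import numpy as np

def broyden_fixed_point(f, z0, max_iter=50, tol=1e-8):
    """Solve z = f(z) with Broyden's method.

    Returns the fixed point z* and B, an approximation of the inverse
    Jacobian of the residual g(z) = f(z) - z at z*.
    """
    z = z0.copy()
    n = z.size
    B = -np.eye(n)                 # common initialization for fixed-point problems
    g = f(z) - z
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        step = -B @ g              # quasi-Newton step: z_{k+1} = z_k - B_k g_k
        z_new = z + step
        g_new = f(z_new) - z_new
        y = g_new - g
        By = B @ y
        # Broyden "good" rank-one update of the inverse Jacobian approximation
        B = B + np.outer(step - By, step @ B) / (step @ By)
        z, g = z_new, g_new
    return z, B

def shine_backward(B, grad_z):
    """Approximate the adjoint vector v = (I - df/dz)^{-T} grad_z.

    Since g(z) = f(z) - z, we have (I - df/dz)^{-1} = -J_g^{-1} ~ -B,
    so the backward linear solve is replaced by a single matrix product.
    """
    return -(B.T @ grad_z)
```

In this sketch, the approximate adjoint `v` returned by `shine_backward` would then be contracted with the partial derivative of `f` with respect to the parameters (a standard vector-Jacobian product) to obtain the hypergradient; only the costly linear solve of the exact implicit gradient is replaced by the reuse of `B`.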