Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks, achieving state-of-the-art performance across many modern machine learning tasks. Despite their practical success, a theoretical understanding of the gradient descent dynamics for training DEQs remains an active area of research. In this work, we rigorously study the gradient descent dynamics of DEQs in the simple settings of linear models and single-index models, filling several gaps in the literature. We prove a conservation law for linear DEQs, which implies that the parameters remain trapped on spheres during training, and we use this property to show that gradient flow remains well-conditioned for all time. We then prove linear convergence of gradient descent to a global minimizer for linear DEQs and deep equilibrium single-index models under appropriate initialization and with a sufficiently small step size. Finally, we validate our theoretical findings through experiments.
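To make the linear DEQ setting concrete, the following is a minimal sketch (not the paper's exact parameterization) of a linear DEQ layer: the hidden state is defined implicitly as the fixed point of the weight-tied update z = Wz + Ux, which exists and is reachable by forward iteration whenever the spectral norm of W is below 1, and which coincides with the closed form z* = (I - W)^{-1} U x. The matrix shapes and the rescaling of W are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Weight-tied layer: rescale W so its spectral norm is 0.5 < 1, making the
# update z -> W z + U x a contraction with a unique equilibrium (assumption
# for this sketch; the paper's conditioning argument is more refined).
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((d, d))
x = rng.standard_normal(d)

# Forward pass of the "infinitely deep" network: iterate the same layer
# until the hidden state stops changing, i.e., reaches the equilibrium z*.
z = np.zeros(d)
for _ in range(500):
    z_next = W @ z + U @ x
    if np.linalg.norm(z_next - z) < 1e-10:
        z = z_next
        break
    z = z_next

# For a linear DEQ the equilibrium has the closed form z* = (I - W)^{-1} U x,
# so the iteration can be checked against a direct linear solve.
z_closed = np.linalg.solve(np.eye(d) - W, U @ x)
print(np.allclose(z, z_closed, atol=1e-8))
```

The contraction condition on W is what makes the fixed point well defined; part of the analysis summarized above concerns showing that training dynamics keep the model in a regime where this equilibrium remains well-behaved.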