A deep equilibrium model (DEQ) is defined implicitly through an equilibrium point of an infinite-depth weight-tied model with input injection. Rather than unrolling infinitely many layers, a DEQ solves for the equilibrium point directly via root-finding and computes gradients by implicit differentiation. This study investigates the training dynamics of over-parameterized DEQs. Under a condition on the initial equilibrium point, we show that a unique equilibrium point exists throughout training, and that gradient descent provably converges to a globally optimal solution at a linear rate for the quadratic loss. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We propose a novel probabilistic framework to overcome the technical difficulties in the non-asymptotic analysis of infinite-depth weight-tied models.
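To make the forward and backward computations described above concrete, the following is a minimal numerical sketch, not the paper's exact setup: a single DEQ layer z* = tanh(W z* + U x), with the equilibrium found by naive fixed-point iteration and the gradient of a quadratic loss with respect to W obtained via the implicit function theorem. The tanh nonlinearity, the dimensions, and the initialization scales are illustrative assumptions; practical DEQs typically use a quasi-Newton root-finder such as Broyden's method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                                                  # hidden width (assumed)
W = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))    # small scale -> contraction (assumed)
U = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))    # input-injection weights
x = rng.normal(size=d)                                 # injected input
y = rng.normal(size=d)                                 # regression target

def f(z):
    # One application of the weight-tied layer with input injection.
    return np.tanh(W @ z + U @ x)

# Forward pass: find the equilibrium z* = f(z*) by fixed-point iteration
# (stands in for a proper root-finder).
z = np.zeros(d)
for _ in range(200):
    z_next = f(z)
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next
z_star = z

# Quadratic loss and its gradient with respect to the equilibrium point.
loss = 0.5 * np.sum((z_star - y) ** 2)
dl_dz = z_star - y

# Backward pass via implicit differentiation: with g(z, W) = f(z, W) - z = 0
# at z*, the implicit function theorem gives dz*/dW = (I - J)^{-1} df/dW,
# where J = df/dz at z*. We form the adjoint u = (I - J)^{-T} dl/dz and
# assemble dl/dW without ever unrolling the infinite-depth model.
a = W @ z_star + U @ x
D = np.diag(1.0 - np.tanh(a) ** 2)                    # tanh' at the equilibrium
J = D @ W                                             # df/dz evaluated at z*
u = np.linalg.solve((np.eye(d) - J).T, dl_dz)         # adjoint vector
dl_dW = np.outer(D @ u, z_star)                       # gradient of the loss w.r.t. W

print("loss:", loss)
print("||dl/dW||:", np.linalg.norm(dl_dW))
```

The contraction-scale initialization of W is what keeps the fixed-point iteration convergent in this sketch; the paper's analysis instead guarantees existence and uniqueness of the equilibrium throughout training via the stated condition on the initial equilibrium point.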