A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinitely many computations, it finds the equilibrium point directly by root-finding and computes gradients by implicit differentiation. This study investigates the training dynamics of over-parameterized DEQs. Under a condition on the initial equilibrium point, we show that a unique equilibrium point always exists throughout training, and that gradient descent provably converges to a globally optimal solution at a linear rate for the quadratic loss. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
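The following is a minimal NumPy sketch (not from the paper) of the two DEQ mechanisms the abstract names: the forward pass solves the fixed-point equation z* = f(z*) by iteration as a stand-in for the root-finders (e.g. Broyden's method) used in practice, and the backward pass uses implicit differentiation instead of backpropagating through depth. The dimensions, the tanh layer map, the contractive weight scaling, and the toy quadratic loss are all illustrative assumptions.

```python
import numpy as np

# Hypothetical dimensions and data for illustration.
d, m = 4, 8
rng = np.random.default_rng(0)
W = 0.4 * rng.normal(size=(m, m)) / np.sqrt(m)  # weight-tied weights, scaled so f is contractive
U = rng.normal(size=(m, d))                     # input-injection weights
x = rng.normal(size=d)

def f(z):
    """One application of the weight-tied layer with input injection."""
    return np.tanh(W @ z + U @ x)

# Forward pass: fixed-point iteration to the equilibrium point z* = f(z*).
z = np.zeros(m)
for _ in range(200):
    z_next = f(z)
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next

# Backward pass: by the implicit function theorem, dL/dW goes through v
# solving (I - J_f(z*))^T v = dL/dz*, where J_f(z*) = diag(1 - z*^2) W for
# this tanh layer (since tanh(W z* + U x) = z* at the equilibrium).
grad_z = z.copy()                                  # dL/dz* for the toy loss L = ||z*||^2 / 2
J = (1.0 - z**2)[:, None] * W                      # Jacobian of f at the equilibrium
v = np.linalg.solve((np.eye(m) - J).T, grad_z)     # one linear solve replaces infinite depth
grad_W = ((1.0 - z**2) * v)[:, None] * z[None, :]  # dL/dW via the chain rule through tanh
```

The single linear solve for v is the point of the implicit approach: the gradient of the infinite-depth model is recovered from the equilibrium alone, with no stored intermediate activations.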