In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation and proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. In this paper, we show that this result still holds for DEQs with any activation function that has bounded first and second derivatives. Since the new activation function is generally non-linear, we design a general population Gram matrix and develop a new form of dual activation based on the Hermite polynomial expansion.
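To make the setting concrete, the following is a minimal sketch of a DEQ forward pass, not the construction analyzed in the paper. It assumes the common formulation in which the equilibrium point solves z = σ(Wz + Ux), uses tanh as one example of an activation with bounded first and second derivatives, and picks small illustrative weights whose norm is below one so that plain fixed-point iteration converges. The function name `deq_forward` and all numerical values are hypothetical.

```python
import math


def deq_forward(W, U, x, activation=math.tanh, tol=1e-10, max_iter=1000):
    """Iterate z <- activation(W z + U x) until the update is below tol.

    Assumed formulation: the DEQ output is the fixed point of this map.
    """
    n = len(W)
    z = [0.0] * n
    # The input injection U x does not change across iterations.
    ux = [sum(U[i][j] * x[j] for j in range(len(x))) for i in range(n)]
    for _ in range(max_iter):
        z_new = [
            activation(sum(W[i][j] * z[j] for j in range(n)) + ux[i])
            for i in range(n)
        ]
        # Stop once the largest coordinate-wise change is below tol.
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


# Illustrative values (assumptions): the max row sum of |W| is 0.35 < 1,
# and tanh is 1-Lipschitz, so the map is a contraction in the sup norm.
W = [[0.2, -0.1], [0.05, 0.3]]
U = [[1.0, 0.5], [-0.5, 1.0]]
x = [0.3, -0.2]

z_star = deq_forward(W, U, x)

# Residual of the equilibrium equation z = tanh(W z + U x).
residual = max(
    abs(
        z_star[i]
        - math.tanh(
            sum(W[i][j] * z_star[j] for j in range(2))
            + sum(U[i][j] * x[j] for j in range(2))
        )
    )
    for i in range(2)
)
```

Here `residual` measures how closely `z_star` satisfies the equilibrium equation. Swapping `activation` for another smooth function with bounded first and second derivatives leaves the rest of the sketch unchanged, provided the map remains a contraction.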