Implicit deep learning has received increasing attention recently because it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is provided implicitly through the solution of an equilibrium equation. Although a line of recent empirical studies has demonstrated its superior performance, the theoretical understanding of implicit neural networks is limited. In general, the equilibrium equation may not be well-posed during training. As a result, there is no guarantee that vanilla (stochastic) gradient descent ((S)GD) training of a nonlinear implicit neural network converges. This paper fills the gap by analyzing the gradient flow of Rectified Linear Unit (ReLU)-activated implicit neural networks. For an implicit neural network of width $m$ with ReLU activation and $n$ training samples, we show that randomly initialized gradient descent converges to a global minimum at a linear rate for the square loss function if the implicit neural network is \textit{over-parameterized}. It is worth noting that, unlike existing works on the convergence of (S)GD for finite-layer over-parameterized neural networks, our convergence results hold for implicit neural networks, where the number of layers is \textit{infinite}.
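To make the implicit prediction rule concrete, the following is a minimal sketch, assuming a single implicit layer whose hidden state solves the equilibrium equation $z = \mathrm{ReLU}(Az + Bx)$ with a linear readout $u^\top z$; the names $A$, $B$, $u$, the contraction rescaling, and the plain fixed-point iteration are illustrative assumptions, not the paper's exact construction or analysis.

\begin{verbatim}
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def implicit_forward(A, B, u, x, n_iters=200, tol=1e-8):
    """Prediction of an implicit network: iterate the equilibrium
    equation z = ReLU(A z + B x) to a fixed point, then read out u^T z."""
    z = np.zeros(A.shape[0])
    for _ in range(n_iters):
        z_new = relu(A @ z + B @ x)
        if np.linalg.norm(z_new - z) < tol:
            return u @ z_new
        z = z_new
    return u @ z

# Illustrative usage: the equilibrium equation is well-posed here because
# ReLU is 1-Lipschitz and A is rescaled so that ||A||_2 < 1, making the
# iteration a contraction (Banach fixed-point theorem).
rng = np.random.default_rng(0)
m, d = 64, 10                            # width m, input dimension d
A = rng.normal(size=(m, m))
A *= 0.5 / np.linalg.norm(A, 2)          # enforce spectral norm 1/2
B = rng.normal(size=(m, d)) / np.sqrt(d)
u = rng.normal(size=m) / np.sqrt(m)
x = rng.normal(size=d)
print(implicit_forward(A, B, u, x))
\end{verbatim}

In this sketch the "infinite number of layers" corresponds to running the fixed-point iteration to convergence rather than stacking a finite number of weight-tied layers; well-posedness during training amounts to keeping the equilibrium map contractive so the fixed point exists and is unique.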