Implicit neural networks have become increasingly attractive in the machine learning community, since they can achieve competitive performance while using far fewer computational resources. Recently, a line of theoretical work established global convergence of first-order methods such as gradient descent, provided the implicit networks are over-parameterized. However, because these analyses train all layers jointly, they are equivalent to studying only the evolution of the output layer, and it remains unclear how the implicit layer itself contributes to training. In this paper, we therefore restrict ourselves to training only the implicit layer, and we show that global convergence is still guaranteed in this setting. On the other hand, the theoretical understanding of when and how the training performance of an implicit neural network generalizes to unseen data remains under-explored. Although this question has been studied for standard feed-forward networks, the case of implicit neural networks is still intriguing, since implicit networks theoretically have infinitely many layers. This paper therefore investigates the generalization error of implicit neural networks. Specifically, we study the generalization of an implicit network activated by the ReLU function over random initialization, and we provide an initialization-sensitive generalization bound. As a result, we show that gradient flow with proper random initialization can train a sufficiently over-parameterized implicit network to achieve arbitrarily small generalization error.
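For context, a common formulation of an implicit (equilibrium) layer, which we assume here only for illustration, defines the hidden state as a fixed point rather than as the output of a finite stack of layers; the notation below ($W$, $U$, $v$) is a sketch and may differ from the parameterization used in the paper:

\begin{align*}
  % Equilibrium (fixed-point) equation defining the implicit layer
  z^{\ast} &= \sigma\!\left(W z^{\ast} + U x\right), \qquad \sigma = \mathrm{ReLU},\\
  % Output layer reads out the equilibrium state
  f(x) &= v^{\top} z^{\ast}.
\end{align*}

Unrolling the iteration $z_{k+1} = \sigma(W z_k + U x)$ as $k \to \infty$ recovers the view of an implicit network as infinitely many weight-tied layers, which is why generalization analyses developed for finite-depth feed-forward networks do not carry over directly.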