We analyze architectural features of Deep Neural Networks (DNNs) using the so-called Neural Tangent Kernel (NTK), which describes the training and generalization of DNNs in the infinite-width setting. In this setting, we show that for fully-connected DNNs, as the depth grows, two regimes appear: "order", where the (scaled) NTK converges to a constant, and "chaos", where it converges to a Kronecker delta. Extreme order slows down training while extreme chaos hinders generalization. Using the scaled ReLU as a nonlinearity, we end up in the ordered regime. In contrast, Layer Normalization brings the network into the chaotic regime. We observe a similar effect for Batch Normalization (BN) applied after the last nonlinearity. We uncover the same order and chaos modes in Deep Deconvolutional Networks (DC-NNs). Our analysis explains the appearance of so-called checkerboard patterns and border artifacts. Moving the network into the chaotic regime prevents checkerboard patterns; we propose a graph-based parametrization which eliminates border artifacts; finally, we introduce a new layer-dependent learning rate to improve the convergence of DC-NNs. We illustrate our findings on DCGANs: the ordered regime leads to a collapse of the generator to a checkerboard mode, which can be avoided by tuning the nonlinearity to reach the chaotic regime. As a result, we are able to obtain good quality samples for DCGANs without BN.
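To make the "order" claim concrete, the following is a minimal sketch (our own illustration, not code from the paper) of the infinite-width NTK depth recursion for a fully-connected network with the scaled ReLU sigma(x) = sqrt(2) * max(0, x), using the standard arc-cosine kernel formulas; the function names relu_layer_maps and scaled_ntk are hypothetical. Under these assumptions, the scaled NTK Theta_L(x, x') / Theta_L(x, x) should drift toward essentially the same value for different input correlations as the depth L grows, i.e. toward a constant kernel, which is the ordered regime described above.

    import numpy as np

    def relu_layer_maps(rho):
        # Infinite-width covariance maps for the scaled ReLU
        # sigma(x) = sqrt(2) * max(0, x) (arc-cosine kernel):
        # Sigma(rho)     = (sqrt(1 - rho^2) + rho * (pi - arccos(rho))) / pi
        # Sigma_dot(rho) = (pi - arccos(rho)) / pi
        rho = np.clip(rho, -1.0, 1.0)
        sigma = (np.sqrt(1.0 - rho**2) + rho * (np.pi - np.arccos(rho))) / np.pi
        sigma_dot = (np.pi - np.arccos(rho)) / np.pi
        return sigma, sigma_dot

    def scaled_ntk(rho0, depth):
        # NTK recursion for two unit-norm inputs with correlation rho0:
        # Theta_{l+1} = Sigma_{l+1} + Theta_l * Sigma_dot_{l+1}.
        # On the diagonal Sigma = Sigma_dot = 1, so Theta_L(x, x) = L,
        # and Theta_L(x, x') / L is the scaled (normalized) NTK.
        rho, theta = rho0, rho0
        for _ in range(depth - 1):
            sigma, sigma_dot = relu_layer_maps(rho)
            theta = sigma + theta * sigma_dot
            rho = sigma
        return theta / depth

    for rho0 in (0.0, 0.5):
        print([round(scaled_ntk(rho0, L), 3) for L in (2, 8, 32, 128, 512)])

Replacing the covariance maps with those of a normalized nonlinearity (e.g. after Layer Normalization) would instead drive the off-diagonal scaled NTK toward zero, the Kronecker-delta limit of the chaotic regime.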