Imposing orthogonality on the transformations between the layers of a neural network has been considered for several years now. It facilitates learning by limiting gradient explosion/vanishing, decorrelates the features, and improves robustness. In this framework, this paper studies theoretical properties of orthogonal convolutional layers. More precisely, we establish necessary and sufficient conditions on the layer architecture that guarantee the existence of an orthogonal convolutional transform. These conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that it remains accurate when the size of the signals/images is large. This holds for both row and column orthogonality. Finally, we confirm these theoretical results with experiments, and we also empirically study the landscape of the regularization term.
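To fix ideas, below is a minimal sketch of an orthogonality-promoting penalty of the kind discussed above, for a single stride-1 convolutional layer in PyTorch. It compares the cross-correlations of the filters with an identity-like target, so that the penalty vanishes exactly when the layer is (row-)orthogonal. The function name `conv_orth_penalty` and the implementation details are illustrative assumptions, not code taken from the paper.

```python
import torch
import torch.nn.functional as F

def conv_orth_penalty(kernel: torch.Tensor) -> torch.Tensor:
    """Row-orthogonality penalty for a stride-1 conv layer (illustrative sketch).

    kernel: convolution weights of shape (c_out, c_in, k, k).
    Returns the squared Frobenius distance between the self-correlation of the
    filters and the identity-like target (1 on the channel diagonal at zero
    spatial shift, 0 elsewhere).
    """
    c_out, _, k, _ = kernel.shape
    # Correlate every filter with every filter at all overlapping spatial shifts.
    self_corr = F.conv2d(kernel, kernel, padding=k - 1)  # (c_out, c_out, 2k-1, 2k-1)
    # Identity-like target: 1 only for filter i against itself at zero shift.
    target = torch.zeros_like(self_corr)
    target[torch.arange(c_out), torch.arange(c_out), k - 1, k - 1] = 1.0
    return ((self_corr - target) ** 2).sum()
```

In a training loop, such a penalty would typically be added to the task loss with a weight hyperparameter, e.g. `loss = task_loss + lam * conv_orth_penalty(layer.weight)`.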