Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding/vanishing gradients, decorrelating features, and improving robustness. This paper studies theoretical properties of orthogonal convolutional layers. We establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. These conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice with 'circular' padding. We also exhibit limitations with the 'valid' boundary condition and the 'same' boundary condition with zero padding. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020). The second motivation of the present paper is to specify the theory behind this regularization. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric. The theoretical results are confirmed by experiments, the landscape of the regularization term is studied, and the regularization strategy is validated on real datasets. Altogether, the study guarantees that regularization with L_{orth} (Wang et al. 2020) is an efficient, flexible, and stable numerical strategy for learning orthogonal convolutional layers.
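For concreteness, below is a minimal PyTorch sketch of a self-convolution orthogonality penalty in the spirit of the L_{orth} term of Wang et al. (2020), assuming a square kernel, stride 1, and the case of row orthogonality; the function name conv_orth_loss and the exact normalisation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def conv_orth_loss(kernel: torch.Tensor) -> torch.Tensor:
    """Sketch of a self-convolution orthogonality penalty (assumptions:
    square kernel, stride 1, full zero padding k-1).

    The kernel has shape (c_out, c_in, k, k). Convolving the kernel with
    itself yields a (c_out, c_out, 2k-1, 2k-1) tensor; orthogonality of the
    layer corresponds to this tensor being the identity matrix at the
    central spatial tap and zero elsewhere.
    """
    c_out, c_in, k, _ = kernel.shape
    # Treat the kernel as a batch of c_out inputs with c_in channels each
    # and convolve it with itself (full correlation via padding k-1).
    self_conv = F.conv2d(kernel, kernel, padding=k - 1)
    # Target: identity at the central spatial position, zero elsewhere.
    target = torch.zeros_like(self_conv)
    target[:, :, k - 1, k - 1] = torch.eye(c_out, dtype=kernel.dtype,
                                           device=kernel.device)
    # Squared Frobenius deviation from the orthogonality condition.
    return ((self_conv - target) ** 2).sum()

# Example usage on a random 3x3 kernel with 16 output and 8 input channels.
if __name__ == "__main__":
    w = torch.randn(16, 8, 3, 3, requires_grad=True)
    loss = conv_orth_loss(w)
    loss.backward()  # differentiable, so it can be added (with a weight)
    print(loss.item())  # to the task loss as a regularizer
```

In practice such a penalty is weighted and added to the task loss; the stability results summarised above indicate that when the penalty is small and the signal/image is large, the convolutional layer stays close to isometric.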