Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding and vanishing gradients, to decorrelate the features, and to improve robustness. This paper studies the theoretical properties of orthogonal convolutional layers. We establish necessary and sufficient conditions on the layer architecture that guarantee the existence of an orthogonal convolutional transform. These conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice with 'circular' padding. We also exhibit limitations with 'valid' boundary conditions and with 'same' boundary conditions using zero-padding. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020). The second motivation of the present paper is to specify the theory behind this regularization. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric. The theoretical results are confirmed with experiments, and the landscape of the regularization term is studied. Experiments on real data sets show that, when orthogonality is used to enforce robustness, the parameter multiplying the regularization term can be used to tune a tradeoff between accuracy and orthogonality, for the benefit of both accuracy and robustness. Altogether, the study guarantees that the regularization proposed in Wang et al. (2020) is an efficient, flexible and stable numerical strategy to learn orthogonal convolutional layers.
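The contrast between 'circular' padding and zero-padding can be illustrated with a minimal NumPy sketch. It is not the construction used in the paper, nor the regularization term of Wang et al. (2020): it assumes a toy 1D, single-channel, stride-1 layer, and the helper names `conv_matrix` and `orthogonality_residual` are hypothetical. The residual computed here is simply the squared Frobenius distance between the layer's Gram matrix and the identity, chosen as one plausible orthogonality measure.

```python
import numpy as np


def conv_matrix(kernel, n, padding):
    """Build the n x n matrix of a 1D, single-channel, stride-1 convolution.

    Illustrative helper (not from the paper). `padding` is "circular" or "zero".
    """
    k = len(kernel)
    c = k // 2  # index of the kernel's center tap
    mat = np.zeros((n, n))
    for i in range(n):          # output position
        for j in range(k):      # kernel tap
            src = i + j - c     # input position read by this tap
            if padding == "circular":
                mat[i, src % n] += kernel[j]   # wrap around the signal
            elif 0 <= src < n:
                mat[i, src] += kernel[j]       # out-of-range taps read zeros
    return mat


def orthogonality_residual(mat):
    """Squared Frobenius distance between mat @ mat.T and the identity."""
    return float(np.sum((mat @ mat.T - np.eye(mat.shape[0])) ** 2))


# A kernel that copies the right neighbor of each sample (a pure shift).
shift = np.array([0.0, 0.0, 1.0])

res_circular = orthogonality_residual(conv_matrix(shift, 8, "circular"))
res_zero = orthogonality_residual(conv_matrix(shift, 8, "zero"))
print(res_circular, res_zero)  # prints: 0.0 1.0
```

Under these assumptions, the circular shift is a permutation of the samples and is therefore exactly orthogonal, whereas with zero-padding the last output reads only padded zeros, so one row of the matrix vanishes and the residual is nonzero.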