Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) that explicitly encode group symmetries, such as rotations and permutations, in their architectures, and they excel in a wide range of scientific and technical applications. Although the success of G-CNNs is driven by the explicit symmetry bias of their convolutional architecture, a recent line of work has proposed that the implicit bias of training algorithms on a particular parameterization (or architecture) is key to understanding generalization for overparameterized neural nets. In this context, we show that $L$-layer full-width linear G-CNNs trained via gradient descent on a binary classification task converge to solutions with low-rank Fourier matrix coefficients, regularized by the $2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative symmetry groups (such as permutations). We validate our theorems via experiments on a variety of groups and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles.
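To fix notation for the Fourier-space statement, the display below sketches the schematic form of the regularized problem whose stationary points the trained end-to-end linear predictor approaches in direction; the precise constants and representation-dimension weights are those defined in the main text, and $\hat{\beta}(\rho)$ denotes the Fourier matrix coefficient of the predictor $\beta$ at an irreducible representation $\rho$ of the finite group $G$.

\begin{equation*}
  % Schematic form only: weighting constants over irreducibles are omitted here
  % and given in the main text.
  \min_{\beta}\; \sum_{\rho \in \widehat{G}} \big\lVert \hat{\beta}(\rho) \big\rVert_{S_{2/L}}^{2/L}
  \quad \text{subject to} \quad y_n \langle \beta, x_n \rangle \ge 1 \;\; \text{for all } n,
\end{equation*}

where $\lVert \cdot \rVert_{S_{2/L}}$ is the Schatten-$(2/L)$ quasi-norm (the $\ell_{2/L}$ norm of the singular values), which for $L \ge 2$ favors sparse, low-rank Fourier matrix coefficients.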