We provide a function space characterization of the inductive bias resulting from minimizing the $\ell_2$ norm of the weights in multi-channel convolutional neural networks with linear activations, and we empirically test the resulting hypothesis on ReLU networks trained using gradient descent. We define an induced regularizer in the function space as the minimum $\ell_2$ norm of weights of a network required to realize a function. For two-layer linear convolutional networks with $C$ output channels and kernel size $K$, we show the following: (a) If the inputs to the network are single-channel, the induced regularizer for any $K$ is independent of the number of output channels $C$. Furthermore, we show that the regularizer is a norm given by a semidefinite program (SDP). (b) In contrast, for multi-channel inputs, multiple output channels can be necessary merely to realize all matrix-valued linear functions, and thus the inductive bias does depend on $C$. However, for sufficiently large $C$, the induced regularizer is again given by an SDP that is independent of $C$. In particular, the induced regularizers for $K=1$ and $K=D$ (the input dimension) are given in closed form as the nuclear norm and the $\ell_{2,1}$ group-sparse norm, respectively, of the Fourier coefficients of the linear predictor. We investigate the broader applicability of our theoretical results to implicit regularization from gradient descent on linear and ReLU networks through experiments on the MNIST and CIFAR-10 datasets.
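The two closed-form cases can be illustrated numerically. The following is a minimal sketch rather than the authors' implementation, and it rests on several assumptions not stated in the abstract: the linear predictor is stored as a $D \times R$ matrix `W` (with `R` input channels), the Fourier coefficients are taken with a unitary DFT along the spatial axis, the $\ell_{2,1}$ groups are the individual frequencies (the $\ell_2$ norm is taken across channels, then summed over frequencies), and any normalization constants are ignored. All function names are hypothetical.

```python
import numpy as np

def fourier_coefficients(W):
    # Assumption: unitary DFT along the spatial axis (axis 0) of a D x R predictor.
    return np.fft.fft(W, axis=0, norm="ortho")

def nuclear_norm(M):
    # Sum of singular values.
    return float(np.linalg.svd(M, compute_uv=False).sum())

def group_sparse_norm(M):
    # l_{2,1} norm under the assumed grouping: l2 norm of each row
    # (one frequency, all channels), summed over rows.
    return float(np.linalg.norm(M, axis=1).sum())

rng = np.random.default_rng(0)
D, R = 8, 3                      # illustrative sizes, not from the source
W = rng.standard_normal((D, R))  # a random matrix-valued linear predictor
W_hat = fourier_coefficients(W)

reg_k1 = nuclear_norm(W_hat)       # K = 1 case (up to scaling)
reg_kD = group_sparse_norm(W_hat)  # K = D case (up to scaling)
print(reg_k1, reg_kD)
```

Under these assumptions, the nuclear norm of `W_hat` equals that of `W`, because a unitary transform leaves singular values unchanged, and it never exceeds the $\ell_{2,1}$ value, since the nuclear norm of a matrix is at most the sum of its row norms.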