Controlling the norm of the parameters often yields good generalisation when training neural networks. Beyond simple intuitions, however, the relation between the parameters' norm and the resulting estimator remains poorly understood theoretically. For networks with a single hidden ReLU layer and one-dimensional data, this work shows that the minimal parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor. In comparison, this $\sqrt{1+x^2}$ weighting disappears when the norm of the bias terms is ignored. This additional weighting is of crucial importance, since it is shown in this work to enforce uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator. Omitting the bias norm, on the other hand, allows for non-sparse solutions. Penalising the bias terms in the regularisation, whether explicitly or implicitly, thus leads to sparse estimators. This sparsity might contribute to the good generalisation of neural networks that is empirically observed.
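The central claim can be sketched in formulas as follows. This is a schematic rendering of the statement above, assuming a sufficiently regular target $f$ and omitting any boundary or affine terms the full result may carry:

```latex
% Representation cost of f by a one-hidden-ReLU-layer network,
% counting the norms of both the weights and the biases (schematic):
\min_{\theta \,:\, f_\theta = f} \ \lVert \theta \rVert_2^2
  \;=\; \int_{\mathbb{R}} \sqrt{1 + x^2}\, \mathrm{d}\lvert f'' \rvert(x),

% whereas ignoring the bias norms removes the \sqrt{1+x^2} weight,
% leaving the unweighted total variation of f':
\min_{\theta \,:\, f_\theta = f} \ \lVert \theta_{\mathrm{weights}} \rVert_2^2
  \;=\; \int_{\mathbb{R}} \mathrm{d}\lvert f'' \rvert(x).
```

Here $\mathrm{d}\lvert f'' \rvert$ denotes the total variation measure of the second (distributional) derivative of $f$; the weighted integral being larger away from the origin is what penalises kinks at large $\lvert x \rvert$ and forces sparse, unique minimal norm interpolators.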