Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result: in general, it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018]. Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and provide some clues on what this framework should be.
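To make the setting concrete, the following is a minimal sketch (not from the paper) of the object of study in the single-neuron case: gradient descent with the square loss on a single ReLU neuron f(x) = max(⟨w, x⟩, 0). The synthetic teacher data, initialization scale, step size, and iteration count are all illustrative assumptions.

```python
import numpy as np

# Hypothetical setup: labels generated by a "teacher" ReLU neuron,
# a "student" single ReLU neuron trained by gradient descent on the
# square loss. All constants below are illustrative choices.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)           # hypothetical teacher weights
y = np.maximum(X @ w_star, 0.0)       # labels from the teacher ReLU

w = 0.01 * rng.normal(size=d)         # small initialization
lr = 0.05
loss0 = np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

for _ in range(2000):
    pred = np.maximum(X @ w, 0.0)
    # Subgradient of the square loss w.r.t. w (taking subgradient 0
    # at the ReLU kink, as is standard in practice)
    grad = X.T @ ((pred - y) * (X @ w > 0)) / n
    w -= lr * grad

loss = np.mean((np.maximum(X @ w, 0.0) - y) ** 2)
```

The paper's question is, roughly, which of the many parameter vectors w fitting the data such a procedure converges to, and whether the selected solution minimizes some explicit function of the parameters.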